Avoiding downtime while rebuilding your Lucene search indexes in Sitecore ASP.NET CMS

This is the first blog post I’ve written since arriving in Sydney and starting my new job at Sitecore. It covers a simple yet powerful feature of the Lucene indexing capabilities in Sitecore 7 that is often overlooked: how to avoid downtime on your Lucene indexes.
One problem I run into occasionally is that rebuilding an index, e.g. the web database index, causes features of the site to stop working properly. This happens because the index is taken offline while it is being rebuilt.

But what causes a rebuild?

A rebuild may be triggered manually through the control panel in Sitecore, or by an indexing strategy such as RemoteRebuild (see more here). But a full rebuild might also be triggered simply by publishing too many items from master to web. This is due to the Threshold setting in e.g. the PublishEndStrategy, which replaces many single-item index updates with one full rebuild once the given threshold is reached (see more here).
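The threshold behaviour described above can be illustrated with a small, language-neutral sketch. This is not Sitecore code: the function name `plan_index_update`, its parameters and the "greater than or equal" comparison are assumptions made purely for illustration.

```python
def plan_index_update(changed_item_ids, threshold):
    """Decide between per-item updates and one full rebuild.

    Hypothetical helper, not part of Sitecore: it only mirrors the idea
    that a large enough batch of published items is cheaper to handle
    as a single full rebuild than as many single-item updates.
    """
    ids = list(changed_item_ids)
    # Assumption: the threshold counts as "reached" when the batch size
    # is equal to or larger than the configured value.
    if len(ids) >= threshold:
        return ("rebuild", None)
    return ("update", ids)


# A small publish stays incremental; a large one turns into a rebuild.
small = plan_index_update(["item-a", "item-b"], threshold=3)
large = plan_index_update(["item-a", "item-b", "item-c"], threshold=3)
```

The point of the sketch is that an ordinary publish can silently cross the line into a full rebuild, which is exactly when the downtime problem shows up.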

How do I avoid downtime then?

Well, the solution to the above problem is relatively simple, yet often overlooked: changing the standard Lucene index handler (the class which actually handles reading from and writing to the Lucene index files) from LuceneIndex to SwitchOnRebuildLuceneIndex removes the problem.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
        <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneSearchConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <indexes hint="list:AddIndex">
            <index id="sitecore_web_index" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
              <param desc="name">$(id)</param>
              <param desc="folder">$(id)</param>
<!-- ... -->

The SwitchOnRebuildLuceneIndex maintains two separate indexes on disk, one active and one passive. A rebuild is executed on the passive index while the active index keeps serving searches. When the rebuild completes, the two swap roles: the passive index becomes active and the active index becomes passive. As a result, the index is never offline.
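The active/passive idea can be sketched in a few lines. The following is a conceptual illustration only, using in-memory dictionaries instead of Lucene directories on disk; the class `SwitchOnRebuildIndex` and its methods are hypothetical and do not reflect how Sitecore implements SwitchOnRebuildLuceneIndex internally.

```python
class SwitchOnRebuildIndex:
    """Toy double-buffered index: searches hit the active slot,
    rebuilds fill the passive slot, then the two swap roles."""

    def __init__(self):
        # Two stand-ins for the two index folders on disk.
        self._slots = [{}, {}]
        self._active = 0

    def search(self, term):
        # Searches only ever read the active slot.
        active = self._slots[self._active]
        return sorted(doc_id for doc_id, text in active.items() if term in text)

    def rebuild(self, documents):
        passive = 1 - self._active
        fresh = {}
        for doc_id, text in documents:
            # The active slot is untouched while this loop runs,
            # so searches keep working during the rebuild.
            fresh[doc_id] = text
        self._slots[passive] = fresh
        # The switch: the freshly built slot becomes the active one.
        self._active = passive


index = SwitchOnRebuildIndex()
index.rebuild([("1", "home page")])
index.rebuild([("1", "home page"), ("2", "home news")])
```

The cost of this design is visible in the sketch as well: two full copies of the index exist at the same time, which is why the extra disk space matters.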

Things to keep in mind

A couple of things to keep in mind:

  • Switch to SwitchOnRebuildLuceneIndex if you want to make sure that you have no downtime on your Sitecore search indexes.
  • Make room on disk for maintaining two copies of each index.
  • Always choose your index update strategies carefully.
  • Different server roles might require different index update strategies.

Sitecore’s Lucene integration

I have always been fond of Sitecore’s use of Lucene for indexing. I often use Lucene for relational operations, which avoids iterating over the complete content tree.

The way Sitecore uses Lucene for indexing hasn’t been all that well documented, but earlier this year Sitecore released some documentation describing how items are indexed. The guide is OK: it describes how indexes are configured and how the indexing is performed in an understandable way.

However, I often find that I need documentation on how to extract data from the indexes. Even Lucene’s own online documentation on this is somewhat limited. A couple of weeks ago I found and skimmed the book “Lucene in Action”. I can recommend it; it describes the different queries and analyzers really well.

Still, there is a lack of Sitecore documentation on how to extract data through Sitecore’s wrapper API for Lucene. In particular, I could use some documentation on the Sitecore.Data.Indexing namespace, which seems to hold a lot of functionality, including for extracting data. The existing documentation is limited to a few snippets and blog entries.

 

However, the lack of documentation wasn’t an issue until Sitecore 6, because it was possible to use the tool Luke. With Luke you could browse indexes, try out query strings, etc. When developing something that uses Lucene, this tool has been essential.

In Sitecore 6 the indexing uses compression, which makes Luke fail whenever you browse or search through an index, as .NET compression isn’t compatible with Java compression. Whether or not to compress the indexes hasn’t been exposed as a setting in the web.config; it is hardcoded into the IndexData class, so it is not possible to disable.

I have asked Sitecore Support to register a change request for implementing a setting, but until then, developing functionality that uses the indexes is based on guesswork and use of the reflector… So Sitecore, please hurry!
