I have always been fond of Sitecore's use of Lucene for indexing. I often use Lucene for relational operations, which avoids iterating over the complete content tree.
The way Sitecore uses Lucene to index hasn’t been all that well documented, but earlier this year Sitecore released some documentation describing how items are indexed. The guide is OK: it describes how indexes are configured and how the indexing is performed in an understandable way.
However, I often find that I need some documentation on how to extract data from the indexes. Even Lucene’s own online documentation on this is somewhat limited. A couple of weeks ago I found and skimmed the book “Lucene in Action”. I can recommend it; it describes the different queries and analyzers really well.
Still there is a lack of Sitecore documentation on how to extract data through the Sitecore wrapper API to Lucene. In particular I could use some documentation on the Sitecore.Data.Indexing namespace, which seems to hold a lot of functionality – also when one wants to extract data. The documentation is limited to a few snippets and blog entries.
However the lack of documentation hasn’t been an issue until Sitecore 6, as it was possible to use the tool Luke. With Luke it was possible to browse indexes, try out query strings etc. When developing something that uses Lucene, this tool has been essential.
In Sitecore 6 the indexing uses compression, which makes Luke fail whenever you browse or search through an index, as .NET compression isn’t compatible with Java compression. Whether or not to use compression for the indexes hasn’t been implemented as a setting in the web.config; it is hardcoded into the IndexData class, so it is not possible to disable.
I have asked Sitecore Support to register a change request for implementing a setting, but until then, developing functionality that uses the indexes is based on guesswork and use of Reflector… So Sitecore, please hurry!
Hi, I am working on Sitecore Lucene indexing and am trying to define an individual index for each website defined in the web.config file. However, I don't see any way to do so. Could you please guide me on this ASAP?
Thank you…
Hi Priyanka,
As far as I know there is no default out-of-the-box way to do this, but you might ask on the SDN forum.
However it is not that hard to implement yourself. There are two ways to solve this issue:
1. You could keep all documents in one index and then just save the path in a field. In your search you could then use a PrefixQuery to ensure that the index item holds a path that starts with the root of your site.
2. You could create an index per site and then override the indexing class excluding all documents not relevant for the given site.
Either way you should probably override the Sitecore.Data.Indexing.Index class. You can take a look at the class using Reflector.
If you don’t have too much data, I would recommend option 1. It is the simplest, and you won’t clutter up your configuration if you have many sites.
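Option 1 can be sketched roughly as below. This is a language-neutral illustration in Python, not the Sitecore or Lucene.Net API: the "path" field name, the sample documents and the filter_by_site helper are all hypothetical. It mimics what a PrefixQuery on a path field would do, and shows one detail worth watching: without a trailing slash, the root /sitecore/content/site1 would also match /sitecore/content/site10.

```python
# Hypothetical index documents: each stores the item's full path in a "path" field.
documents = [
    {"name": "Home", "path": "/sitecore/content/site1/home"},
    {"name": "News", "path": "/sitecore/content/site1/home/news"},
    {"name": "Home", "path": "/sitecore/content/site10/home"},
    {"name": "Logo", "path": "/sitecore/media library/logo"},
]


def filter_by_site(docs, site_root):
    """Keep only documents whose path lies under site_root (prefix match)."""
    # Normalise to a trailing slash so "site1" does not also match "site10".
    prefix = site_root.rstrip("/") + "/"
    return [doc for doc in docs if doc["path"].startswith(prefix)]


site1_hits = filter_by_site(documents, "/sitecore/content/site1")
for doc in site1_hits:
    print(doc["path"])
```

In a real index the path would have to be indexed as a single untokenized term for a prefix match to behave like this, since a PrefixQuery matches against terms rather than against the original string.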
Hope that helps you.
How exactly would you save the path in a field? Right now I am trying to limit the index to only items under /sitecore/content/ but the only way I find to do it is to filter the hit results in my for loop.
How can I filter the index exactly? I don’t know how to add the path field as an indexed field in the web.config.
Thanks.
Hi John,
I also replied to you on SDN, but here is the link to the three-part article series on LearnSitecore again: http://learnsitecore.cmsuniverse.net//Developers/Articles/2009/06/LuceneQuery3.aspx
Hi Jens,
First of all, thanks a lot for your reply.
I would like to clarify that we are going to have around 400 sites in a single installation of Sitecore, so we cannot use the first option.
We don't have separate templates for each website, so I don't think we can do this through the web.config file alone. I believe we need to use custom indexing for this. I have gone through the documentation for it, but I don't understand what exactly I need to do or where to start.
For example, there is a function like “protected override void AddFields(Item item, Document document)” in the given code file, but I want to know: do I have to loop through all the items and only add to the index those items which are related to a particular site? That looks like a very odd way to do it, or else I don't understand how I can remove items here if they are not related to the site.
Could you please provide a reference to some code example where custom indexing is integrated with code (I got the code files from Lucene-sitecore online)? I am not sure whether this is the place to discuss issues related to Sitecore or not; if it is not, then please provide an email address or similar where I can send my search questions.
Thanks a lot!!!
Hi Priyanka,
I have dropped an answer here http://sdn5.sitecore.net/forum/ShowPost.aspx?PostID=12791
Cheers
Jens
Hi Jens,
There’s a “Lucene.Net.CompressionLib.class” key in web.config, defining the compression mechanism used.
You can try to implement your own dummy compression (actually doing nothing), and still use Luke to browse the indexes in Sitecore 6.
Not 100% sure if this will work, but could be worth trying.
Regards, Alexey
Hi Alexey,
Thanks for your comment. I just tried it out, but it didn’t seem to work.
I think that the record gets registered as “compressed”. When Luke opens the index, it tries to uncompress it with the Java decompressor, which isn’t compatible with .NET’s way.
Even if I implement my own compressor that doesn’t alter the text, it still tries to uncompress it.
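The suspected failure mode can be illustrated like this. It is only a sketch of the hypothesis above, written in Python rather than Java or .NET, with a hypothetical read_stored_field helper. It assumes the reader runs a zlib-style inflate on every field flagged as compressed, whatever was actually written. A pass-through compressor then leaves plain text behind a “compressed” flag, and the inflate step rejects it.

```python
import zlib


def dummy_compress(data: bytes) -> bytes:
    """Pass-through 'compressor' that leaves the bytes untouched."""
    return data


def read_stored_field(raw: bytes, flagged_compressed: bool) -> bytes:
    """Hypothetical reader: inflates any field flagged as compressed."""
    return zlib.decompress(raw) if flagged_compressed else raw


text = b"Sitecore item text"

# A field that really was deflated reads back fine.
assert read_stored_field(zlib.compress(text), True) == text

# A field written by the dummy compressor is still flagged, so the reader
# tries to inflate plain text and fails.
try:
    read_stored_field(dummy_compress(text), True)
    inflate_failed = False
except zlib.error:
    inflate_failed = True

print("inflate failed:", inflate_failed)
```

If this assumption holds, swapping the compressor alone cannot help; the field would also have to be written without the compressed flag.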
Thanks Alexey. That got me going down the right path and I found a solution. http://seankearney.com/archive/2010/08/13/sitecore-6-lucene-luke.aspx