My colleague Brian Pedersen and I have spent this week in London (thanks to Google London and Dave Gallerizzo of Fig Leaf) looking into the Google Search Appliance (GSA), a really cool search product that I hope we will implement in a lot of future solutions. By the way, Pentia is now a Google Enterprise Partner 🙂
The GSA is a double-edged sword, though. On one hand, it provides really good value out of the box (best in class in search relevance) and has a lot of nice features and configuration options. On the other hand, the configuration interface on the GSA sucks! Yes, it sucks! I would never expect end users to understand what happens in the UI, and in the worst case they will mess up the nice out-of-the-box functionality. In other words: the summer intern who wrote the config UI should definitely find work in a non-computer-related sector.
That said, the GSA has some really cool APIs, which allow you to provide nice functionality for your users:
Admin API
If you are considering integrating a GSA into a Sitecore (or any other CMS) solution, you should consider moving some of the configuration directly into the CMS backend. Imagine users being able to tweak the biasing of content, or to add content to the “recommended” box at the top of the results, directly in Sitecore. All of this is available through the Admin API, which is nicely wrapped in .NET assemblies.
You can find the Admin API here.
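To give an idea of what a CMS-side integration could look like, here is a minimal Python sketch that turns editor-maintained “recommended” entries into KeyMatch rows. It assumes the KeyMatch line layout term, match type, URL, title and the match types KeywordMatch, PhraseMatch and ExactMatch, which is my understanding of the GSA KeyMatch settings; check your appliance's documentation before relying on it. The `Recommendation` class and `to_keymatch_lines` function are hypothetical, and pushing the result to the appliance (through the Admin API or the admin console) is left out.

```python
import csv
import io
from dataclasses import dataclass

# Assumed KeyMatch match types; verify against your appliance's documentation.
MATCH_TYPES = {"KeywordMatch", "PhraseMatch", "ExactMatch"}


@dataclass
class Recommendation:
    """Hypothetical CMS-side model of one 'recommended' result."""
    term: str
    match_type: str
    url: str
    title: str


def to_keymatch_lines(recs):
    """Serialise recommendations as assumed 'term,matchtype,url,title' CSV lines."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for rec in recs:
        # Reject anything the appliance would not recognise as a match type.
        if rec.match_type not in MATCH_TYPES:
            raise ValueError(f"unknown match type: {rec.match_type}")
        writer.writerow([rec.term, rec.match_type, rec.url, rec.title])
    return buf.getvalue()
```

In a Sitecore solution the `Recommendation` objects would come from content items maintained by editors, so the recommended box is managed in the CMS rather than in the GSA's own UI.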
Feeds API
The Feeds API allows you to insert content directly into the crawler queue. There are three options: web feeds, which simply add URLs you want the GSA to crawl; URL-and-metadata feeds, which allow you to attach additional metadata to the URLs being crawled (think adding metadata to PDF files); and content feeds, which push the HTML or binary data directly into the GSA.
You can find the Feeds API here.
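As a rough illustration of what a feed looks like, the Python sketch below builds a feed document with the standard library. The element names (gsafeed, header, datasource, feedtype, group, record, metadata/meta, content) and the feed type values "full", "incremental" and "metadata-and-url" reflect my understanding of the GSA feed format; treat them as assumptions and verify them against the Feeds API documentation. The `build_feed` helper is hypothetical. As far as I know, the finished document is then POSTed as a multipart form (fields feedtype, datasource and data) to port 19900 at /xmlfeed on the appliance; that step is not shown, and the sketch only handles plain-text content.

```python
import xml.etree.ElementTree as ET

# Assumed DOCTYPE for GSA feeds; ElementTree cannot emit DOCTYPEs, so it is prepended.
DOCTYPE = '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">'


def build_feed(datasource, feedtype, records):
    """Build a feed document as a string (hypothetical helper).

    Each record is a dict with "url", optional "mimetype", optional
    "metadata" (name -> value) and optional "content" (text, for content feeds).
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype
    group = ET.SubElement(root, "group")
    for rec in records:
        attrs = {"url": rec["url"], "mimetype": rec.get("mimetype", "text/html")}
        record = ET.SubElement(group, "record", attrs)
        # URL-and-metadata feeds: extra name/value pairs attached to the URL.
        if rec.get("metadata"):
            meta_el = ET.SubElement(record, "metadata")
            for name, value in rec["metadata"].items():
                ET.SubElement(meta_el, "meta", {"name": name, "content": value})
        # Content feeds: the document body itself is pushed to the appliance.
        if "content" in rec:
            ET.SubElement(record, "content").text = rec["content"]
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + DOCTYPE + "\n" + body
```

For example, `build_feed("web", "metadata-and-url", [{"url": "http://www.example.com/report.pdf", "mimetype": "application/pdf", "metadata": {"department": "Sales"}}])` would attach a department field to a PDF that the GSA crawls as usual.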
Policy ACL API
One of the strong sides of the GSA is its ability to crawl and serve secure content, e.g. on your extranet or intranet. The problem, though, is determining which URLs the individual users in your custom security solution have access to. This is where the Policy ACL (access control list) API comes in handy. In short, it allows you to insert access lists into the GSA and map them to URLs, which in turn gives you fast secure queries. It is simple, and it is wrapped in .NET.
Find the Policy ACL API here.
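To make the idea concrete, here is a small Python model of what the Policy ACL API lets you store: URL patterns mapped to the users and groups allowed to see them, checked at query time so results can be filtered without asking the CMS about every hit. This is not the GSA's API or its URL pattern syntax; `PolicyTable`, `Acl` and the shell-style wildcards (via `fnmatchcase`) are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Acl:
    """Hypothetical access list: who may see URLs matching a pattern."""
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)


class PolicyTable:
    """Hypothetical in-memory model of URL-pattern -> ACL mappings."""

    def __init__(self):
        self._policies = []  # list of (url_pattern, Acl)

    def add(self, url_pattern, acl):
        self._policies.append((url_pattern, acl))

    def can_view(self, url, user, user_groups):
        # Deny by default: access is granted only if some matching policy
        # lists the user or one of the user's groups.
        for pattern, acl in self._policies:
            if fnmatchcase(url, pattern):
                if user in acl.users or acl.groups & set(user_groups):
                    return True
        return False

    def filter_results(self, urls, user, user_groups):
        # Keep only the hits this user is allowed to see, preserving order.
        return [u for u in urls if self.can_view(u, user, user_groups)]
```

The point of keeping the access lists on the appliance side is that this check is a local lookup per result, rather than a round trip to the custom security solution for every URL in the result list.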
There are tons of other APIs and documentation on the Google Search Appliance on code.google.com.