Open Source alternative for Guided Navigation?

August 22, 2007

Almost everyone wants guided navigation (also called multi-faceted navigation) now. Commercial vendors like Endeca rule the space, but they tend to get really expensive really fast. So I evaluated Apache Solr (based on Lucene) for a customer with this need.

Solr is pretty impressive. You can do multi-faceted search and navigation based on pre-defined tags. The navigation can be based on string matches, matches against one of several values, date range matches and so on, and you can optionally combine it with keyword search.

This fits the bill where you have structured metadata (a product catalogue, or product reviews like those on CNET).
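As a rough illustration of what a faceted request over such structured metadata might look like, here is a minimal Python sketch that only builds the query URL. The `q`, `facet`, `facet.field` and `fq` parameters are standard Solr request parameters; the host, the `/solr/select` path, the field names (`brand`, `color`, `price`) and the filter values are assumptions made up for this example, not details of the implementation described in this post.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, keywords=None, facet_fields=(), filters=()):
    """Build a Solr select URL that requests facet counts.

    base_url, the field names and the filter values are hypothetical;
    adjust them to match your own schema.
    """
    params = [
        # Optional keyword search; "*:*" matches every document.
        ("q", keywords if keywords else "*:*"),
        ("facet", "true"),
    ]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    # Each fq narrows the result set, e.g. after the user picks a facet value.
    params += [("fq", fq) for fq in filters]
    return base_url + "?" + urlencode(params)


url = build_facet_query(
    "http://localhost:8983/solr/select",  # assumed local Solr instance
    keywords="laptop",
    facet_fields=["brand", "color"],
    filters=['brand:"HP"', "price:[500 TO 1000]"],
)
print(url)
```

Each value the user selects in the navigation becomes one more `fq` filter, so the guided navigation is a sequence of progressively narrower requests.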

So what about “guided navigation” for content that has not been meta-tagged as effectively?

Now this is where it becomes challenging in the open source stream. There are a few projects like Classifier4J, which uses a Bayesian filter that can be trained to auto-classify content. There are projects like Carrot2, which do search result clustering. Carrot2 is pretty effective in choosing the phrases to cluster against. About 80% of the categories it determines are very meaningful. It appeared a bit slow in the tests I ran. I am not sure how it performs on large result sets, or what percentage of meaningful categories it misses.

The auto-classifiers need a lot more work. They are not simple plug and play, and I don't know of an effective open source alternative for this yet. So I am spending some time looking at it from the ground up, using a set of existing libraries which can provide a base for text classification. I will post an update if I make headway, and I would appreciate input from anyone who has got it working.
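To make the idea of a trainable Bayesian classifier concrete, here is a minimal naive Bayes sketch in plain Python. It is not Classifier4J's API or algorithm, and it is not the approach being built for this project; the class name, the category labels and the training sentences are all made up for illustration. It uses word counts with add-one smoothing, which is about the simplest form of the technique.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Tiny multinomial naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training docs seen
        self.vocabulary = set()                  # all words seen in training

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def train(self, category, text):
        words = self.tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus log likelihood of each word (add-one smoothing).
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocabulary))
                )
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training data for two made-up categories.
classifier = NaiveBayesClassifier()
classifier.train("laptops", "lightweight laptop with long battery life")
classifier.train("laptops", "notebook computer with fast processor and large screen")
classifier.train("cameras", "digital camera with zoom lens")
classifier.train("cameras", "compact camera with image sensor and flash")

print(classifier.classify("a laptop with a fast processor"))
print(classifier.classify("camera with zoom"))
```

A real deployment would need far more training data per category, plus stop-word handling and some evaluation of accuracy, which is part of why these classifiers are not plug and play.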

Search result prioritization can be done easily on defined metadata, but I have not tried “learning” software here. Similarly, I am yet to try “suggestions” and spell checks.

In short: using Lucene/Solr for multi-faceted search is a very viable alternative to complex database queries and to expensive commercial engines. But you are not getting a 1:1 equivalent of Endeca or Autonomy.

15 Responses to “Open Source alternative for Guided Navigation?”

  1. Panagiotis Konstantinidis Says:

    Hi,
    we are also looking for a low-cost/free multifaceted search/navigation solution. As you correctly pointed out, Endeca is the leader. I know that Verity (now Autonomy) K2 also had a “parametric search” functionality (exactly the same as multifaceted search).
    Our need is for e-commerce / product catalogue applications, where one has to quickly select some product parameters (e.g. color, size, etc.) and also do free-text search in the product brochure.
    Can Apache Solr do that? Are there any other alternatives?
    Thanks


  2. Hi,

    Thanks for the warm words about Carrot2! Just to extend the information you provided in your post: as a complement to Carrot2’s Open Source clustering algorithms, we also offer (on a commercial basis) a highly-tuned document clustering engine called Lingo3G:

    http://company.carrot-search.com/lingo-3g-vs-classic.html

    Best wishes,

    Staszek

  3. Pranshu Jain Says:

    Hi Panagiotis

    Solr should fit the bill.
    A list of sites which use it is available at
    http://wiki.apache.org/solr/PublicServers
    I am currently involved in an implementation where we are using it for an eCommerce catalog of over 70,000 items, and it works.
    It took us about 40 person-days of effort (including learning) to get it working the way we wanted it to.

    Mondosearch is a reasonably priced commercial search engine which does it as well.

    http://www.mondosearch.com/About%20MondoSearch/Categorized%20Results.aspx

  4. thedjinn Says:

    We use Solr to index about 6.5 million entities for our site. We are also in the process of deploying Carrot2. You pointed out performance issues with Carrot2. What kind of numbers are we looking at?

  5. Pranshu Jain Says:

    Hi,
    I haven't done any performance tests with Carrot2. I had deployed it in conjunction with Lucene on my desktop, for a 30 MB repository. While Lucene returned results in under a second, a “clustered search” with Carrot2 took about 5-6 seconds (for a query which returned about 400 documents). I have not analysed the cause of the delay. Carrot2 accesses all the search results, all 400 of them, to cluster them, while I was displaying only the first 20 for a “flat” search. So it's possible that the time went into inefficiently accessing the search results and not into the clustering itself. It made me wonder how it will perform if I get 2,000 or 20,000 results back. At this moment, it's not fair for me to attribute performance issues to Carrot2. It could be the way I used it. If I dig deeper into it, I will post the findings here.

  6. Katherine Says:

    Pranshu, would you mind sharing the ecommerce site that you and your team worked on? 40 person-days seems quite low for developing the solution with Lucene/Solr. I met Endeca at a tradeshow and one of their value props was time-to-market because of the architecture and toolset. It would be interesting to see your results. Thanks!

  7. thedjinn Says:

    Pranshu,
    I am currently involved in an effort to rewrite the search system for about 6.5 million searchable widgets. From that point of view, I was looking at whether clustering would be possible using Carrot2. But I guess not. Our time threshold is set to less than 4 secs.

    Did you try Solr with Carrot2?

  8. Pranshu Jain Says:

    Hello Katherine,
    The site is currently in a password-protected beta, with a go-live around Oct 10th. I will post a link at that time. I will mail a couple of screenshots.

    Pranshu


  9. Hi,

    Our company has deployed both Autonomy and Endeca within enterprise environments. One solution that may be considered for Guided Navigation is a new search engine solution available at http://www.sqlone.com. They provide an intuitive GUI to develop categories for guided navigation.

    CJ

  10. Jeesmon Jacob Says:

    Has anyone looked at FAST ImPulse (http://www.fastsearch.com/l3a.aspx?m=1008)? I am not sure whether any demo download is available. It's been used at bestbuy.com, windowsmarketplace.com, etc.

  11. Dave Thoma Says:

    I have worked as a project manager on several Endeca implementations and one FAST implementation. While Endeca is more costly than FAST, it seems to be technically more advanced. I haven’t worked with Endeca in 3 years, but have heard that their documentation and technical documentation have improved dramatically over what was available in 2006 when I last used it.

  12. Christine Diploma Says:

    The Endeca platform is also available as a hosted offering. We came from Endeca and felt the need to push the platform downstream. It’s a lot more attainable by mid-market companies. I provided some links.

    Endeca On-Demand:
    http://thanxmedia.com/solutions_endeca_on_demand.php

    As always, time-to-market is a big focus.

    Endeca Express:
    http://thanxmedia.com/services-endeca-express.php

    Thanks!
    Christine


  13. I read your posts for quite a long time and should tell that your articles always prove to be of a high value and quality for readers.

  14. janapati Says:

    Hi,
    I am a newbie to Solr. I am trying to implement Solr for product catalog search. I have identified the configuration files we need to modify for Solr search. For the database, I have written a conf file containing the database-related details. In schema.xml, I configured all the fields that we are going to index and store. I am having difficulty understanding the solrconfig.xml configuration.

    Are there any resources that explain Solr configuration?
    How can we do indexing programmatically using SolrJ?
    We can do the indexing using the admin interface, but if we want to index on a scheduled basis, do we need a programmatic approach?
    I configured facets on indexed fields. I am getting the below results
    brand:[HP (5), Apple (4), DELL (4), Acer (3), Lenova (3), Hcl (2)] as facets.
    In these facets, the counts are not coming out properly.
    How can we get the search results when we click on a particular facet? Do we need to send one more request to get the search results when we click on a facet?

    Please provide suggestions for the above questions.

    Thanks in advance.

    Regards,
    Siva


  15. It’s an awesome post in favor of all the internet users; they will obtain advantage from it I am sure.

