Yahoo/IBM Omnifind where?

January 1, 2007

Regunath Balasubramanian, a co-architect at Mindtree, had a good look at IBM/Yahoo Omnifind. Here is what he had to say about it.

———

Omnifind offers three types of relevance: the link depth from the entry page, how recent the document is, and page rank in terms of links pointing to the page. These give a good default ranking when you index a site by crawling it.
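To make the three signals concrete, here is a minimal sketch of how they might be blended into one score. Everything in it is an assumption for illustration: the class name, the weights, the 30-day half-life and the formulas are invented, and a plain inbound-link count stands in for a real page rank. How Omnifind actually combines the signals was not visible in this review.

```java
import java.time.Duration;
import java.time.Instant;

final class RankingSketch {
    // Hypothetical weights; the real engine's weighting is not known.
    static final double W_DEPTH = 0.3;
    static final double W_RECENCY = 0.3;
    static final double W_LINKS = 0.4;

    /** Pages closer to the crawl entry point score higher: depth 0 gives 1.0, depth 1 gives 0.5. */
    static double depthScore(int linkDepth) {
        return 1.0 / (1 + linkDepth);
    }

    /** Newer documents score higher; the score halves every 30 days (arbitrary half-life). */
    static double recencyScore(Instant lastModified, Instant now) {
        double ageDays = Duration.between(lastModified, now).toHours() / 24.0;
        return Math.pow(0.5, Math.max(0.0, ageDays) / 30.0);
    }

    /** Dampened inbound-link count, squashed into the range [0, 1). */
    static double linkScore(int inboundLinks) {
        double dampened = Math.log1p(inboundLinks);
        return dampened / (1 + dampened);
    }

    /** Weighted sum of the three signals; higher means more relevant. */
    static double score(int linkDepth, Instant lastModified, Instant now, int inboundLinks) {
        return W_DEPTH * depthScore(linkDepth)
             + W_RECENCY * recencyScore(lastModified, now)
             + W_LINKS * linkScore(inboundLinks);
    }
}
```

With this sketch, a freshly modified entry page with ten inbound links outranks a year-old page three links deep with none, which is the kind of default ordering you would expect from a crawled site.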

Omnifind could crawl sites through an authenticated proxy, but could not crawl secure sites, at least not by default. It could crawl the file system. It was also not apparent how authenticated sites would work.

There is a limit of 400,000 documents, beyond which you need to upgrade to the commercial version.

It provides programming interfaces: a very comprehensive one for search, and slightly cumbersome ones for inserting into and deleting from the index.

There was a limit to max results – 1024 – which should not be a big problem in most cases.
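If the 1024 cap applies per page rather than to the whole result set (a commenter below says it does), a client can still walk the full result set by requesting successive pages. The sketch below shows that loop against a hypothetical `SearchClient` interface; the interface, its `search(query, start, count)` signature and the class name are assumptions made for illustration, not Omnifind's actual API.

```java
import java.util.ArrayList;
import java.util.List;

final class PagingSketch {
    // Page size taken from the limit mentioned in the review.
    static final int PAGE_SIZE = 1024;

    /** Hypothetical stand-in for a search client; not the real Omnifind API. */
    interface SearchClient {
        /** Returns up to count results starting at the zero-based offset start. */
        List<String> search(String query, int start, int count);
    }

    /** Collects every result by requesting pages until a short (or empty) page comes back. */
    static List<String> fetchAll(SearchClient client, String query) {
        List<String> all = new ArrayList<>();
        int start = 0;
        while (true) {
            List<String> page = client.search(query, start, PAGE_SIZE);
            all.addAll(page);
            if (page.size() < PAGE_SIZE) {
                break; // last page reached
            }
            start += page.size();
        }
        return all;
    }
}
```

In practice you would rarely pull everything; most callers stop after the first page or two, which is why the cap should not matter much.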

The UI is nice and easy to use – and you can get started easily.

It supports the popular file formats out of the box.

Also, it was not apparent whether you could control which section of a page gets indexed. For instance, can you tell it not to index the keywords in the navigation?

Now the question is: where would you use this engine?

On looking under the hood, it uses Lucene, which is a popular Java search engine. That is the choice of many Java developers, so it is unlikely to be questioned.

It also includes Apache Derby, which provides a file-based database. Now, Derby is not known to work in a clustered environment. Can you cluster it? At least for failover?

What exactly it uses Derby for is not clear. It could be storing a cache of the web pages, thus reducing load on Lucene, or possibly it is used only to store configuration.

It seems to work well for crawling and searching. It's too heavy for desktop search, so that doesn’t seem to be what it aims for.

If you want to use it for searching database content, any value over vanilla Lucene is not apparent.

================= 

So what market is it actually targeting? If it searches only non-authenticated HTTP sites, is it targeting search on sites that publish documentation or other content available to all? That’s a very limited market to enter! Almost every intranet site and most internet sites will also have dynamic sections where ACL-based search will be desired, forcing architects to go for two solutions instead of one (if considering Omnifind).

It will definitely raise the bar for commercial search engines in this market and put the onus on them to prove why they are worth the extra money, or to bring in low-cost entry-level licensing. The same thing happened in the database market, where Oracle and IBM offer single-user databases up to 2 GB for free.

3 Responses to “Yahoo/IBM Omnifind where?”

  1. apoorv Says:

    Good review Pranshu.
    As for the target market, there’s a vast majority of public web sites which require nothing but non-authenticated search. I don’t think that’s a limited market at all!

  2. Luis Alves Says:

    Hi Regunath,

    Comments:
    – OYE supports authenticated HTTP sites out of the box: just click on the icon on the manages website panel and add the user name and password after you have added the web site.
    – The supported index limit is 500000k docs.
    – “There was a limit to max results – 1024”: the API returns 1024 per page, but you can request the second page and iterate through the result set by getting more pages.

    Regards,

    Luis Alves


  3. […] The new IBM/Yahoo Lucene (oops…"OmniFind") search offering is probably not a Google-killer technically, but seems to target the same website search scenarios… […]

