Regu – whose IBM/Yahoo Omnifind search review I posted earlier – has a very interesting take on Convention over Configuration.

He believes that with the increasing commoditization of IT, cost and productivity become very important, and he goes on to suggest that we could use conventions instead of configuration (like Ruby on Rails does) towards this end. I think it is a great thought.
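To make the idea concrete, here is a small hypothetical Java sketch (the class and method names are mine, not from Rails or anything Regu mentioned): instead of maintaining a configuration file that maps every entity to a table, a convention derives the table name from the class name, and you only configure the exceptions.

// Hypothetical illustration of convention over configuration (not Rails code):
// rather than an XML mapping entry per entity, derive the table name from
// the class itself and only override where the convention breaks down.
public class ConventionMapper {

    static class CustomerOrder {}

    // Convention: class "CustomerOrder" maps to table "customer_orders".
    public static String tableNameFor(Class<?> entity) {
        String name = entity.getSimpleName();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isUpperCase(c) && i > 0) {
                sb.append('_');
            }
            sb.append(Character.toLowerCase(c));
        }
        return sb.append('s').toString();  // naive pluralisation
    }

    public static void main(String[] args) {
        // Prints "customer_orders" -- no configuration file needed.
        System.out.println(tableNameFor(CustomerOrder.class));
    }
}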

However, I do not believe IT is a commodity as of today. I won't explain that, as Sadagopan has done a good job of articulating it; he in fact actively opposes Nicholas Carr's view that IT doesn't matter. I am not sure there are many analysts who dispute the value of IT spending, especially with IT being 50% of capex these days. Read the story here: http://123suds.blogspot.com/2007/01/it-does-matter.html

However, I strongly believe that most things start out tailor-made and later bifurcate into commodity and designer ware. That is very likely to happen with IT as well, so commoditization is inevitable.

I am also a great believer in less code contributing to maintainability (rather than flexibility). So personally I am not a big fan of excessive configurability, as it invariably leads to a lot more code – unless you use a rules engine. Have a look at this interesting post from Donald Ferguson – ex-IBM, new Microsoft employee:
http://www-03.ibm.com/developerworks/blogs/page/donferguson?entry=less_code

Similarly, I am not a big fan of code generators – once you customize the generated code, the generator cannot help you any more. But I think the meta-programming folks have cracked it. RoR is showing the way for meta-programming. Java has a poor cousin in JSR 52 (the Standard Tag Library), which is just a start, and that too behind its time; one wants a lot more. Similarly, on at least two large projects in our company requiring Swing-based forms, I have seen architects go in for meta-programming of those forms using an XML-based language they defined.
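To give a flavour of what that meta-programming looks like, here is a minimal sketch in plain Java and Swing. The <form>/<field> vocabulary is hypothetical – it is not the actual XML language those architects defined – but the principle is the same: the form is data, and the code that interprets it is written once.

import java.io.StringReader;
import javax.swing.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import org.xml.sax.InputSource;

// A minimal sketch of XML-driven Swing form generation (illustrative only).
public class XmlFormBuilder {

    static JPanel build(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList fields = doc.getElementsByTagName("field");

        JPanel panel = new JPanel(new java.awt.GridLayout(fields.getLength(), 2));
        for (int i = 0; i < fields.getLength(); i++) {
            Element f = (Element) fields.item(i);
            panel.add(new JLabel(f.getAttribute("label")));
            // One widget per declared field type; a real project would also
            // handle dates, drop-downs, validation rules, and so on.
            panel.add("password".equals(f.getAttribute("type"))
                    ? new JPasswordField() : new JTextField());
        }
        return panel;
    }

    public static void main(String[] args) throws Exception {
        String form = "<form>"
                + "<field label='User name' type='text'/>"
                + "<field label='Password' type='password'/>"
                + "</form>";
        JFrame frame = new JFrame("Generated form");
        frame.add(build(form));
        frame.pack();
        frame.setVisible(true);
    }
}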

Using convention does make sense. Swedish, Arabic, Sanskrit and, to some extent, most languages allow you to join words and prefix or suffix part-words to make new words that mean as much as sentences. If we can learn that, conventions should come naturally to us. It is a good thought, and Sun, IBM, Oracle or whosoever's job it is to drive Java these days – please take notice.

Jakob Nielsen has published his study on the 10 Best Intranets, and it exposes some interesting facts – facts we all knew existed, but never imagined the best intranets were built on.

One of the points he makes – and the one I am going to talk about today – is this:

 ” This year, all the winning intranets were template-driven and relied on a content management system (CMS). Strikingly, most intranets used their own homemade CMS. Thus, even though there are standards within each intranet, there’s no standard across intranets, even in the choice of CMS. “

Almost everyone I talk to about intranets can cite more than one example where prototypes and demos became live intranets and flourished from there. Not surprisingly, many of these working prototypes tend to be based on open source products. However, this survey doesn't mention any open source products, which I do find surprising. Possibly the choice of products was so varied among these 10 companies that no single product was used in multiple places.

However, more and more companies are looking at standardizing their intranet platforms. Amongst the projects my company and I have been involved in, the platforms were implemented using varied technology sets: Documentum and ATG Portal, Interwoven Teamsite + Mediabin and BEA Portal, the Vignette suite, Sharepoint plus LAMP-based open source, complete custom development, and so on.

The point here is that the list of features required for an intranet is so varied that, no matter which product set you go with, you will need to plug in applications and third-party components and do a lot of custom building. Usually the product choice is driven by the requirements of the first set of sites to be created – and that is not a bad strategy, as it helps time to market for the first set and keeps the organization's interest high.

So which product you go with does not matter (you have to significantly customize all of them). What matters is that the product should be extensible, should let you plug in or override functionality like authentication and search, should make it easy to integrate with custom-developed applications, and should have a licensing option friendly to your needs. Most critically, it should expose plenty of interfaces to allow third-party applications to connect to it and, in certain cases, drive it.
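As an illustration of what "pluggable" should mean in practice, the plug points I look for usually boil down to small interfaces like the hypothetical ones below (the names are mine, not from any particular product):

// Hypothetical plug points worth checking for when evaluating a portal or
// CMS for an intranet -- illustrative names, not any vendor's actual API.
public interface AuthenticationProvider {
    // Return a user id if the credentials check out against the corporate
    // directory (LDAP, SSO token, etc.), or null if they do not.
    String authenticate(String userName, String credential);
}

interface SearchProvider {
    // Delegate portal search to an external engine instead of the
    // product's built-in one, honouring the user's access rights.
    java.util.List<String> search(String query, String userId);
}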

I think the only vendor that would come close to providing a ready-to-use intranet would be SAP – some day soon.

Yahoo/IBM Omnifind where?

January 1, 2007

A co-architect at Mindtree, Regunath Balasubramanian, had a good look at IBM/Yahoo Omnifind. Here is what he had to say about it.

———

Omnifind offers three types of relevance signals: depth from the entry link, how recent the document is, and page rank in terms of links pointing to the page. These give a good default ranking when you are indexing a site by crawling.

Omnifind could crawl sites via an authenticated proxy, but could not crawl secure sites – at least not by default. It could crawl the file system. It was also not apparent how authenticated sites would work.

There is a limit of 400,000 documents, beyond which you need to upgrade to the commercial version.

It provides interfaces – very comprehensive for search, and slightly cumbersome for inserting into and deleting from the index.

There was a limit on the maximum number of results – 1024 – which should not be a big problem in most cases.

The UI is nice and easy to use – and you can get started easily.

It supports the popular file formats out of the box.

Also, it was not apparent whether you could control which section of a page gets indexed – for instance, can you tell it not to index the keywords in the navigation?

Now the question is: where would you use this engine?

Looking under the hood, it uses Lucene, which is a popular Java search engine. That is the choice of many Java developers, so it is unlikely to be questioned.
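For those who have not used it, this is roughly what "vanilla Lucene" gives you – an index you build and query yourself. The sketch below uses the Lucene 2.x style API that is current as I write this; treat it as illustrative, since signatures change between versions.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;

// A minimal sketch of indexing and searching with plain Lucene 2.x.
public class VanillaLucene {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Build a file-system index with a single document.
        IndexWriter writer = new IndexWriter("/tmp/demo-index", analyzer, true);
        Document doc = new Document();
        doc.add(new Field("content", "intranet search with lucene",
                Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        // Query the index.
        IndexSearcher searcher = new IndexSearcher("/tmp/demo-index");
        QueryParser parser = new QueryParser("content", analyzer);
        Hits hits = searcher.search(parser.parse("lucene"));
        for (int i = 0; i < hits.length(); i++) {
            System.out.println(hits.doc(i).get("content"));
        }
        searcher.close();
    }
}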

It also bundles Apache Derby, which provides a file-based database. Now, Derby is not known to work in a clustered environment. Can you cluster it? At least for failover?

What exactly it is using Derby for is not clear. It could be using it to store a cache of the web pages, thus reducing the load on Lucene, or possibly it is used to store only the configuration.
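For context, this is what embedded Derby looks like from plain JDBC – an in-process, file-based database, which is exactly why clustering it is not its natural mode. This is only a sketch of Derby itself, not of how Omnifind uses it internally.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// A sketch of embedded Derby from plain JDBC.
public class EmbeddedDerbyDemo {
    public static void main(String[] args) throws Exception {
        // The in-process driver; the database lives in a local directory,
        // which is why clustering it is awkward.
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn =
                DriverManager.getConnection("jdbc:derby:crawlerdb;create=true");

        Statement st = conn.createStatement();
        st.executeUpdate("CREATE TABLE pages (url VARCHAR(512), body VARCHAR(4000))");
        st.executeUpdate("INSERT INTO pages (url, body) VALUES ('http://example.com', 'cached page')");

        ResultSet rs = st.executeQuery("SELECT url FROM pages");
        while (rs.next()) {
            System.out.println(rs.getString("url"));
        }
        conn.close();
    }
}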

It seems to work well for crawling and searching. It is too heavy for desktop search, so that does not seem to be what it is aiming at.

In case you want to use it for searching database content, any value over vanilla Lucene is not apparent.

================= 

So what market is it actually targeting? If it searches only non-authenticated HTTP sites, is it targeting searches on sites publishing documentation or content available to all? That is a very limited market to enter! Almost every intranet site and most internet sites will also have dynamic sections where ACL-based search is desired, forcing architects to go for two solutions instead of one (if considering Omnifind).

It will definitely raise the bar for commercial search engines in this market and put the onus on them to prove why they are worth the extra money – or to bring in low-cost, entry-level licensing, the same thing that happened in the database market, where Oracle and IBM offer single-user databases of up to 2 GB for free.