CMS performance – where did the Iron go?

December 12, 2006

It is difficult to come across a CMS implementation where the business owner does not complain about the hardware. Either the site is too slow, or the CMS requires too much hardware. The seasonal nature of application usage does not make it any easier.

Fortunately, performance tuning is rapidly becoming a predictable science rather than the witchcraft and art it was perceived to be earlier.

Here are the top reasons why a typical CMS-based site could be slow:

1. Complex, dashboard-style content delivery pages:

If the application has complex, dashboard-style pages showing different content items from different “sections” of the content repository, chances are that it will require too many queries and be too heavy on the database. Typically these interfaces do not show specifically selected content items; they show something like the top 3 or latest 3 from different sections.

If you have such a scenario, you have to look at two things:

a) Caching: Consider caching different sections of the page, or the page itself.

b) Denormalization: An ideal content repository is normalized, which means the same information appears only once. So if we have tagged content items under specific categories, the tagging information resides with the content. This means that to browse by a tag, we have to go through the entire content repository (or an index spanning the entire content repository). Normalization also means that we break the information related to an asset into different relational tables based on the logical entities. For instance, if we have an image and an article content type, and each content type has some common attributes like create and publish info, then we will have three “tables”: BaseContent, ArticleContent and ImageContent. So to get the information to show on the page, I am not just scanning one index; I am scanning multiple tables, comparing different parameters and finally joining them.

This can get inefficient and slow, and it is overcome by denormalizing. There are two ways here. One way is to duplicate the data: whenever a content item gets created or modified, we look at it and put references to it in separate database tables for its tags. Even here we may choose to have two tables: all content with the tag “Performance”, and all published content with the tag “Performance”. The moment we do that, the processing needed to find the relevant articles drops.

The second way is to aggregate all the data required for article selection and for the summary display, and keep it in the same table: either the existing BaseContent or ArticleContent table, or a third cache table.

Denormalization increases application complexity, as the CRUD operations now need to update data in multiple places.
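As a rough illustration of the tag-table idea above, here is a minimal sketch in Java using plain JDBC. The PublishedContentByTag table, its columns and the LIMIT syntax are assumptions made up for the example, not the schema of any particular CMS.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Maintains a denormalized lookup table so that "latest 3 published items
// tagged X" becomes a single indexed read instead of a multi-table join.
public class TagIndexer {

    private final Connection conn;

    public TagIndexer(Connection conn) {
        this.conn = conn;
    }

    // Called from the publish step: duplicate the tag references into the
    // hypothetical PublishedContentByTag table.
    public void onPublish(long contentId, List<String> tags) throws SQLException {
        String sql = "INSERT INTO PublishedContentByTag (tag, content_id, publish_date) "
                   + "VALUES (?, ?, CURRENT_TIMESTAMP)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String tag : tags) {
                ps.setString(1, tag);
                ps.setLong(2, contentId);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    // Called from the unpublish/delete step: the price of denormalization is
    // that every CRUD operation must also clean up the duplicate rows.
    public void onUnpublish(long contentId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM PublishedContentByTag WHERE content_id = ?")) {
            ps.setLong(1, contentId);
            ps.executeUpdate();
        }
    }

    // The dashboard query: one indexed table, no joins.
    // LIMIT syntax varies by database; treat this as illustrative.
    public List<Long> latestForTag(String tag, int limit) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT content_id FROM PublishedContentByTag "
              + "WHERE tag = ? ORDER BY publish_date DESC LIMIT ?")) {
            ps.setString(1, tag);
            ps.setInt(2, limit);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong(1));
                }
            }
        }
        return ids;
    }
}
```

The publish and unpublish hooks are exactly the extra CRUD work mentioned above; the payoff is that the dashboard query touches one indexed table instead of joining BaseContent, ArticleContent and ImageContent.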

2) Live data from interfaced applications.

Sometimes, content delivery applications show live data from external sources, fetched in real time. Whenever this happens, we create a dependency on the response time of the external applications.

The most common methods used to overcome this are to either cache the runtime data, or do a nightly import of the third-party data into a local database.

Apart from that, live data sometimes requires a lot of CPU cycles to convert the data stream into usable objects and, ultimately, presentable HTML. One should look at opportunities to reduce the data from the source to what is actually required, and to choose formats which provide maximum efficiency in conversion.
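As a sketch of the runtime-cache option, the wrapper below hides a slow external call behind a simple time-to-live cache. The class, the fetchFromSource() hook and the 10-minute TTL are all illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Wraps an expensive external call with a simple time-to-live cache so the
// page render does not depend on the third party's response time on every hit.
public abstract class ExternalDataCache<T> {

    private static final long TTL_MILLIS = 10 * 60 * 1000; // 10 minutes, illustrative

    private static class Entry<T> {
        final T value;
        final long loadedAt;
        Entry(T value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<String, Entry<T>> cache = new ConcurrentHashMap<>();

    // Subclasses implement the real (slow) call to the interfaced application.
    protected abstract T fetchFromSource(String key) throws Exception;

    public T get(String key) throws Exception {
        Entry<T> entry = cache.get(key);
        long now = System.currentTimeMillis();
        if (entry != null && now - entry.loadedAt < TTL_MILLIS) {
            return entry.value;               // fresh enough, no external call
        }
        T value = fetchFromSource(key);       // slow path: real external call
        cache.put(key, new Entry<>(value, now));
        return value;
    }
}
```

The nightly-import option replaces fetchFromSource() with a batch job that writes the third-party data into a local table, which the delivery pages then query like any other content.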

3) XSLT processing and XML

Since XML became mainstream some six years back, many architects have given in to its elegance. They pass the data around as XML, and the presentation layer uses XSLT to convert it to the desired HTML formats. Unfortunately, XSL transformation is a very expensive process and very difficult to performance-tune. XML navigation and deserialization is itself quite inefficient. If you look at it with a microscope, when you use XPath, the XPath string has to be compiled at runtime just to produce the code that navigates the XML.

So don’t give in to XML for internal processing until you see real benefits.
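If you do need XSLT and XPath, at least pay the compilation cost once rather than on every request. Here is a minimal sketch using the standard javax.xml.transform and javax.xml.xpath APIs; the stylesheet file name and the XPath expression are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

// Compile the stylesheet and the XPath once and reuse them; only the
// per-request transform and evaluation remain in the hot path.
public class PrecompiledXsl {

    private final Templates templates;       // thread-safe, compiled stylesheet
    private final XPathExpression titleExpr;  // compiled XPath

    public PrecompiledXsl() throws Exception {
        templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource("article-to-html.xsl")); // hypothetical file
        XPath xpath = XPathFactory.newInstance().newXPath();
        titleExpr = xpath.compile("/article/title");
    }

    public String render(String xml) throws Exception {
        StringWriter out = new StringWriter();
        // Transformer instances are cheap to create from a compiled Templates object.
        templates.newTransformer()
                 .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public String title(String xml) throws Exception {
        return titleExpr.evaluate(new org.xml.sax.InputSource(new StringReader(xml)));
    }
}
```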

4) Retrieving the content itself

Your choice of content repository (database, file system or a mix) and the means of accessing it have a huge impact on the efficiency and speed of rendering. Be careful when choosing non-native connectors to the repository. If you are writing connectors yourself, tune them to eternity. Publish statically in the most efficient format: copy images to an /images folder as files, and put story-based content in the format in which it will be consumed, i.e. HTML snippets or database rows.
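As a small sketch of static publishing, assuming a hypothetical document root and snippet naming convention: render once at publish time and write the result where the web server can serve it as a plain file, so delivery never touches the repository connector.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// At publish time, render once and write the result where the web server can
// serve it directly; delivery then becomes a plain file read.
public class StaticPublisher {

    private final Path docRoot;

    public StaticPublisher(Path docRoot) {
        this.docRoot = docRoot;
    }

    public void publishSnippet(long contentId, String renderedHtml) throws IOException {
        Path target = docRoot.resolve("snippets").resolve(contentId + ".html");
        Files.createDirectories(target.getParent());
        Files.write(target, renderedHtml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Illustrative usage: the docroot path and the snippet content are made up.
        new StaticPublisher(Paths.get("/var/www/site"))
                .publishSnippet(42L, "<div class=\"story\">...</div>");
    }
}
```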

5) Single Repository for content production and delivery

Some sites use the same installation for content production as well as content delivery. The content delivery side typically has seasonality in its usage, and the result is that content production slows down when content delivery is near its peak load. Some way of segregating the two, or throttling the peak load, helps.

6) Heavy background jobs

Most CMS systems need to do heavy batch processing, be it creation of thumbnails, indexing of content, processing reminders and alerts, etc.
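If such jobs must share hardware with delivery, one common mitigation is to push them onto a background thread and schedule them away from peak hours. Here is a minimal sketch with the standard ScheduledExecutorService; the job body and the interval are assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Run heavy batch work (thumbnails, indexing, reminders) on a single background
// thread at a fixed interval instead of inline with content delivery.
public class BatchScheduler {

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        Runnable heavyJob = () -> {
            // Hypothetical job body: regenerate thumbnails and rebuild the search index.
            System.out.println("Running batch job at " + new java.util.Date());
        };

        // Illustrative schedule: every 6 hours; in practice the first run would be
        // aligned with an off-peak window for the delivery site.
        scheduler.scheduleAtFixedRate(heavyJob, 0, 6, TimeUnit.HOURS);
    }
}
```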

7) Auto refreshes and alerts.

In some really dynamic CMSs or portals, part of the delivery page needs to be continuously updated: content like breaking news, alerts like a new item added to your task list, or information from third-party systems like current stock prices or the weather.

Traditionally, applications refresh the entire page for that. We should consider refreshing only parts of the page using Ajax and Ajax-like technologies.
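A minimal server-side sketch of that idea: a servlet that returns only the breaking-news fragment, which the page can poll with XMLHttpRequest and drop into a single div instead of reloading everything. The servlet, its URL mapping and the headline lookup are hypothetical.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Returns only the HTML fragment that changes, so the client can refresh a
// single region of the page instead of the whole thing.
public class BreakingNewsServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/html;charset=UTF-8");
        resp.setHeader("Cache-Control", "no-cache");
        // Hypothetical lookup of the latest headline from the repository or a cache.
        String headline = latestHeadline();
        resp.getWriter().write("<div id=\"breaking-news\">" + headline + "</div>");
    }

    private String latestHeadline() {
        return "Placeholder headline"; // illustrative only
    }
}
```

On the client, a few lines of JavaScript poll this URL every few seconds and replace the innerHTML of the corresponding element.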

8) Live data set is too large.

Any CMS system will typically have a mix of long-shelf-life and short-shelf-life content. This leads to a tendency to leave too much content in the live data store. Alan keeps stressing the importance of data purging, and I couldn’t agree more.

As a rule of thumb, a 10-fold increase in data size makes your system twice as slow, and that is only query processing. A large live set also makes selecting the useful content harder, and it makes finding out-dated documents and accidentally linking to them more probable.

In short, put great stress on data expiry and segregate expired from live data. It is usually easier to implement if you give users an option to import content from the archive repository back into the live repository.

In a typical implementation, we default the content expiry date unless and until the user changes it.

The expiry algorithm also needs to check on content use: expire unused content, and warn about content which is still in use but was supposed to have expired. This is indeed complex, but worth the money.
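A minimal sketch of such a purge job, assuming hypothetical LiveContent and ArchiveContent tables with expiry_date and last_accessed columns; real schemas and the 30-day “recent use” threshold will differ.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Moves expired, unused content out of the live store so that delivery queries
// run against a small data set; expired items that are still being accessed are
// left for a human to review instead of being removed silently.
public class ExpiryPurger {

    public void purge(Connection conn) throws SQLException {
        Timestamp now = Timestamp.from(Instant.now());
        Timestamp recentUse = Timestamp.from(Instant.now().minus(30, ChronoUnit.DAYS)); // illustrative threshold

        conn.setAutoCommit(false);
        try {
            // 1. Copy expired and unused items to the archive store.
            try (PreparedStatement copy = conn.prepareStatement(
                    "INSERT INTO ArchiveContent SELECT * FROM LiveContent "
                  + "WHERE expiry_date < ? AND last_accessed < ?")) {
                copy.setTimestamp(1, now);
                copy.setTimestamp(2, recentUse);
                copy.executeUpdate();
            }
            // 2. Remove them from the live store.
            try (PreparedStatement delete = conn.prepareStatement(
                    "DELETE FROM LiveContent WHERE expiry_date < ? AND last_accessed < ?")) {
                delete.setTimestamp(1, now);
                delete.setTimestamp(2, recentUse);
                delete.executeUpdate();
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```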


13 Responses to “CMS performance – where did the Iron go?”

  1. Gregory Close Says:

    I help manage a CMS implementation for a large bank, and would like to speak with you.

  2. Pranshu Jain Says:

    Sure – would love to talk.
    My contact details are
    mail: pranshuj AT mindtree.com
    Pranshujain AT gmail.com
    phone: +91 988 662 5547

  3. rajbala Says:

    We have a lot of RDF data. What’s the best way to convert it to RSS XML if XSLT is too expensive?

  4. Pranshu Jain Says:

    Hello,

    As long as you are ensuring that the XSLT is applied only once and the generated RSS is cached, there is no issue with doing that.
    If you are doing that every time someone downloads the RSS feed, that might be a killer and you must consider caching. I say might because XSLT is difficult to tune, hence it is usually non-optimal. Maybe you have implemented it in an efficient manner, and it is no different from using some other form of processing.

    Pranshu


  5. Specifically regarding XML on Java, I’ve found that a StAX compliant processor usually provides the best performance.

    The best StAX implementation I’ve worked with is woodstox (http://woodstox.codehaus.org/).

    In addition, I highly recommend integrating it with NUX (http://dsd.lbl.gov/nux/), whose APIs simplified development greatly for me.

  6. Amar Says:

    XSLT processing on the portal can be sluggish, but if we keep all the presentation-layer activities in the CMS and use the portal only as a delivery mechanism, it can be much more efficient: e.g. create the HTML snippet from XML in the CMS, deploy the snippet to the portal, and the portal only delivers the snippet.


  7. Nice Blog, I’m using toko for content management (it’s a free one)… http://toko-contenteditor.pageil.net

  8. Adnan Says:

    Hi Pranshu,

    It is a really helpful blog. I am using Ektron CMS. Currently I am testing how I can improve the performance of a web page built using Ektron CMS. I used simple Ektron controls on the page using the drag & drop method (smartmenu, contentblock). Now I want to serve these controls using the API so the controls are not rendered every time, and the response time of the page should improve. Could you please suggest how I can achieve that (cache etc.), or how I can use the .NET caching mechanism to do it?

    Thanks in advance.
    Adnan


  9. Very useful information, thanks for this. You have a great blog; I will be interested in more similar topics. I’m very interested in CMS and all its related subjects.

  10. John conrad Says:

    One more nice topic in your blog, and nice comments too; keep it up. Please advise some more links related to the topic. I’m very interested in CMS and all its related subjects.

  11. Josef Telmer Says:

    It’s a very interesting subject. I was looking around for more information, but your article has exactly what I was looking for, so thanks and keep it up; you have a great blog.
    I’m very interested in CMS and all its related subjects.

