Cameron Purdy from Tangosol.com commented

Your examples are definitely “traditional” caching. We do see that a lot (e.g. hotel chains, as you said). There is a more contemporary approach to caching, which is to actually keep the data up-to-date in memory (real-time “single system image”), as opposed to having LRU or some other way to get rid of stale data. For a good example, see: http://wiki.tangosol.com/display/COH32UG/Cluster+your+objects+and+your+data Peace.

It is a very relevant comment. Cameron suggests that instead of using a database as a persistent, synchronized cache, we could use an in-memory replicated cache like Coherence, his company's product.

This concept is not new – travelling objects and external caches have been around for some time. Though I haven't tried Coherence myself, there is a possibility that Coherence has solved many of the stumbling blocks of the past.

Traditionally, we have considered and occasionally used in-memory databases for some of our caching requirements (remember TimesTen? It is now owned by Oracle). These databases were either too expensive or too flaky (non-standard queries, no guarantees for the data, etc.). Then mainstream databases started providing ways to pin data in memory – giving significantly faster performance – and that became our preferred choice.

App servers also started providing session sharing – though its performance cost forced us to stop putting data in the session and to put it in the database instead.

There were also programming frameworks like Jini (and more recently JavaSpaces, a part of Jini) which enabled networked objects, available to all nodes in the Jini cluster. Jini never got much adoption.

Neither have cluster replication products.

I believe the reason is that when most of us design, we design for a single server, and IF the usage grows high enough to warrant big clusters and performant caches, we then re-engineer parts of the application or the hardware to enable clustering (for example, organizations prefer to introduce a NAS in the datacenter rather than change the software to use databases instead of file systems when clustering is introduced).

With products like Coherence, there could be a case where, by paying a 5K licensing fee, we could save 1-2 CPUs on the database and hence justify the cost. But I will be wary of this technology for the following reasons.

Today, the architecture of choice on the app layer seems to be a large number of small blade servers clustered behind a hardware load balancer. In this case, for multi-way replication across such a large number of nodes, we don't know how such technologies will perform. For vertically scaled applications (running on, say, 2-3 machines with 4-8 processors each), such products have a very high chance of working. The second issue I have is whether the updates are ACID or BASE (a concept which never picked up – Basically Available, Soft state, Eventually consistent): do they leave some margin for stale data? If not, and they are used in "Transactional" mode, do they really give better performance?

The onus is on the cache vendors to convince us.

I am sure they will – and if the products are actually that good, they should become popular. They seem to have a good client list already.

However, I would be worried about the longevity of such products. When this demand becomes mainstream, commercial app server vendors will have to provide the same functionality to survive – whether they OEM Cameron's product or build something of their own.

Let the brutal law of survival of those "most responsive to change" decide.

Thankfully, we have to face slow and unreliable networks less and less; however, with the mobility of office workers increasing, partly disconnected applications appear to be becoming much more mainstream.

The techniques I have seen being used for communication across slow networks are:

RSync: Rsync provides file system synchronization in a highly efficient manner. It detects the differences and sends only the diff, and that too gzipped, across the network. I am not sure whether any other method optimizes network usage so much.

I have also seen companies use Robocopy or FTP clients which support restart for the same purposes, though they are less optimized than RSync.

DB replication: Products like Sybase SQL Anywhere (traditionally) or MS SQL Anywhere (more recently) provide two-way replication with conflict resolution between clients and servers. This enables having a lightweight database at the client which eventually gets synchronized with the servers. Traditional databases can also replicate across the internet and slow networks using custom replication scripts. In all cases, for two-way replication, ALL tables must have a primary key.

Message Queue: Most message queues offer guaranteed delivery between client and server, and also offer replicated queue options so that messages are never lost.
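
For what it's worth, here is a minimal sketch of the guaranteed-delivery idea using MSMQ from .NET (System.Messaging). This is only an illustration under assumptions: MSMQ must be installed, and the queue name "OrdersQueue" is hypothetical. Marking the message Recoverable makes MSMQ persist it to disk until the consumer picks it up.

using System.Messaging;

// A minimal sketch, assuming MSMQ is installed; the queue name "OrdersQueue" is hypothetical.
// A Recoverable message is persisted to disk, so it is not lost if the machine restarts
// before the consumer receives it.
class QueueSendSketch
{
    static void Main()
    {
        const string queuePath = @".\Private$\OrdersQueue";
        if (!MessageQueue.Exists(queuePath))
            MessageQueue.Create(queuePath);

        using (MessageQueue queue = new MessageQueue(queuePath))
        {
            Message message = new Message("order #42");
            message.Recoverable = true;
            queue.Send(message);
        }
    }
}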

In these posts here, here and here, software architects have been contemplating the best ways to communicate between clients and servers – for notification, PUSH or programmatic interaction needs.

In this post, I will highlight the problems we face today and the options we have to overcome the same.

Problem #1: Proxy, Firewall, NAT: Clients (even on extranets) could be separated from the server by proxies and firewalls, and could be behind NAT. This means that (unless you are on the same private LAN – and I won't discuss that) the server cannot talk to the client: it does not know the real IP address of the client (it knows the proxy's address, or the one NATted by the client network), the client does not have a legal IP address, and the server cannot initiate a connection to the client (even if it happened to know the IP address) because of client-side firewalls – which could be as simple as a personal desktop firewall. Hence, a server-initiated socket connection is completely out of the question. The only options available are the client pulling (and possibly polling, for a simulated push) the information via the HTTP and HTTPS protocols only. Now there is a caveat here – I "believe" that the HTTPS protocol has a loophole that lets you establish a socket with the server (HTTP tunneling), and potentially the server could write on that socket and achieve a push. Since I do not know the details, cannot seem to find any published APIs (and am too old to do socket programming), I will skip this option. This leaves client-initiated HTTP and HTTPS only, and push is a "polled" pull. However, this doesn't limit your options a lot: you have queues which work like that, you have SQL Server replication happening via this method, you have web services, you have .NET remoting happening via HTTP, more recently you have REST, and you could use other data interchange methods like EDI, RSS and the works.
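
As a rough illustration of this "polled pull" that simulates push over plain HTTP, a client-side sketch in .NET could look like the following. The endpoint URL and the 30-second interval are my assumptions, not something from the posts referenced above.

using System;
using System.IO;
using System.Net;
using System.Threading;

// A minimal sketch of a client simulating push by polling over HTTP.
// The endpoint URL and the 30-second interval are assumptions.
class PollingClientSketch
{
    static void Main()
    {
        string url = "https://myserver/notifications.aspx";   // hypothetical endpoint

        while (true)
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                string payload = reader.ReadToEnd();
                if (payload.Length > 0)
                    Console.WriteLine("Server 'pushed': " + payload);
            }

            Thread.Sleep(TimeSpan.FromSeconds(30));   // poll interval - tune to your latency tolerance
        }
    }
}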

Problem #2: Cross-platform / cross-version client support: It is likely that you will have clients on different operating systems and versions (even if it's the same technology), and it is likely that you want to support clients using a different technology as well (like a .NET server publishing services consumed by Java or LAMP clients). Once you add cross-platform client support to your requirements, you will need to cut out custom protocols like .NET remoting (unless you publish an SDK in all supported languages). However, it leaves web services, REST and other forms of XML interchange, possibly message queues (as they have support/APIs in many different languages), and possibly SQL Server Anywhere replication, since all programming languages will be able to connect to it. Custom message formats can be included only if parsers are available in all consuming languages.

Problem #3: Control over the client environment: With the increasing popularity of Microsoft's one-click (ClickOnce) deployment, the distribution of thick clients has become a lot easier, but the problem is far from solved. For example – can we assume that clients will have admin privileges to install the application? Can we assume that the clients will have a Windows CD available to install MSMQ if needed? Can we assume that the client environment will support SQL Server? Do they have the required ports free? Possibly the user will have admin privileges but no Windows CD. This could cut some options out of the list we have.

Problem #4: Complexity: The more dependencies the application needs to work, the higher the complexity. Possibly it is best to use a simple protocol like REST (requiring simple HTTP POST requests and a simple XML response, with no SOAP envelopes, etc.).
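
To make the simplicity point concrete, here is a rough sketch of such a REST-style call from .NET – a plain HTTP POST carrying XML, and a plain XML response, with no SOAP envelope. The endpoint, the request body and the "value" attribute on the response are hypothetical.

using System;
using System.Net;
using System.Xml;

// A minimal sketch of a REST-style call: plain HTTP POST with XML in, plain XML out.
// The URL, request XML and the "value" attribute on the response are assumptions.
class RestStyleCallSketch
{
    static void Main()
    {
        string requestXml = "<getRate from='THB' to='GBP' date='2006-01-12'/>";

        using (WebClient client = new WebClient())
        {
            client.Headers[HttpRequestHeader.ContentType] = "text/xml";
            string responseXml = client.UploadString("http://myserver/rates", requestXml);

            XmlDocument doc = new XmlDocument();
            doc.LoadXml(responseXml);
            Console.WriteLine("Rate: " + doc.DocumentElement.GetAttribute("value"));
        }
    }
}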

Problem #5: Data structure: Is the data structure simple to parse? Or is it too complex? Maybe it's better to use a SOAP envelope, which will do the parsing for you and let you deal with objects at both ends.

Problem #6: Performance, data size and number of calls: Do we know enough about the client hardware to say that it can support MS SQL Server (or whatever dependency we impose)? Does the replication method we choose provide notifications as fast as we need them? Will the message size and number of interactions make some methods (like XML parsing) too expensive? Is the throughput requirement very high?

Problem #7: Message delivery guarantee: What is the tolerance for lost messages? Can the applications retry?

Once you answer all these questions, you are in a better position to decide which communication method to use.

Personally, I have seen that web services fare best today (except for some extreme performance requirements). Document-style web services and/or REST fare slightly better on performance as well. You need to consider a queue only if message delivery needs to be guaranteed.

I had asked a question here about this topic. The answer I got really surprised me. I was told to consider Grid computing. Anyway, my thoughts on the topic are below.

Even though we are writing 3-tier applications, which are clustered and load balanced using hardware load balancers, we are often left wondering what to do about scheduled jobs and long-running jobs – which cannot be web pages.
The options that we face are:
1) Make them SQL Server jobs. Typically databases are clustered; even if not, they must be available for the application to work, so it is better to tie the failure dependency to the database. The challenge here is that it is really not advisable to have custom DLLs running on a potentially shared database machine.
2) Make them EXEs triggered by the Windows scheduler. The problem here is that the Windows scheduler is not cluster aware. If this needs to be done, we either have to live with manual failover of the jobs, or we have to schedule the job on multiple machines and implement some kind of locking, possibly using the database, to ensure that only one job runs at a time (a sketch of such a database lock appears after this list).
3) Look at a clustered scheduler – including the Windows cluster APIs in case you have OS-level clustering at the web/app server level. In my experience it is rare to have a cluster at this level, but if there is one, you should be ready to exploit it. All OS clusters, including Veritas, provide cluster services programming. You usually have two options: you can make the job/service a part of the machine healthcheck, so that if your job fails the cluster fails over; or you can make the job run only on the primary node of the cluster. It is probably the second option we are looking for. There are also third-party clustered schedulers available, mostly commercial.
4) Windows services: here again, we can take advantage of OS cluster services to make the service run on the primary node of the cluster only. Alternatively, we can code a lock at the database level so that only one service is active.
5) Grid computing APIs: Grid computing tools, acting as glorified schedulers, can ensure that the job runs successfully and exactly once.
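
Here is a rough sketch of the database lock mentioned in options 2 and 4: the job is scheduled on every node, but only the node that wins the lock actually does the work. sp_getapplock is a standard SQL Server procedure; the connection string, lock name and job body are hypothetical.

using System;
using System.Data;
using System.Data.SqlClient;

// A minimal sketch of the database-lock approach: every node runs the EXE on schedule,
// but only the one that acquires the application lock actually executes the job.
// Connection string and lock name are placeholders.
class SingletonJobSketch
{
    static void Main()
    {
        using (SqlConnection connection = new SqlConnection("Server=dbserver;Database=Jobs;Integrated Security=SSPI"))
        {
            connection.Open();

            SqlCommand cmd = new SqlCommand("sp_getapplock", connection);
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@Resource", "NightlyBatchJob");
            cmd.Parameters.AddWithValue("@LockMode", "Exclusive");
            cmd.Parameters.AddWithValue("@LockOwner", "Session");
            cmd.Parameters.AddWithValue("@LockTimeout", 0);          // do not wait if another node holds it
            SqlParameter result = cmd.Parameters.Add("@ReturnValue", SqlDbType.Int);
            result.Direction = ParameterDirection.ReturnValue;

            cmd.ExecuteNonQuery();

            if ((int)result.Value >= 0)
            {
                RunTheJob();   // the lock is released when this connection closes
            }
            // else: another node already holds the lock, so this instance simply exits
        }
    }

    static void RunTheJob()
    {
        // the actual batch work goes here
    }
}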

Doing a Response.Redirect from an HTTPS page to an HTTP page is not considered good, as users get the warning "you are being redirected to an unsecure site".
One way I have used to avoid this warning is to have client-side JavaScript do the redirect.
i.e. if I want to go from https://myserver/a.aspx to http://myserver/b.aspx, I do the following
From a.aspx, I do a Response.Redirect to https://myserver/redirect.aspx?targeturl=http://myserver/b.aspx
Then, on redirect.aspx, I have an on-load JavaScript method call something like
<script>
window.location.href = '<%= targetUrl %>';
</script>
This makes the client browser move to the HTTP page.
While coding such a redirect, you may want to consider that the application may be deployed in different environments with different server URLs. Hence you may want to read the hostname ("www.myserver.com" in https://www.myserver.com/) from what the client has specified in the request.
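
A minimal sketch of that idea for redirect.aspx's code-behind is below, assuming the calling page passes only the path (e.g. targeturl=/b.aspx) so the hostname can be taken from the request itself.

// A sketch for redirect.aspx's code-behind: the hostname comes from the request,
// so the same code works unchanged across environments. Assumes the caller passes
// only the path, e.g. redirect.aspx?targeturl=/b.aspx.
protected string targetUrl;

protected void Page_Load(object sender, EventArgs e)
{
    string path = Request.QueryString["targeturl"];           // e.g. "/b.aspx"
    targetUrl = "http://" + Request.Url.Host + path;          // consumed by the <script> block above
}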

Sizing for CMS

August 28, 2006

I will try to complement Apoorv’s blog at http://www.apoorv.info/ on Portals and Content Management.
In one of his recent posts, Apoorv mentioned sizing. That is one thing which I happen to have worked a lot on.
I will list a few things which I noticed about the content management systems that I worked on:
– It is the database access that kills.
– SQL Queries on the presentation layer tend to be a lot heavier than the queries on the Content management backend
– Unless you have a very simple presentation, it will always make sense to cache the presentation as HTML pages and serve those to customers instead of dynamic pages (of course, everyone knows that)
– If the update frequency or volume is large (let's say more than a page a minute on average) and the database size gets large, even the publishing process takes its toll. It is good to have aggressive archiving for the content. In case archiving is not feasible (after all, it is a content management system), a replicated database for presentation may be the only option.
– Coming to sizing, you are likely to get better projections by benchmarking against existing applications
– For benchmarking against an existing application, you need the page views per second for the most frequently used dynamic pages, and the database size.

In my previous post on cache implementation, I talked about where to keep the cache. Now I will talk about how to access the cache and how to expire it.

Before we go into those topics, it is very important to consider the tolerance for stale data, since that determines how much we can optimize.
Let's say we cannot tolerate stale data at all – say the application is for selling a hotel room priced in Thai Baht and converted to Sterling. Depending on the rate I get from my bank and the date of stay, I get a price.
So every time anyone requests a room, I have to compute the price along the lines of:
PriceInGBP = PriceInTHB * (Select ExchangeRate where FromCurrency = 'THB' and ToCurrency = 'GBP' and ValidityFromDate <= '12 Jan' and ValidityToDate >= '12 Jan' and IsLive = True)
– Since I am selling a stay in the future, I have to look up the rate for that date.
Now, if I cannot tolerate stale data, the best I can do is to put a trigger on the exchange rate table to update an ExchangeRateVersionNumber table. The exchange rate version number is updated on any change to the entire exchange rate table.
Thus my application changes to:
If CachedExchangeRateVersionNumber = (select the version number from ExchangeRateVersionNumber), use the cached exchange rate; else run the above query to fetch the exchange rate and refresh the cache.
Here we see that the query on the database, and hence the load on the database, is much lighter, aiding scalability.
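
A minimal sketch of this version-number check in .NET is below. The table and column names (ExchangeRateVersionNumber, ExchangeRates and their columns) are my assumptions, and the cache here is a simple per-date dictionary that is thrown away whenever the version changes.

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

// A sketch of the version-number check: the cheap version query runs on every request,
// and the heavier rate query runs only when the version has changed or on a cache miss.
// Table and column names are assumptions; this sketch is not thread-safe.
class ExchangeRateCacheSketch
{
    static long cachedVersion = -1;
    static readonly Dictionary<DateTime, decimal> rateByDate = new Dictionary<DateTime, decimal>();

    public static decimal GetThbToGbpRate(SqlConnection connection, DateTime stayDate)
    {
        SqlCommand versionCmd = new SqlCommand(
            "SELECT VersionNumber FROM ExchangeRateVersionNumber", connection);
        long currentVersion = Convert.ToInt64(versionCmd.ExecuteScalar());

        if (currentVersion != cachedVersion)
        {
            rateByDate.Clear();              // any change to the rate table invalidates everything
            cachedVersion = currentVersion;
        }

        decimal rate;
        if (!rateByDate.TryGetValue(stayDate, out rate))
        {
            // Cache miss: run the heavier query once for this date and remember the result.
            SqlCommand rateCmd = new SqlCommand(
                "SELECT ExchangeRate FROM ExchangeRates " +
                "WHERE FromCurrency = 'THB' AND ToCurrency = 'GBP' " +
                "AND ValidityFromDate <= @date AND ValidityToDate >= @date AND IsLive = 1",
                connection);
            rateCmd.Parameters.AddWithValue("@date", stayDate);

            rate = Convert.ToDecimal(rateCmd.ExecuteScalar());
            rateByDate[stayDate] = rate;
        }

        return rate;
    }
}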
