Scheduling jobs in a web farm or clustered services
August 29, 2006
I had asked a question here about this topic, and the answer I got really surprised me: I was told to consider grid computing. Anyway, my thoughts on the topic are below.
Even though we write 3-tier applications that are clustered and load balanced with hardware load balancers, we are often left wondering what to do about scheduled jobs and about long-running jobs, which cannot be implemented as web pages.
The options that we face are:
1) Make them SQL Server jobs. Databases are typically clustered, and even when they are not, they must be available for the application to work at all. So it makes sense to tie the jobs' failure dependency to the database. The challenge here is that it is really not advisable to run custom DLLs on a potentially shared database machine.
2) Make them EXEs and trigger them with the Windows scheduler. The problem here is that the Windows scheduler is not cluster aware. To work around this, we either need to live with manual failover of the jobs, or we need to schedule the job on multiple machines and implement some kind of locking, possibly in the database, to ensure that only one instance of the job runs at a time.
3) Look at a clustered scheduler, including the Windows cluster APIs if you have OS-level clustering at the web/app server level. In my experience it is rare to have a cluster at this level, but if there is one, you should be ready to exploit it. All OS clusters, including Veritas, provide programming interfaces for their cluster services. You usually have two options. First, you can make the job/service part of the machine health check, so that if your job fails, the cluster fails over. Second, you can make the job run only on the primary node of the cluster. The second option is probably the one we are looking for. There are also third-party clustered schedulers available, mostly commercial.
4) Windows services: here again, we can take advantage of OS cluster services to make the service run only on the primary node of the cluster. Alternatively, we can code a lock at the database level so that only one instance of the service is active.
5) Grid computing APIs: grid computing tools, acting as glorified schedulers, can ensure that a job runs once, successfully, and only once.
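For option 2, the database lock can be as simple as a table whose primary key is the job name plus the scheduled run. Here is a minimal sketch under some assumptions: Python's built-in sqlite3 with an in-memory database stands in for the shared SQL Server database, and the `job_runs` table, the `cleanup-job` name, the hourly schedule and the machine names are all hypothetical. Every machine's scheduler fires the same EXE; each copy tries to insert the row for the current run, and only the copy whose insert succeeds does the work.

```python
import sqlite3
from datetime import datetime

# Hypothetical schema: one row per (job, scheduled run). The primary key
# guarantees that at most one machine can claim a given run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_runs (
    job_name   TEXT NOT NULL,
    run_slot   TEXT NOT NULL,
    claimed_by TEXT NOT NULL,
    PRIMARY KEY (job_name, run_slot)
)
"""


def try_claim_run(conn, job_name, run_slot, machine):
    """Return True if this machine won the right to execute this run."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO job_runs (job_name, run_slot, claimed_by) VALUES (?, ?, ?)",
                (job_name, run_slot, machine),
            )
        return True
    except sqlite3.IntegrityError:
        # Another machine already inserted the row for this run.
        return False


def scheduled_entry_point(conn, machine, now):
    """What each machine's scheduled EXE does when the scheduler fires it."""
    # Assumed hourly schedule: truncate the trigger time to the hour so that
    # every machine fired for the same run computes the same slot key.
    slot = now.strftime("%Y-%m-%dT%H:00")
    if try_claim_run(conn, "cleanup-job", slot, machine):
        # ... the real job body would run here ...
        return f"{machine} runs cleanup-job for {slot}"
    return f"{machine} skips cleanup-job for {slot}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the shared database
    conn.execute(SCHEMA)
    fired_at = datetime(2006, 8, 29, 2, 0, 5)
    for machine in ("WEB01", "WEB02"):
        print(scheduled_entry_point(conn, machine, fired_at))
```

The primary key does the locking, so nothing has to be explicitly held or released. The trade-offs: the slot key comes from each machine's clock, so the machines need to be reasonably time-synchronized, and if the winning machine dies mid-run, that run is simply lost until the next slot.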
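For option 4 (and for the "primary node only" idea in option 3 when there is no OS cluster), the database lock usually needs to be a lease rather than a plain lock, so that a standby service can take over when the active one dies without ever releasing it. Below is a minimal sketch under assumptions: sqlite3 stands in for the shared database, the `service_lease` table, the `job-service` name, the 30-second lease and the node names are hypothetical, and the current time is passed in explicitly so that a failover can be simulated.

```python
import sqlite3

LEASE_NAME = "job-service"  # hypothetical name of the singleton service
LEASE_SECONDS = 30          # hypothetical lease length; renew well before it runs out


def init(conn):
    """Create the lease table and its single row (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS service_lease (
               name       TEXT PRIMARY KEY,
               owner      TEXT,
               expires_at REAL NOT NULL
           )"""
    )
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO service_lease (name, owner, expires_at) VALUES (?, NULL, 0)",
            (LEASE_NAME,),
        )


def try_acquire_or_renew(conn, node, now):
    """Take the lease if it has expired, or extend it if this node already holds it.

    A single conditional UPDATE does the check and the write together, so with
    a shared clock two nodes cannot both hold the lease at the same time.
    """
    with conn:
        cur = conn.execute(
            """UPDATE service_lease
                  SET owner = ?, expires_at = ?
                WHERE name = ?
                  AND (owner = ? OR expires_at <= ?)""",
            (node, now + LEASE_SECONDS, LEASE_NAME, node, now),
        )
    return cur.rowcount == 1


def tick(conn, node, now):
    """One iteration of each service's timer loop."""
    if try_acquire_or_renew(conn, node, now):
        # ... this node is active: run the scheduled work for this tick ...
        return "active"
    # Another node holds a live lease: stay idle and try again next tick.
    return "standby"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the shared database
    init(conn)
    # Simulated clock in seconds: NODE1 stops renewing after t=10 (it "dies").
    for node, t in [("NODE1", 0), ("NODE2", 1), ("NODE1", 10), ("NODE2", 35), ("NODE2", 41)]:
        print(t, node, tick(conn, node, t))
```

The active service has to renew more often than `LEASE_SECONDS` and should stop working as soon as a renewal fails; otherwise a slow node could still be running jobs after another node has taken over. In a real deployment, `now` should come from the database server's clock rather than each node's, so that all nodes agree on when a lease has expired.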
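Option 5's promise of "once, successfully, and only once" comes down to bookkeeping: a job has to be claimed by exactly one worker, retried if it fails, and never run again once it has succeeded. The sketch below does not use any grid product's API; it only illustrates that bookkeeping with a hypothetical `jobs` table (sqlite3 standing in for the scheduler's job store) and a hypothetical retry budget of three attempts.

```python
import sqlite3

MAX_ATTEMPTS = 3  # hypothetical retry budget


def init(conn):
    """Create a hypothetical job table: one row per job, with its state."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id       INTEGER PRIMARY KEY,
               status   TEXT NOT NULL DEFAULT 'pending', -- pending|running|done|failed
               attempts INTEGER NOT NULL DEFAULT 0
           )"""
    )


def claim(conn, job_id):
    """Move a job from pending to running; only one worker can win the update."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'running', attempts = attempts + 1 "
            "WHERE id = ? AND status = 'pending'",
            (job_id,),
        )
    return cur.rowcount == 1


def finish(conn, job_id, ok):
    """Record the outcome: done on success, else back to pending until the budget is spent."""
    with conn:
        if ok:
            conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
        else:
            conn.execute(
                "UPDATE jobs SET status = CASE WHEN attempts >= ? THEN 'failed' "
                "ELSE 'pending' END WHERE id = ?",
                (MAX_ATTEMPTS, job_id),
            )


def status(conn, job_id):
    return conn.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()[0]


def run_job(conn, job_id, work):
    """Run `work` only if this worker claims the job. Return the job's status
    afterwards, or None if it was not ours to run (running, done or failed)."""
    if not claim(conn, job_id):
        return None
    try:
        work()
        ok = True
    except Exception:  # any failure counts as a failed attempt
        ok = False
    finish(conn, job_id, ok)
    return status(conn, job_id)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the scheduler's job store
    init(conn)
    with conn:
        conn.execute("INSERT INTO jobs (id) VALUES (1)")

    calls = []

    def flaky_job():
        calls.append(1)
        if len(calls) == 1:
            raise RuntimeError("transient failure on the first attempt")

    while (result := run_job(conn, 1, flaky_job)) == "pending":
        pass  # a real scheduler would back off or pick another node before retrying
    print(result, "after", len(calls), "attempts")
```

Strictly speaking, this gives one run at a time plus retries, not a guaranteed exactly-once effect: if a worker crashes after the work is done but before `finish` records it, the job stays `running` and something has to decide whether to rerun it. Real schedulers typically handle that with timeouts or leases, which is why the job itself should ideally be safe to run twice.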