Notes on Archival

March 11, 2007

Had the pleasure of talking to Dr. Ram (Ramachandran Narayanaswamy) on a flight back from Europe this week. Dr. Ram heads the storage vertical at MindTree, and we both had very passionate opinions about content archival. I expressed the same opinions I had shared at this post on Apoorv’s blog: Challenges of ECM 1.0 still not solved.

Dr. Ram needed some 90-year-old records from the local council, and he was able to get them: a sheet of paper, handwritten, still available and still understood. Compare this with records archived about 20 years ago. They will be on an 8-inch floppy disk. As he puts it, there are multiple dimensions of complexity here:

1) You need to find the physical disk (let's assume you do; after all, you can still find a 90-year-old paper)

2) The media should be in good condition (let's assume it is)

3) You need to have a drive to load it (let's assume you archived a drive along with the disks, every year)

4) You need to be able to physically connect it (do you need a yesteryear's PC?)

5) You need drivers for it (do you need a yesteryear's PC/OS etc. to be archived as well?)

6) You need to make sure you are able to read files from the file system of that time, in the encoding/character set of that time

7) You need a copy of WordStar to read the file, something which can run on current PCs/OSes. Or, if you use an old machine/OS, you need a way to export the data to currently readable formats. Or will you need to print it and go for OCR?
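To make step 6 a little more concrete, here is a minimal Python sketch of the character-set part only. It assumes the bytes have already been copied off the old disk, and that the text was written in the IBM PC code page 437. Both are my assumptions for illustration; a real file may use a different code page, and the function name is made up.

```python
def reencode_legacy_text(raw: bytes, legacy_encoding: str = "cp437") -> bytes:
    """Decode bytes written in an old code page and re-encode them as UTF-8.

    legacy_encoding is an assumption: you have to know (or guess) which
    character set the original machine used.
    """
    text = raw.decode(legacy_encoding)
    return text.encode("utf-8")


# Hypothetical sample: the word "Résumé" as stored in code page 437,
# where the byte 0x82 stands for "é".
sample = bytes([0x52, 0x82, 0x73, 0x75, 0x6D, 0x82])
converted = reencode_legacy_text(sample)
print(converted.decode("utf-8"))
```

Decoding the same bytes with the wrong code page does not fail loudly, it just gives you the wrong letters. That is why the encoding has to be archived as knowledge alongside the data.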

All this is too complex and too much work. It is something you might be able to do for information which is very valuable. If so, what's the point in archiving the rest?

Now imagine the data in archive databases: what are the chances of it making sense even if you are able to fulfill all 7 conditions above? What about images? Will JPEG be available tomorrow? We are still able to see 100-year-old photographs, but will we be able to see our albums, which are JPEG, just 20 years later?

So what's the solution?

Constant upgrade of archived content? Isn't that too expensive and too much work as well?

His opinion was that any long-term storage needs to be readable in natural language (including database data). That makes sense for a piece of paper, but for digital records? Well, Dr. Ram said that we still have to find a solution, and that is one thing he will be looking at as part of his and his team's work on storage.

Well, with the short-sightedness of an engineer and not a researcher, I look at it slightly differently. While there may be no solution today, I believe that if there ever is a solution, it will be for the most popular content format. So I would rather place my bets on HTML. Better still if I have an XML version of the data behind these HTML pages. Also, the archive storage should be online, like a NAS, and not offline. Then when you upgrade your NAS, you automatically move your data to the new hardware as well.
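As a rough illustration of what an XML version of database data could look like, here is a small Python sketch that dumps a table into self-describing XML using only the standard library. The table name, the columns and the sample row are all made up for the example, and the element names ("table", "record", "field") are my own choice, not any standard.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> str:
    """Dump every row of a table as XML, labelling each value with its column name."""
    # The table name is interpolated directly, so pass only trusted names.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    root = ET.Element("table", name=table)
    for row in cur:
        record = ET.SubElement(root, "record")
        for col, val in zip(columns, row):
            field = ET.SubElement(record, "field", name=col)
            field.text = "" if val is None else str(val)
    return ET.tostring(root, encoding="unicode")


# Hypothetical data, held in an in-memory database for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE council_records (id INTEGER, title TEXT, year INTEGER)")
conn.execute("INSERT INTO council_records VALUES (1, 'Land deed', 1917)")
xml_text = table_to_xml(conn, "council_records")
print(xml_text)
```

The point is only that each value travels with its own label as plain text, so a future reader needs a text viewer and not the original database engine. It does not capture relationships between tables or the meaning of the columns.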

Dr. Ram and others working in this field of archival and storage: please solve this soon, and solve for documents first and databases later :-)

