Data is immortal, but not immune to decay
With cloud computing now commonplace in the enterprise, we’ve come to accept that our data will be replicated and stored in duplicate.
Even data that is intentionally deleted can often be recovered. When Yahoo! purchased Geocities, nobody dreamed that it would go ahead and delete the entire archive – more than 600 gigabytes of internet history. Despite this, enthusiasts were able to quickly archive the collective work of 35 avid Geocities webmasters – an important milestone in our ability to breathe new life into data that someone else does not want.
The Cost of Deletion
Deleting data is not just a catastrophe for the user, the business or the system itself; it also carries a real cost. We’ve all deleted files, essays, reports or emails by accident and been forced to spend hours recreating what we lost. Data loss is costly in other ways too: loss of custom, loss of reputation, or damage to a brand.
Often, it is easier to harvest massive amounts of data, and create massive backups, than to be selective and economical. This means we have huge data silos just waiting to be used. That’s if they’re useful at all.
The Problem with Immortality
If data is immortal, surely all we need to do is back it up on a regular basis and move on? Why worry about data if it can take such good care of itself?
Unfortunately, it’s not that simple. Data may be able to replicate itself and survive certain catastrophes, and we might be capable of creating lots of copies at relatively low cost.
But data cannot check its own validity or keep itself error-free. And the longer you keep a piece of data, the less useful it becomes.
The rate of data decay is estimated at about 2 per cent per month. That may not sound like much, but it adds up: industry experts suggest that almost a quarter of the contacts in a typical CRM will be out of date within a year.
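To see how a small monthly rate adds up, here is a minimal sketch of the arithmetic. It assumes each record goes stale independently at a fixed monthly rate, which is a simplification; the function name `fraction_stale` is purely illustrative.

```python
def fraction_stale(monthly_decay_rate: float, months: int) -> float:
    """Expected fraction of records out of date after `months`.

    Assumption: every record has the same fixed chance of going stale
    each month, so the surviving fraction compounds month on month.
    """
    return 1.0 - (1.0 - monthly_decay_rate) ** months


if __name__ == "__main__":
    # 2 per cent per month is the estimate quoted above.
    for months in (1, 6, 12):
        print(f"{months:>2} months: {fraction_stale(0.02, months):.1%} stale")
```

At 2 per cent per month this gives roughly 21.5 per cent after a year when compounded, or 24 per cent if the monthly figures are simply added together. Either way, the result is in the region of the "almost a quarter" estimate.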
Our own figures suggest that 42 per cent of failed CRM projects came off the rails because of the state of the data. It’s not that businesses don’t use the systems, or fail to adopt them enthusiastically. They just expect the data to live on, untouched, without any further maintenance.
Survival of the Fittest
Just because data is safe, and immortal, and secure, that does not necessarily mean that it is worth preserving. The world’s data centres are packed with old data that could be inaccessible, corrupted, duplicated or out of date – or all of the above.
Clean data does not occur by accident. We get clean data because we invest time in making it so. That means:
- Deduplicating records in a database so we’re sure we aren’t archiving more than one copy of the same thing
- Checking records against other databases that have already been cleansed (for example, checking addresses against an official Royal Mail postcode database)
- Ensuring old data is purged so that it does not ‘live forever’ and sully the database
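As a rough illustration of the first and last points, the sketch below deduplicates records by email address and purges anything not verified within a chosen window. The field names (`email`, `last_verified`), the one-year default and the function names are assumptions made for the example, not features of any particular CRM.

```python
from datetime import date, timedelta


def normalise_email(email: str) -> str:
    """Treat addresses that differ only by case or stray spaces as the same."""
    return email.strip().lower()


def clean(records, today, max_age_days=365):
    """Purge stale records, then keep the newest record per email address."""
    cutoff = today - timedelta(days=max_age_days)
    newest = {}
    for rec in records:
        if rec["last_verified"] < cutoff:
            continue  # too old: purge rather than let it 'live forever'
        key = normalise_email(rec["email"])
        kept = newest.get(key)
        if kept is None or rec["last_verified"] > kept["last_verified"]:
            newest[key] = rec  # first sighting, or a fresher duplicate
    return list(newest.values())


if __name__ == "__main__":
    records = [
        {"email": "Ann@Example.com ", "last_verified": date(2024, 1, 10)},
        {"email": "ann@example.com", "last_verified": date(2024, 5, 1)},
        {"email": "bob@example.com", "last_verified": date(2021, 3, 1)},
    ]
    for rec in clean(records, today=date(2024, 6, 1)):
        print(rec["email"], rec["last_verified"])
```

Exact matching on a normalised key only catches the simplest duplicates, and real purge rules usually depend on legal and business retention requirements, so treat the thresholds here as placeholders.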
Good From Bad
There are three main ways to obtain clean data from a dirty database.
- You assign a group of staff to contact every person in the database and check their records; unless the business is a startup or a niche organisation, this is very unlikely to be a practical or affordable course of action
- You use data quality software to automatically check for matches and mistakes, using sophisticated algorithms that can check millions of lines in minutes
- You outsource the cleansing of the whole dataset to a third party; when data purification is handed to a data bureau, the business can get on with its day-to-day work
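To give a flavour of what automated matching involves, here is a minimal sketch that flags likely duplicate names using Python's standard `difflib` module. It is a stand-in for illustration, not how any particular data quality product works; the 0.85 threshold and the function names are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Similarity score between 0 and 1, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def likely_duplicates(names, threshold=0.85):
    """Return pairs of names whose similarity meets the (assumed) threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


if __name__ == "__main__":
    names = ["Jon Smith", "John Smith", "Mary Jones"]
    for a, b, score in likely_duplicates(names):
        print(f"Possible duplicate: {a!r} / {b!r} (score {score})")
```

Comparing every pair like this becomes slow as the list grows, which is one reason dedicated tools rely on more sophisticated algorithms to handle millions of lines.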
Whichever route you choose, you must do something about your data. Simply saving it, copying it and archiving it is not going to ensure its quality and longevity.
Keeping Data Alive
Data storage technology has come on in leaps and bounds in the last five years. From stone tablets and vellum, we’ve transformed the way we record the things we think, say and do. We can store hundreds of books on a tiny memory card, smaller than a postage stamp. And archives have saved the early days of the web from obliteration, ensuring that our first attempts at web design – MIDI files and all – are given their rightful place in history.
It’s possible to manually undelete data, or delete it and start again. If data goes bad, we can trash the whole lot and purchase a list from a third party. But for the best ROI, the best conversions and the best relationships with our customers, we should aim to keep our data clean, and ensure it’s relevant and useful for years to come.