Deleting Data


I see that Google has finally dumped all the data it collected on Wi-Fi during its Street View forays in Australia. Google claims that the collection of the data was a 'mistake' - presumably similar to the 'mistake' that caused Apple to collect location data on its iPhone users, and numerous other 'mistakes' by hi-tech firms violating their customers' privacy. I guess they've all been taking lessons from Facebook.

That seems to have sorted out the problems in Australia, but what about the data on the rest of the world? Do we have to take Google to court on a country-by-country basis before they destroy all of the data they eavesdropped on?

Which brings me to another point... Do you have any idea just how difficult it is to destroy modern commercial data that's been properly looked after? To put it bluntly, it is well-nigh impossible. Yes - impossible, that was what I said. This is one of IT's dirty little secrets. It's not difficult to eliminate the current version, but one of the purposes of backup is to recreate data that's been deleted. And the longer the data has been around, the more backup systems it will be on in one form or another.

Let me give you some very simplified example figures. Take just one server, and one information set, held for a year. This is customer information, so it's important and it's backed up weekly to tape. We are a year into this affair, so the info is, at a minimum, on 52 tapes for starters (the company learned the hard way not to re-use tapes for backup, and we won't even get into the complications of incremental backup here). Seems fairly simple though, just find the set on each tape and delete it. Technically it's not that easy, but we will assume that it's possible, and that there isn't a daily network backup.

But what about the copy of the set that the business analyst pulled into his spreadsheet? It's a reasonable assumption that he backs stuff up as well. I wonder if he can remember where all the backups for the last year are. Oh! And what about the copy of the spreadsheet he gave the CEO for that presentation, eight months ago? Even the CEO has been known to make backups on a sort of desultory basis.

There is also at least one copy on a USB stick that was used to take a copy to an overseas office. And then it turns out that someone thought the data set was really interesting - so they helpfully put a copy of it in the company's Dropbox folder. This was spotted immediately as a possible security problem and deleted. Now does anyone fancy explaining to Dropbox that the files need removing from -their- backup system?

That's just the files we know about. In the meantime, a load of other people probably looked through the data set, or at least a subset of the data. They've probably deleted it from their drives, and have forgotten that they ever had it, much less that it was backed up at some stage! And all this is about one set of data stored for just one year on one server...
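
For anyone who likes to see the sums written down, here is a rough tally of the copies in the example above. It is only a sketch: the 52 tapes and the named copies come from the example, the extra live copy on the server is my own addition to the count, and the names known_copies and minimum_copies are made up for the illustration. The backups of the spreadsheet, the CEO's copy, the USB stick and the Dropbox folder are deliberately left out, because nobody knows how many there are.

```python
# Back-of-the-envelope tally of the copies named in the example.
# Each named location is counted once. Backups *of* those locations are
# not counted, because their number is unknown - which is the whole problem.

WEEKS_HELD = 52  # one year of weekly tape backups, tapes never re-used

known_copies = {
    "live copy on the server": 1,  # assumption: the original is still in use
    "weekly backup tapes": WEEKS_HELD,
    "analyst's spreadsheet": 1,
    "CEO's copy of the spreadsheet": 1,
    "USB stick for the overseas office": 1,
    "Dropbox folder (deleted, but perhaps still in their backups)": 1,
}


def minimum_copies(copies):
    """Return the number of copies we can actually name (a lower bound)."""
    return sum(copies.values())


if __name__ == "__main__":
    for place, count in known_copies.items():
        print(f"{count:3d}  {place}")
    print(f"{minimum_copies(known_copies):3d}  total, at the very least")
```

Even with every unknown backup ignored, that comes to 57 copies to track down, and 52 of them are buried somewhere on tape.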

As for data that people have stored on the cloud, you don't even want to go there! For instance, files on Amazon's cloud are stored across five different disk clusters and can be reconstructed from any three of those clusters! They may also be stored to tape. I'm not sure about that, but I wouldn't be surprised.

And you want it all completely deleted so no one can ever get it back?

Alan Lenton
8 May, 2011

Coda June 2012: It turns out that Google didn't manage to delete all the copies of the data they 'inadvertently' collected - no big surprise there!

