Fragmentation III

Ok, I didn’t mean to talk so much about data fragmentation in the Internet age, but it has been an issue that has bothered me for a while and it doesn’t seem to be getting better. The third (and perhaps final) topic that I wanted to cover was document/media storage. The prior two articles were essentially about messaging, but now I want to talk about the places where you store your documents, your music, and your movies.

Just to start off with the example of my own data, I own 3 computers: a laptop, a desktop, and a home server. I also have a smart phone and a set-top box that holds movies. I have a separate work computer at work, and I have a couple of different web hosts. That’s a lot of different places where I might store data. Mostly there’s good reason; I keep my work data on my work computer and my personal data on my home computer, and I want to keep them separate. On the other hand, I have a set of documents that I want to have accessible on my laptop and desktop, backed up on my home server, and maybe backed up online. That sounds simple enough, right?

It’s not. It should be simple, it’s almost simple, but there are always niggling little details. I won’t go into all the technical details of all the solutions I’ve tried and why they don’t quite work. Yes, there are workable solutions (rsync, unison, Dropbox, or working directly on a server), but I believe this problem really needs to be solved comprehensively for everyday computer users.

Some people will of course suggest “thin clients” or “cloud computing” or “Internet operating systems”, but I think all of these solutions have real problems. One of the big ones is that if the server goes down, everyone on that server is suddenly unable to work. People will counter by saying, “well you just distribute it across a bunch of servers so there’s no more single point of failure.” It’s harder than it sounds, and even if you accomplish that, what if the client’s Internet connection goes down?

One of the things it’s important to keep in mind is how cheap computing has gotten. We have more computing power in our cell phones than existed in the biggest computers a few decades ago, and we’re putting hundreds of gigabytes into USB thumb drives. It’s ultimately not going to save you much money to forgo internal storage and computing power for a thin client, so people are usually going to get a thicker client anyway. Once you have that internal storage and processing power, you may as well use it.

I think ultimately there are two solutions. The first would be to try to come up with elaborate syncing technologies which will enable you to always have all your data stored on the server but cached locally. This could have some nice effects. Imagine you open a Word document on your desktop computer which, for all intents and purposes, appears to be stored locally on your hard drive. You start typing, and every change is immediately synced to an online server where it can be viewed and edited in a service like Google Docs. Meanwhile the changes are also being downloaded to each of your authenticated devices, including your cell phone and your laptop. If changes are being made through one of the other devices, you get something like the “collaborative real-time editing” in SubEthaEdit. Or imagine that when you take a picture on your cell phone, it is automatically uploaded to a service like Picasa, which in turn syncs it to all of your computers. If you add a picture to your computer, it syncs back to your phone. We have the pieces for a solution like this, but it’s not as comprehensive or well integrated as what we need.
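To make the "stored on the server but cached locally" idea a bit more concrete, here is a toy sketch in Python. Everything in it (the `Replica` class, the `push` and `pull` functions, the version counter) is made up for illustration and is not how any real service works. It also assumes only one device edits at a time and that the network is always up, which are exactly the hard parts a real solution would have to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the user's files: the server, or a device's local cache."""
    name: str
    # Maps a file path to a (version, content) pair.
    files: dict = field(default_factory=dict)


def push(device: Replica, server: Replica, path: str, content: str) -> None:
    """Save a change locally and immediately send it to the server."""
    # The server's current version (0 if the file is new) plus one.
    version = server.files.get(path, (0, ""))[0] + 1
    device.files[path] = (version, content)
    server.files[path] = (version, content)


def pull(device: Replica, server: Replica) -> list:
    """Copy down anything the server has that is newer than the local cache."""
    updated = []
    for path, (version, content) in server.files.items():
        local_version = device.files.get(path, (0, ""))[0]
        if local_version < version:
            device.files[path] = (version, content)
            updated.append(path)
    return updated


server = Replica("server")
desktop = Replica("desktop")
laptop = Replica("laptop")
phone = Replica("phone")

# Type on the desktop; the change goes to the server, then to the other devices.
push(desktop, server, "notes.doc", "first draft")
for device in (laptop, phone):
    pull(device, server)

# Later, edit on the phone; the desktop picks the change up on its next pull.
push(phone, server, "notes.doc", "first draft, edited on the phone")
pull(desktop, server)
print(desktop.files["notes.doc"])
```

In this sketch the laptop still holds version 1 until it pulls again, which is a small-scale version of the fragmentation problem: every device is only as current as its last sync.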

The other solution is to go the other way and to try to put all of your data in a single place. It’s a pretty simple idea: put a big enough storage device in your cell phone so that it can hold your entire home directory. Put a standard dock on every computer you use, and set the computers up so that they’ll automatically mount the home directory during login. Of course, that can still get a little tricky if you’re using different platforms, and I don’t know of any platform that supports this use very well. Also, you’re going to want that data backed up, and you’re probably going to want to get some of that information online sooner or later, so it wouldn’t completely settle things.

On top of the rest of these things, there’s another detail that I think about now and then: all of us are putting a lot of money and effort into storing the same information over and over again. For example, I have Radiohead’s album “The Bends” stored on my computer in AAC format. It’s also stored on my home server and my laptop, as well as my iPod. It’s also probably stored on millions of other computers around the world, but I still need to store it myself and back it up, because if I ever lost those files, then I couldn’t get them back. If I went back to iTunes and asked to download it again, they wouldn’t let me. If I tried downloading it from another source, I’d be accused of being a pirate.

The vast majority of the data on my computer is like that. All my documents and pictures take up a couple of gigabytes, and the rest is copyrighted material that probably exists on lots of other computers around the world. Still, I have to concern myself with backing up the copies on my computer as though I have a unique copy. In some ways, this is a very uninteresting problem, but it raises a question in my mind: what if we looked at issues like data storage and backup for a society as a whole rather than on the individual level? All of Radiohead’s “The Bends” probably takes up less than 100 megabytes on my computer, but just by myself I have it copied on 5 different devices, not including backups to an external hard drive or DVD. That’s close to half a gigabyte right there. How much storage do you think is taken up worldwide, storing just that one album? How many terabytes? Are we making efficient use of our time, effort, and resources?
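One way to picture storing things "for a society as a whole" is a store that keeps each distinct file once, no matter how many people or devices hold a copy. The Python sketch below is only an illustration of that idea under my own assumptions: the `SharedStore` class and its method names are hypothetical, the album is stood in for by 1,000 placeholder bytes, and identical files are recognized by their SHA-256 hash. It says nothing about the licensing questions, which are the real obstacle.

```python
import hashlib


class SharedStore:
    """Toy store that keeps one copy of each distinct file, keyed by its hash."""

    def __init__(self):
        self.blobs = {}   # SHA-256 hex digest -> file contents
        self.owners = {}  # SHA-256 hex digest -> set of devices holding the file

    def add(self, owner: str, data: bytes) -> str:
        """Record that `owner` has this file; store the bytes only if new."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        self.owners.setdefault(digest, set()).add(owner)
        return digest

    def stored_bytes(self) -> int:
        """Bytes actually kept, with each distinct file counted once."""
        return sum(len(data) for data in self.blobs.values())

    def naive_bytes(self) -> int:
        """Bytes used if every owner keeps a separate full copy."""
        return sum(len(self.blobs[d]) * len(o) for d, o in self.owners.items())


# Placeholder bytes standing in for the album; real files would be far larger.
album = b"x" * 1000

store = SharedStore()
for device in ("desktop", "server", "laptop", "ipod", "phone"):
    store.add(device, album)

print(store.naive_bytes(), "bytes as five separate copies")
print(store.stored_bytes(), "bytes when identical copies are stored once")
```

With five devices the separate copies take five times the space of the single shared one, and the ratio simply grows with the number of owners.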

Honestly, I’m not sure how much it matters. I believe that there are probably better ways to deal with some of these issues, but copyrights and proprietary interests will probably prevent significant improvements from being made in the foreseeable future.

That’s all for now. Thanks for reading.