20110121

Who Owns Your Data?

Karl Marx argued that the industrial revolution polarized the world into two groups: those who own the means of production and those who work on them.

Today’s means of production aren’t greasy cogs and steam-spewing engines, but that doesn’t mean they don’t divide us. The raw material of this new industry is data. It’s all around us, and search engines, governments, financial markets, social networks and law enforcement agencies all rely on it.

We willingly embrace this "Big Data" world. We share, friend, check in and retweet our every move. We swipe loyalty cards and enter frequent flyer numbers. We leave a growing and apparently innocent trail of digital breadcrumbs in our wake.

But as we use the Internet for "free," we have to remember that if we’re not paying for something, we're not the customer. We are in fact the product being sold -- or, more specifically, our data is.

So here’s a tricky question: Who owns all that data?

Why Data Ownership Is Hard

The fundamental problem with data ownership is that bits don’t behave like atoms. For most of human history, our laws have focused on physical assets that couldn’t be duplicated. The old truism “possession is nine-tenths of the law” doesn’t apply in a world where making a million copies, each as good as the original, is nearly effortless.

It’s not just the ability to copy that makes data different, however. How data is used affects its value. If I share a movie with someone, the copyright holder loses a potential sale. On the other hand, sharing can also make money: when Monty Python posted free clips online, its DVD sales reportedly rose 23,000%. Some kinds of information are meant to be shared. If I give my phone number to someone, it has surely gained value. But if it’s scrawled on a bathroom wall, it has presumably lost some.

It is hard to get data control right, too. In 2009, Burning Man required attendees (“Burners”) to give organizers control over any of their images that third parties shared. The well-meaning effort to prevent unwanted distribution sparked a vigorous debate about what electronic freedom really means. More recently, WikiLeaks has forced us to ask: Do thousands of leaked cables belong to the government, U.S. citizens, WikiLeaks or the newspapers that published them?

Old Laws, New Problems

Questions of data ownership are nuanced, quick to provoke anger and hard to resolve. We’re struggling to cope with them, both legislatively and culturally.

In a number of recent cases, outdated laws have been repurposed and abused, alternately defending and restricting freedom. We’re using decades-old wiretapping laws to prosecute people who record law enforcement officials. Reading a spouse’s e-mail can be prosecuted as a criminal offense, yet Google search history is now admissible as evidence. And the U.S. Justice Department has just sought the private Twitter records of a foreign citizen.

Big Data Makes Its Own Gravy

But if data law is confusing, Big Data makes it downright Byzantine. That’s because the act of collecting and analyzing massive amounts of public and private data actually generates more data, which is often as useful as the original information and which belongs to whoever performed the analysis. Put another way:
Big Data makes its own gravy.

In August 2006, America Online published a dataset of users’ search queries, hoping to provide raw material to researchers. The data had been anonymized, so that each searcher’s identity was replaced by a number. Five days later, The New York Times had tracked down one of those searchers by linking her search history to other public data, such as the phone book.

At big data companies, this kind of linking happens constantly; the companies are just careful enough not to do it in public. Since 2006, our willingness to share data has risen dramatically. So has companies’ ability to mine it for new insights, and not always in ways we’d approve of. Consider the Netflix Prize, awarded for figuring out how to use our film preferences to suggest other movies. The same kind of analysis could power a hypothetical "Insurance Challenge" that turns our online behavior into actuarial tables that dictate premiums and deny some of us coverage.

The Rise of the Data Marketplace

For decades, lawyers and traders have relied on companies like Thomson Reuters for the latest stock and legal news. Now startups like Gnip, Infochimps, Factual and Datamarketplace (acquired by Infochimps) are making data easier to acquire and massage. These data marketplaces seldom create new data; rather, they clean it up, keep it current and connect buyers with suppliers. Their value comes not from the data itself, but from making it usable and accessible.

These data marketplaces can teach us an important lesson about data ownership. Ultimately, the question of who owns information is a red herring.

It’s Not Really About Who Owns the Data

In 1984, Stewart Brand observed that, “On the one hand information wants to be expensive, because it's so valuable. The right information in the right place just changes your life. On the other hand, information wants to be free, because the cost of getting it out is getting lower and lower all the time.”

Data will leak out, as it always does, despite the best efforts of hardware companies. It’ll be around forever, even if we try to impose a statute of limitations on it. And we’ll find new ways to analyze it, making still more data. Yesterday's online chaff may be the cornerstone of tomorrow's new startup.

The important question isn’t who owns the data. Ultimately, we all do. A better question is, who owns the means of analysis? Because that’s how, as Brand suggests, you get the right information in the right place. The digital divide isn’t about who owns data -- it’s about who can put that data to work.
