After I asserted several times that data is the key to the future of the web, Umair Haque gave my head a good spin by asserting that data is in fact a commodity. Umair is half right — we are increasingly overrun by data, and SOME of it is a commodity. The commodity data is precisely what Google has harnessed, which makes Google so powerful — the data on the open web.

Google has perfected value creation from harvesting the open web, primarily text content and links between that content, which allows Google to evaluate and prioritize that text content. It is unlikely that anyone will beat Google at this game.

The data that is not a commodity (yet) is data that is NOT freely available on the web, e.g. the personal data we put on walled-garden social networking sites like Facebook.

It’s data about our lives that we choose to share only with friends and family, not with the whole world. It’s personal identification data, like birthday, address, and phone number, which we don’t want to share on the open web. It’s our searching habits, our purchase habits, our surfing habits, everything we do online.

But personal data is only one example of the larger category of data that is not shared on the open web. It’s the data that’s still in our heads, the data that we have not put in digital form.

Before social networking applications, this data included the story of our personal relationships. Sites like Friendster, MySpace, and Facebook provided a way to capture that data for the first time. These services are so valuable because they became platforms for capturing data that even Google, with its army of spiders, couldn’t crawl, because it wasn’t online.

The same is true of Digg. Many Digg users have blogs, where their links to other sites can be read by Google. But before Digg, these people were linking in disconnected patterns. Then there were all the Digg users who did not have blogs or websites, and whose judgments about content on the web were not being captured. Digg, taking its cue from del.icio.us, gave these users a way to make their judgments about content they like explicit.

By capturing those judgments, and combining them, Digg harnessed a powerful data set that was beyond Google’s reach, because before Digg, it didn’t exist on the web — it was all in people’s heads.

If there’s value in Twitter, it’s that it puts on the web data that didn’t exist in digital form. Granted, it’s mostly data about people’s random thoughts, but Twitter’s opportunity is to figure out the value of harnessing that data.

Blogger and YouTube are also examples of applications that brought onto the web data that never existed online before, whether a personal diary, a copyrighted video clip lifted from an old VHS tape, or a list of links to stuff that interests someone.

Why do you think Google bought Blogger and YouTube? Because they are platforms for putting data on the web that Google can harvest. As Umair puts it:

Think about it this way: the lower the cost of interaction, by definition, the more abundant data is – because every interaction creates reams of data. More data is created tomorrow than was created yesterday. And so on.

Umair is right that the power to bring new data onto the web has become a commodity. Blogger, YouTube, MySpace, and Facebook created value by being pioneers, but these applications have become easy to replicate.

The power now is making that data useful. And Umair is right again that restricting access to data is not the key to value creation:

Success isn’t determined by how hard I can exclude you from scraping your data – but how effectively and efficiently I can help you share/use/reuse/hack/etc it.

But there’s a big caveat — the Scoble Facebook incident demonstrated that the challenge is to make data useful WITHOUT trampling on privacy.

I’ll offer one last important caveat to Umair, using his terms — not all “markets” and “communities” are creating data on the web, despite the extremely low cost to do so. This is a people problem, not a technology problem.

It’s the very human challenge of convincing various types of people, who have not been naturally inclined to use these now-commodity web data platforms, to bring their data online. It’s creating networks out of people who are still disconnected in the networked age.

What may ultimately limit the growth of Google and open up opportunities for other players is that the future of the web will not be determined by companies that can overcome technology challenges. Google was king of that era, but that era may already be coming to an end.

The future of the web will be determined by companies that can overcome people challenges — to bring EVERYONE’S data online, and make it useful. And it won’t be about locking up people’s data, but instead helping them be smart about the free flow of their data.

It will be about networking that data, connecting it, to make a whole greater than the sum of the parts. That’s why web applications are so much more powerful than siloed desktop applications. That’s why the web itself is so powerful — it’s not just about collecting and distributing data. It’s about connecting data. And about connecting people.