January 5th, 2008

Data And The Future Of The Web


After I asserted several times that data is the key to the future of the web, Umair Haque gave my head a good spin by asserting that data is in fact a commodity. Umair is half right — we are increasingly overrun by data, and SOME of it is a commodity. The commodity data is precisely what Google has harnessed, which makes Google so powerful — the data on the open web.

Google has perfected value creation from harvesting the open web, primarily text content and links between that content, which allows Google to evaluate and prioritize that text content. It is unlikely that anyone will beat Google at this game.

The data that is not a commodity (yet) is data that is NOT freely available on the web, e.g. the personal data we put on walled-garden social networking sites like Facebook.

It’s data about our lives that we choose to share only with friends and family, not with the whole world. It’s personal identification data, like birthday, address, and phone number, which we don’t want to share on the open web. It’s our searching habits, our purchase habits, our surfing habits, everything we do online.

But personal data is only one example of the larger category of data that is not shared on the open web. It’s the data that’s still in our heads, the data that we have not put in digital form.

Before social networking applications, this data included the story of our personal relationships. Sites like Friendster, MySpace, and Facebook provided a way to capture that data for the first time. These services are so valuable because they became platforms for capturing data that even Google, with its army of spiders, couldn’t crawl, because it wasn’t online.

The same is true of Digg. Many Digg users have blogs, where their links to other sites can be read by Google. But before Digg, these people were linking in disconnected patterns. Then there were all the Digg users who did not have blogs or websites, and whose judgments about content on the web were not being captured. Digg, taking its cue from del.icio.us, gave these users a way to make their judgments about content they like explicit.

By capturing those judgments, and combining them, Digg harnessed a powerful data set that was beyond Google’s reach, because before Digg, it didn’t exist on the web — it was all in people’s heads.

If there’s value in Twitter, it’s that it puts on the web data that didn’t exist in digital form. Granted, it’s mostly data about people’s random thoughts, but Twitter’s opportunity is to figure out the value of harnessing that data.

Blogger and YouTube are also examples of applications that brought onto the web data that never existed online before, whether a personal diary, a copyrighted video clip lifted from an old VHS tape, or a list of links to stuff that interests someone.

Why do you think Google bought Blogger and YouTube? Because they are platforms for putting data on the web that Google can harvest. As Umair puts it:

Think about it this way: the lower the cost of interaction, by definition, the more abundant data is – because every interaction creates reams of data. More data is created tomorrow than was created yesterday. And so on.

Umair is right that the power to bring new data onto the web has become a commodity. Blogger, YouTube, MySpace, and Facebook created value by being pioneers, but these applications have become easy to replicate.

The power now lies in making that data useful. And Umair is right again that restricting access to data is not the key to value creation:

Success isn’t determined by how hard I can exclude you from scraping your data – but how effectively and efficiently I can help you share/use/reuse/hack/etc it.

But there’s a big caveat — the Scoble Facebook incident demonstrated that the challenge is to make data useful WITHOUT trampling on privacy.

I’ll offer one last important caveat to Umair, using his terms — not all “markets” and “communities” are creating data on the web, despite the extremely low cost to do so. This is a people problem, not a technology problem.

It’s the very human challenge of convincing various types of people, who have not been naturally inclined to use these now commodity web data platforms, to bring their data online. It’s creating networks out of people who are still disconnected in the networked age.

What may ultimately limit the growth of Google and open up opportunities for other players is that the future of the web will not be determined by companies that can overcome technology challenges. Google was king of that era, but it may already be coming to an end.

The future of the web will be determined by companies that can overcome people challenges — to bring EVERYONE’S data online, and make it useful. And it won’t be about locking up people’s data, but instead helping them be smart about the free flow of their data.

It will be about networking that data, connecting it, to make a whole greater than the sum of the parts. That’s why web applications are so much more powerful than siloed desktop applications. That’s why the web itself is so powerful — it’s not just about collecting and distributing data. It’s about connecting data. And about connecting people.

  • @Simon makes a great point. Think about the amount of customer data on corporate websites that currently sits idle, not because it *couldn't* have value but because, up to this point, the effort and complexity of creating meaningful business value from it has outweighed the incentive to invest in it, particularly as many corporations are struggling under the burden of understanding and acting on the customer data they already have.

    I suspect the Google/DoubleClick merger and what they end up doing with all that "data" will blow this entire debate wide open much more so than Facebook or closed networking sites ever could....as they will have a missing link between a gazillion publisher sites and the open

  • Don't confuse the difficulty of getting data with the value of data. The difficulty (or cost) of getting the data does not reflect its value. Let's consider movie preferences. However you get the preference data (via a high-cost method such as surveys or a low-cost method such as Facebook), on its own the preference data has no value. The value only comes from combining it with other data (such as a demographic breakdown of movie preferences) and then acting on the insights from the combination.

    Facebook et al. simply reduce the cost of gathering data. At this stage you can argue that is where the companies' value lies. But note that this is not value driven by data but the value of the cost savings Facebook provides to data users. Kevin made the point about data aggregations and the value they bring, but again this is not about the value of the data but simply the transactional cost savings for end users.

    Kevin noted that Umair was dismissing the value of insight gained from data. I don't think he is (but Umair can correct me if I am wrong). Insight, and action on that insight, is where the value lies. Umair is simply saying the underlying data is a commodity. I would go so far as to say it has no intrinsic value.

    Whether the data is used for strategic analysis or not does not grant value to the data. It's the insights, and the action taken on those insights, where the value lies. It's worrisome that Yahoo needs a Chief Data Officer to represent the value of data. Every single senior executive should be drawing insights from data that is relevant to them.

    Saying that data has no intrinsic value is not to say it is unimportant to what is being done. Rather, the value arises not from having the data but from having the right data and combining it with the right other bits of data to produce insights and outcomes. It's the action and not the object that has value.

  • Is data a commodity or a strategic asset?
    For me the answer is simple: it depends on the "TYPE of data".

    User data is a strategic asset. There is no question that data is the foundation of any consumer internet service. Take Yahoo as an example. They have a chief data officer who represents data as a strategic asset at the executive level.

    On the other hand, the financial data of a public company is now a commodity, thanks to the digital revolution and the web.

  • Keeping in line with Umair's new definition of data, I think that it reaffirms whoever said (I believe it was Scott) that a social network isn't about who is on it, it's about who is NOT on it.

    The endeavor for networks/sites/blogs isn't to collect as much data as they can possibly get their hands on but to collect data that they can DO something with. Like the house analogy, a foreman doesn't just pile up a stack of wood and try to build something out of it. He gets the specific wood that he needs.

    The old idea that you start niche and then mellow out so you can appeal to the middle might not work in that new paradigm.

  • Kevin,

    Here's my caveat to Umair:

    Data on the open web becomes a commodity because it's abundant. Data that is not on the web yet has more value because it can't be leveraged by applications on the web, including Google.

    The value of Facebook is in its ability to bring data onto the web.

    So while data is experiencing a commoditization, some data is still more valuable than other data.
