January 5th, 2008

Data And The Future Of The Web

by

After I asserted several times that data is the key to the future of the web, Umair Haque gave my head a good spin by asserting that data is in fact a commodity. Umair is half right — we are increasingly overrun by data, and SOME of it is a commodity. The commodity data is precisely what Google has harnessed, which makes Google so powerful — the data on the open web.

Google has perfected value creation from harvesting the open web, primarily text content and links between that content, which allows Google to evaluate and prioritize that text content. It is unlikely that anyone will beat Google at this game.

The data that is not a commodity (yet) is data that is NOT freely available on the web, e.g. the personal data we put on walled-garden social networking sites like Facebook.

It’s data about our lives that we choose to share only with friends and family, not with the whole world. It’s personal identification data, like birthday, address, and phone number, which we don’t want to share on the open web. It’s our searching habits, our purchase habits, our surfing habits, everything we do online.

But personal data is only one example of the larger category of data that is not shared on the open web. It’s the data that’s still in our heads, the data that we have not put in digital form.

Before social networking applications, this data included the story of our personal relationships. Sites like Friendster, MySpace, and Facebook provided a way to capture that data for the first time. These services are so valuable because they became platforms for capturing data that even Google, with its army of spiders, couldn’t crawl, because it wasn’t online.

The same is true of Digg. Many Digg users have blogs, where their links to other sites can be read by Google. But before Digg, these people were linking in disconnected patterns. Then there were all the Digg users who did not have blogs or websites, and whose judgments about content on the web were not being captured. Digg, taking its cue from del.icio.us, gave these users a way to make their judgments about content they like explicit.

By capturing those judgments, and combining them, Digg harnessed a powerful data set that was beyond Google’s reach, because before Digg, it didn’t exist on the web — it was all in people’s heads.

If there’s value in Twitter, it’s that it puts on the web data that didn’t exist in digital form. Granted, it’s mostly data about people’s random thoughts, but Twitter’s opportunity is to figure out the value of harnessing that data.

Blogger and YouTube are also examples of applications that brought onto the web data that never existed online before, whether a personal diary, a copyrighted video clip lifted from an old VHS tape, or a list of links to stuff that interests someone.

Why do you think Google bought Blogger and YouTube? Because they are platforms for putting data on the web that Google can harvest. As Umair puts it:

Think about it this way: the lower the cost of interaction, by definition, the more abundant data is – because every interaction creates reams of data. More data is created tomorrow than was created yesterday. And so on.

Umair is right that the power to bring new data onto the web has become a commodity. Blogger, YouTube, MySpace, and Facebook created value by being pioneers, but these applications have become easy to replicate.

The power now is making that data useful. And Umair is right again that restricting access to data is not the key to value creation:

Success isn’t determined by how hard I can exclude you from scraping your data – but how effectively and efficiently I can help you share/use/reuse/hack/etc it.

But there’s a big caveat — the Scoble Facebook incident demonstrated that the challenge is to make data useful WITHOUT trampling on privacy.

I’ll offer one last important caveat to Umair, using his terms — not all “markets” and “communities” are creating data on the web, despite the extremely low cost to do so. This is a people problem, not a technology problem.

It’s the very human challenge of convincing various types of people, who have not been naturally inclined to use these now commodity web data platforms, to bring their data online. It’s creating networks out of people who are still disconnected in the networked age.

What may ultimately limit the growth of Google and open up opportunities for other players is that the future of the web will not be determined by companies that can overcome technology challenges. Google was king of that era, but it may already be coming to an end.

The future of the web will be determined by companies that can overcome people challenges — to bring EVERYONE’S data online, and make it useful. And it won’t be about locking up people’s data, but instead helping them be smart about the free flow of their data.

It will be about networking that data, connecting it, to make a whole greater than the sum of the parts. That’s why web applications are so much more powerful than siloed desktop applications. That’s why the web itself is so powerful — it’s not just about collecting and distributing data. It’s about connecting data. And about connecting people.

Comments (22 Responses so far)

  1. Another good post here. You’ve been on a roll lately and it seems that you and Matthew Ingram are a good virtual tag team. Makes for good reading.

  2. Scott, I’ve been reading your blog. Good posts and you’re never shy to talk about the tough topics.

    However, on this one Umair is all wet: data is *not* a commodity. Umair has to take his head out of his “Porter five forces” textbook and smell the coffee. I like Umair’s analysis, but here he’s way off base.

    Data is not a commodity. Ask Google why they love their toolbar and love their data from those hundreds of thousands of servers. Do you think they can sell it to me? Oh wait, it’s a commodity, so I’ll just duplicate their data... not.

    Keep posting

  3. In general I agree with the thrust of what Umair is saying.

    Consider the following reasoning. Data is a commodity in the sense that the initial value of the data does not represent the final value of what can be done with the data. Consider a house. The value of the wood etc in the house is not the same as the final value of the house. The final value of the property is determined from the utility of the house to prospective occupiers.

    Data has no intrinsic value in and of itself. The value only arises from the combination with other data and what you can do with it. This dovetails with the concept that the freer data is to be re-combined with other data, the more value can be created. As you mentioned, Scott, the web is progressing to the creation of ecosystems to re-combine data in new and different ways, in which value creation is limited solely by the ingenuity of people to re-combine data from all sorts of sources, even other data ecosystems.

    In John Furrier’s example of Google Toolbar data, the data still has no intrinsic value. The value derives from what the Googlers can do with the data: what conclusions and knowledge they draw from analysing the data in context. Difficulty of getting data should not be confused with value. Metals are still a commodity despite how difficult it may be to extract the metal from various environments.

  4. Nicely done post!

  5. I’ve read your post a couple of times, and I’ll be darned if I can see how you aren’t making the exact same point that Umair makes.

    Think about it this way: the lower the cost of interaction, by definition, the more abundant data is – because every interaction creates reams of data. More data is created tomorrow than was created yesterday. And so on.

    What is valuable are the things that create data: markets, networks, and communities.

  6. We have allowed data to be perceived as a commodity because in general we have been willing to have a value exchange between the product and/or service and our data.

    However, the medium itself is immature. We are only at the beginning of our understanding of the relationship we have with/to our OWN data. As our connection to our selves becomes more exposed and transparent I expect our perceptions of our data usage will mature – and by consequence our willingness (or not) to be used as a tool for other people’s monetary gains will morph and change…

  7. Great post Scott. I enjoy Umair’s thoughts quite a bit, but he had me scratching my head with this post.

    Consider a house. The value of the wood etc in the house is not the same as the final value of the house. The final value of the property is determined from the utility of the house to prospective occupiers.

    Simon, this is an apt analogy. Even if data itself can be viewed as a commodity (and just as with housing materials, some data is more commodity than other data), it’s still an indispensable element and building block of a whole lot of valuable objectives.

    Umair’s position seems to forget that there’s been a whole lot of investment in extracting insight from reams of data by those who possess it, and this provides the data aggregation companies an incredible market advantage.

    The question of “who owns the data” seems as silly as asking who owns two copies of the same mp3. Once information is reduced to bits and shared, “ownership” is moot, and value extraction becomes the only meaningful criterion.

  8. [...] post by Scott Karp and software by Elliott Back This entry is filed under Data save. You can follow any responses to [...]

  9. Creative expression is a type of personal data that’s not a commodity. In fact it is one of the most prized forms of web-based data because it is a major traffic driver.

  10. Kevin,

    Here’s my caveat to Umair:

    Data on the open web becomes a commodity because it’s abundant. Data that is not on the web yet has more value because it can’t be leveraged by applications on the web, including Google.

    The value of Facebook is in its ability to bring data on the web.

    So while data is experiencing a commoditization, some data is still more valuable than other data.

  11. Keeping in line with Umair’s new definition of data, I think that it reaffirms whoever said (I believe it was Scott) that a social network isn’t about who is on it, it’s about who is NOT on it.

    The endeavor for networks/sites/blogs isn’t to collect as much data as they can possibly get their hands on but to collect data that they can DO something with. Like the house analogy, a foreman doesn’t just pile up a stack of wood and try to build something out of it. He gets the specific wood that he needs.

    The old idea that you start niche and then mellow out so you can appeal to the middle might not work in that new paradigm.

  12. Is data a commodity or a strategic asset?
    For me the answer is simple – It depends on the “TYPE of data”.

    User data is a strategic asset. There is no question that data is the foundation of any consumer internet service. Take Yahoo as an example. They have a chief data officer who represents data as a strategic asset at the executive level.

    On the other hand, the financial data of a public company is now a commodity, thanks to the digital revolution and the web.

  13. [...] Scott Karp penned up a great article everyone interested in internet technology should read — Data And The Future Of The Web. [...]

  14. Don’t confuse difficulty of getting data with the value of data. The difficulty (or cost) of getting the data does not reflect the value of data. Let’s consider movie preferences. However you get the preference data (via a high-cost method such as surveys or a low-cost method such as Facebook), on its own the preference data has no value. The value only comes from combining it with other data (such as the demographic breakdown of movie preferences) and then acting on the insights from the data combination.

    Facebook et al. simply reduce the cost of gathering data. At this stage you can argue that is where the company’s value lies. But note that this is not value driven by data but the value of the cost savings Facebook provides to data users. Kevin made the point about data aggregators and the value they bring, but again this is not about the value of the data but simply the transactional cost savings for end users.

    Kevin noted that Umair was dismissing the value of insight gained from data. I don’t think he is (but Umair can correct me if I am wrong). Insight and action on the insight is where the value lies. Umair is simply saying the underlying data is a commodity. I would go so far as to say it has no intrinsic value.

    Whether the data is used for strategic analysis or not does not grant value to the data. It’s the insights and action taken on those insights where the value lies. It’s worrisome that Yahoo needs a Chief Data Officer to represent the value of data. Every single senior executive should be drawing insights from data that is relevant to them.

    Saying that data has no intrinsic value is not to say it is unimportant to what is being done. Rather, the value arises not from having the data but from having the right data and combining that with the right other bits of data to produce insights and outcomes. It’s the action and not the object that has value.

  15. @Simon makes a great point. Think about the current amount of customer data on corporate websites that sits there idle, not because it *couldn’t* have value but because the business case has not yet been strong enough to justify the effort and complexity of extracting it, particularly as many corporations are struggling under the burden of understanding and acting on the customer data they already have.

    I suspect the Google/DoubleClick merger and what they end up doing with all that “data” will blow this entire debate wide open much more so than Facebook or closed networking sites ever could... as they will have a missing link between a gazillion publisher sites and the open web.

  16. [...] Yahoo has the assets to leverage a strong position in the new ecosystem of the global web – hello data debate..  Lets hope that their product and competitive strategy doesn’t fall short.  They have my [...]

  17. [...] Scott Karp wrote an interesting post on data a couple of days ago. Before social networking applications, this data included the story of our personal relationships — sites like Friendster, MySpace, and Facebook provided a way to capture that data for the first time. These services are so valuable because they became platforms for capturing data that even Google, with it’s army of spiders, couldn’t crawl, because it wasn’t online. [...]

  18. [...] you are a developer of web content sites, then you must read Data and the Future of the Web by Scott Karp and Database Gods Bitch About MapReduce by Rich Skrenta. Scott provides the vision of [...]

  19. [...] was a great post and debate about data being a commodity on bubblegenration and publishing 2.o last [...]

  20. [...] Data And The Future Of The Web – Publishing 2.0 “The future of the web will be determined by companies that can overcome people challenges — to bring EVERYONE’S data online, and make it useful. And it won’t be about locking up people’s data, but instead helping them be smart about the free flow (tags: socialnetworking privacy social web2.0) [...]

  21. [...] And The Future Of The Web 2 février 2008 Source : Article de Publishing 2.0 After I asserted several times that data is the key to the future of the web, Umair Haque gave my [...]

  22. [...] out there. Users will embrace it and it can only help with business. Publishing 2.0 put it well in an article about data: The future of the web will be determined by companies that can overcome people challenges — to [...]
