January 3rd, 2008

The Coming War Over Data On The Web


If you dig beneath the surface of the brouhaha over Robert Scoble getting his Facebook account suspended for testing a new Plaxo Facebook app that mines user email addresses in violation of Facebook’s terms of service, you’ll find evidence of two increasingly apparent realities about the future of the web:

  1. Data is POWER
  2. A war will be fought over control of the data

War? Start with these astute observations by Nick Carr and Paul Buchheit:

Far from being just “his own information,” however, the information included the names, email addresses, and birthdays of 5,000 Facebookers who had “friended” Scoble. The act of “friending” on a social network site, it’s important to remember, is a fairly cavalier act, often undertaken with little thought.

Now, if you happen to be one of those “friends,” would you think of your name, email address, and birthday as being “Scoble’s data” or as being “my data.”

From Scoble: freedom fighter or data thief?

However, when I signed up for Facebook I gave them my Gmail address and password, using their find friends feature:

It was very helpful — I didn’t think that I would know anyone on Facebook, but it turns out that I knew hundreds of them.

However, Gmail’s Terms of Use seems to prohibit this:

“You also agree that you will not use any robot, spider, other automated device, or manual process to monitor or copy any content from the Service.”

From Should Gmail, Yahoo, and Hotmail block Facebook?

As Nick points out, the issue of who owns “the data” is quite a tangled web, if you’ll pardon the pun. On one level, there’s the issue of whether services like Facebook actually own your data because you agreed to that ownership in the Terms of Service. But even if you believe ideologically that each user owns their data, then you encounter the thorny issue of “data entanglement” — users have all of their friends’ data incorporated into their data profiles.

But Paul’s observation was what really made the battle lines visible. Each web service that lays claim to your data has a Terms of Service prohibition against you using other services to mine “your” data.

So what if Google or Yahoo, who lost the Facebook acquisition/investment lottery to Microsoft, decided to enforce this apparent TOS prohibition against using your Gmail or Yahoo mail as a basis for discovering friends on Facebook?

Whatever the result, the user would be caught in the TOS crossfire, without any apparent rights or recourse.

Dave Winer also senses a war brewing, but casts it as a cyclical struggle that reflects the cycles of technology development:

It’s a big effin loop we’re in. One of these times around one of the companies that feels (incorrectly) that they have a lock on their users, will voluntarily give it up and be a leader in Generation N+1. I’ve never seen it happen, but in theory I think it could.

I think it’s unlikely we will see the cycle end any time soon. With the disintegration of distribution monopolies, the new power in media is in the data. That’s how Facebook got its $15 billion valuation: the potential to exploit its users’ data.

There was a big debate yesterday over Twitter’s business model, or lack thereof. Saul Hansell points out today that since Twitter doesn’t control the interface to its data, it can’t monetize that interface through traditional media monetization.

But Twitter does have one valuable asset that none of its interface or application partners can control — the database itself. If there’s a business model for Twitter, it’s in the data.

Is it a fair exchange for users to get a free service in exchange for giving up control of their data? And who is the arbiter when different services lay claim to control of the same user’s data?

Perhaps this struggle is the result of software application service providers seeking to monetize through traditional media business models, i.e. advertising.

Blame it on Google. But that won’t stop everyone from wanting to be Google.

Comments (25 Responses so far)

  1. Scott, you and I must have had one of those secret Vulcan mind-melds going on here again…

    I’d written:

    The Data Ownership Wars Are Heating Up

    It’s definitely an interesting time, and it should be interesting to see how things shake out in the coming year as services and users battle it out.

  2. Seems to me this just gets back to the same old situation – if you don’t want other people to get a hold of your info, don’t put it on the web.

  3. [...] the breaking down the walled gardens issue is really a data war as Scott Karp calls it rightly. It is about service providers trying to monetize user data and in [...]

  4. War implies mighty armies clashing. I think this will end up much more a free-for-all. Perhaps Scoble is going guerilla?
    Seriously; what I want is an ID key which opens the safety deposit boxes of my choice. And when I leave that deposit box, I take the whole key with me. You only get to access the data I choose to share while your application does something useful for me. Then I’m gone and you’re back to doing your best to keep the conversation going between us.
    Persistent conversation rather than owned/farmed/harvested data will win. The battle, for me, is not about owning data but about owning relationships.

  5. [...] [The coming war over data is a thorough post on data ownership] [...]

  6. [...] a little devious to me), there is an important issue at the centre of this Techmeme frenzy, as Scott Karp at Publishing 2.0 and others have pointed out: Who owns your [...]

  7. As I’ve written elsewhere, the issue is trust: when I friend someone on a social network, I give them access to certain data and I trust them not to misuse it. But misusing it can mean simply casually giving it away to any and all services – and I think this is what Robert is guilty of.

    To give another example of what I mean, consider the case of Quechup, which, as I found out, spams your Gmail address book contacts when you give it access, supposedly to find out if they’re already using the service. My suspicion is that 2008 will see many more services like Quechup – and that will encourage users to flock to services which are less open, rather than more.

  8. How does the issue of inaccurate personal data affect this discussion?

    People put fake birthdates, names, and other bits of data in their profiles — some because they’re not comfortable putting personal facts on the Internet, others because they’re trying to appear younger, smarter, etc.

    I’ll guess that people are inconsistent with these lies. They may trust some sites more, and some social networks might include people who could call out false data. And some sites seem to demand that one create a special persona.

    The databases for trusted sites have to be more valuable, but they’re still never going to be 100% accurate. The ability to compare conflicting data from different networks would be extremely valuable — and obviously counter to all those terms of service agreements.

  9. The statement “Twitter…can’t monetize that interface through traditional media monetization.” is interesting to me.

    I agree – no way this is gonna happen via TRADITIONAL models. But someone will figure out a way to do it.

  10. [...] to get this information.  There is service after service dedicated to exactly this.  In fact, Scott Karp has called it a “War”.  I think of it more as an Arms Race.  It is straightforward technologically to defeat the measures [...]

  11. [...] The Coming War Over Data On The Web – Publishing 2.0 with the disintegration of distribution monopolies, the new power in media is in the data. That’s how Facebook got it’s $15 billion valuation — the potential to exploit its users’ data. (tags: trends 2008predictions facebook google yahoo) [...]

  12. [...] there’s a big caveat — the Scoble Facebook incident demonstrated that the challenge is to make data useful WITHOUT trampling on [...]

  13. Data is certainly the underpinning value – but “information” is that which creates sustainable, manageable and communicable value and outcomes. By information, I mean the application of context and semantics to “raw” data (whether collected from users or sensors). The practice of “Information Management” is growing full-bore in the IT consulting industry, mainly as a means of deriving revenue from the myriad issues and opportunities associated with extracting (and protecting) value from a business or government’s data. You’ll find, in most current architecture models, the “information management” layer is more prominent than ever, and is given the same system-wide priority and governance framework that the “security” layer typically enjoys. Note the “information management” layer is not the “data management layer”, which is the more commoditized capabilities around the basic “CRUD” model (create, read, update, delete) for raw data. Information Management is all about protecting and extracting both tacit and explicit value from the data, based to a large degree on collaborative agreement among the information stakeholders.

    So, yes, there’s a coming war over the “oil” (i.e. data), but it will be played out among the “information stewards” and those with enough technical foresight and business acumen to recognize the myriad shapes and forms value can take when data is repackaged and leveraged as information.

  14. [...] Karp asks whether “Facebook actually own your data because you agreed to that ownership in the Terms of [...]

  15. [...] before the skirmish, Scot Karp asserted that: “Data is power.” He explained Facebook’s price by [...]

  16. Interesting related article/news release- Google and Facebook to cooperate over data:


  17. [...] Source: Publishing 2.0 [...]

  18. I already spent all morning drafting a comment for the article about this on Six Apart’s blog. I will reiterate here:

    From my point of view, as a pretty avid user of social web apps, I think there are a couple of things missing from the way this discussion is playing out in most cases.
    1. By “friending” someone on MySpace, Facebook or wherever, you’re agreeing to give them more access to you. Generally there are two levels of access – the non-‘friend’ level and the ‘friend’-level. A user has control over what is published on each tier of access so it seems pretty obvious to me: if you don’t want someone to have access to your email address or phone number, don’t give it to them. Don’t give them that access.
    Whether or not someone uses a script to manage the information you give them is beside the point. We don’t hand people our business cards saying “You may only use this email by manually typing it. You can’t put me in your bulk emailing lists.”
    2. When you give someone your email address (or whatever), whether it’s embedded in an image or not, you are trusting them to not abuse it.
    3. There are a few advantages to using Social Networking Services’ messaging systems in lieu of regular email. One is the ability to communicate with people despite the fact that you haven’t given them your email address, phone number, messaging handle or other private information. The other is the ability to ‘block,’ (and sometimes even flag) a user so that they actually lose privileges.

    As users of a networking service, I think making a distinction between running scripts or not, with regard to how I can use the information you gave me, is terribly naive. I mean maybe I shouldn’t be able to use keyboard shortcuts for copy and paste either. It’s a slippery slope. If we draw the line for how I can and cannot use data that you give me at using software, then how about this scenario: I just so happen to be fairly wealthy and I hire a whole room full of overseas workers to manually manage my contacts, send messages etc… See?

    Can I be trusted more because I’m not using a bot? No. Making the privilege of access to your contact information hinge on whether or not I will use software to help me organize it is a bit like saying I may only have your information if I promise to only use it in relatively more difficult ways.

    When I tell people about some of the work that’s being done to create more universal data formats in the Semantic Web space, they often freak out about privacy, big brother and all that. It’s like people believe that if everything is disorganized and harder to use, there is more safety, privacy etc. This is troubling to me. Thank goodness people don’t manage their households and personal wealth with this approach to security!

    If we rely on disorganization as a layer of security, it means that only those with greater access to more powerful tools (whether they’re software tools or human resources) can extract and mine the data – data that’s already intended to be public in the first place!

    Similarly, contact information should be managed via its point of access, not how it’s used. How it’s used is a matter of trust and those of us with integrity have reasons to honor the privacy and comfort of our contacts.

  19. [...] Later, I decided to reuse this rant over on Publishing 2.0’s post here: “The Coming War Over Data On The Web” I’m linking to these things because I think you may want to go read them and the [...]

  20. [...] to think of it, with all this beating up on Facebook over data portability, and the glorifying of Twitter’s API, where on Twitter can I, as regular (non-API) user, [...]

  21. The bottom line, which seems to be the Holy Grail of publishing 2.0 in general, is the answer to this question: who will pay for content or services? And will content and services start to die off if no one is willing to pay? Conventional logic would suggest that eventually these things will disappear if there are no funds to support them.

  22. [...] 25, 2008 · No Comments After reading and digesting Publishing 2.o’s Scott Karp’s comprehensive piece on the forthcoming war over data, it seems to me there is an obvious component to the “debate” of data ownership that should be [...]

  23. [...] there’s a big caveat — the Scoble Facebook incident demonstrated that the challenge is to make data useful WITHOUT trampling on [...]

  24. [...] The Coming War Over Data On The Web [via Zemanta] [...]

  25. [...] reading and digesting Publishing 2.o’s Scott Karp’s comprehensive piece on the forthcoming war over data, it seems to me there is an obvious component to the “debate” of data ownership that should be [...]
