January 3rd, 2008

The Coming War Over Data On The Web


If you dig beneath the surface of the brouhaha over Robert Scoble getting his Facebook account suspended for testing a new Plaxo Facebook app that mines user email addresses in violation of Facebook’s terms of service, you’ll find evidence of two increasingly apparent realities about the future of the web:

  1. Data is POWER
  2. A war will be fought over control of the data

War? Start with these astute observations by Nick Carr and Paul Buchheit:

Far from being just “his own information,” however, the information included the names, email addresses, and birthdays of 5,000 Facebookers who had “friended” Scoble. The act of “friending” on a social network site, it’s important to remember, is a fairly cavalier act, often undertaken with little thought.

Now, if you happen to be one of those “friends,” would you think of your name, email address, and birthday as being “Scoble’s data” or as being “my data”?

From Scoble: freedom fighter or data thief?

However, when I signed up for Facebook I gave them my Gmail address and password, using their find friends feature:

It was very helpful — I didn’t think that I would know anyone on Facebook, but it turns out that I knew hundreds of them.

However, Gmail’s Terms of Use seems to prohibit this:

“You also agree that you will not use any robot, spider, other automated device, or manual process to monitor or copy any content from the Service.”

From Should Gmail, Yahoo, and Hotmail block Facebook?

As Nick points out, the question of who owns “the data” is quite a tangled web, if you’ll pardon the pun. On one level, there’s the question of whether services like Facebook actually own your data because you agreed to that ownership in their Terms of Service. But even if you believe, as a matter of principle, that each user owns his or her data, you run into the thorny issue of “data entanglement”: every user’s profile incorporates data belonging to that user’s friends.

But Paul’s observation was what really made the battle lines visible. Each web service that lays claim to your data has a Terms of Service prohibition against you using other services to mine “your” data.

So what if Google or Yahoo, which lost the Facebook acquisition/investment lottery to Microsoft, decided to enforce this apparent TOS prohibition against using your Gmail or Yahoo Mail account as a basis for discovering friends on Facebook?

Whatever the result, the user would be caught in the TOS crossfire, without any apparent rights or recourse.

Dave Winer also senses a war brewing, but casts it as a cyclical struggle that reflects the cycles of technology development:

It’s a big effin loop we’re in. One of these times around one of the companies that feels (incorrectly) that they have a lock on their users, will voluntarily give it up and be a leader in Generation N+1. I’ve never seen it happen, but in theory I think it could.

I think it’s unlikely we will see the cycle end any time soon. With the disintegration of distribution monopolies, the new power in media is in the data. That’s what earned Facebook its $15 billion valuation: the potential to exploit its users’ data.

There was a big debate yesterday over Twitter’s business model, or lack thereof. Saul Hansell points out today that since Twitter doesn’t control the interface to its data, it can’t monetize that interface through traditional media monetization.

But Twitter does have one valuable asset that none of its interface or application partners can control — the database itself. If there’s a business model for Twitter, it’s in the data.

Is it a fair trade for users to get a free service in return for giving up control of their data? And who is the arbiter when different services lay claim to control of the same user’s data?

Perhaps this struggle is the result of software service providers seeking to make money through traditional media business models, i.e., advertising.

Blame it on Google. But that won’t stop everyone from wanting to be Google.

  • The bottom line, which seems to be the Holy Grail of publishing 2.0 in general, is the answer to this question: who will pay for content or services? And will content and services start to die off if no one is willing to pay? Conventional logic would suggest that eventually these things will disappear if there are no funds to support them.

  • I already spent all morning drafting a comment for the article about this on Six Apart's blog. I will reiterate here:

    From my point of view, as a pretty avid user of social web apps, I think there are a couple of things missing from the way this discussion is playing out in most cases.
    1. By "friending" someone on MySpace, Facebook or wherever, you're agreeing to give them more access to you. Generally there are two levels of access: the non-friend level and the friend level. A user controls what is published at each tier of access, so it seems pretty obvious to me: if you don't want someone to have access to your email address or phone number, don't give them that access.
    Whether or not someone uses a script to manage the information you give them is beside the point. We don't hand people our business cards saying "You may only use this email by manually typing it. You can't put me in your bulk emailing lists."
    2. When you give someone your email address (or whatever), whether it's embedded in an image or not, you are trusting them to not abuse it.
    3. There are a couple of advantages to using social networking services' messaging systems in lieu of regular email. One is the ability to communicate with people without giving them your email address, phone number, messaging handle or other private information. The other is the ability to block (and sometimes even flag) a user so that they actually lose privileges.

    As a user of a networking service, I think that drawing a distinction between running scripts or not, with regard to how I can use the information you gave me, is terribly naive. I mean, maybe I shouldn't be able to use keyboard shortcuts for copy and paste either. It's a slippery slope. If we draw the line on how I can and cannot use data you give me at the use of software, then consider this scenario: suppose I happen to be fairly wealthy and I hire a whole room full of overseas workers to manually manage my contacts, send messages, etc. See?

    Can I be trusted more because I'm not using a bot? No. Making the privilege of access to your contact information hinge on whether or not I will use software to help me organize it is a bit like saying I may only have your information if I promise to only use it in relatively more difficult ways.

    When I tell people about some of the work that's being done to create more universal data formats in the Semantic Web space, they often freak out about privacy, big brother and all that. It's like people believe that if everything is disorganized and harder to use, there is more safety, privacy etc. This is troubling to me. Thank goodness people don't manage their households and personal wealth with this approach to security!

    If we rely on disorganization as a layer of security, it means that only those with access to more powerful tools (whether software tools or human resources) can extract and mine the data, data that's already intended to be public in the first place!

    Similarly, contact information should be managed at its point of access, not by how it's used. How it's used is a matter of trust, and those of us with integrity have reasons to honor the privacy and comfort of our contacts.

  • Interesting related article/news release: Google and Facebook to cooperate over data.


  • TNM

    Data is certainly the underpinning value, but "information" is what creates sustainable, manageable and communicable value and outcomes. By information, I mean the application of context and semantics to "raw" data (whether collected from users or sensors). The practice of "Information Management" is growing full-bore in the IT consulting industry, mainly as a means of deriving revenue from the myriad issues and opportunities associated with extracting (and protecting) value from a business's or government's data. You'll find that in most current architecture models the "information management" layer is more prominent than ever, and is given the same system-wide priority and governance framework that the "security" layer typically enjoys. Note that the "information management" layer is not the "data management" layer, which covers the more commoditized capabilities around the basic "CRUD" model (create, read, update, delete) for raw data. Information Management is all about protecting and extracting both tacit and explicit value from the data, based to a large degree on collaborative agreement among the information stakeholders.

    So, yes, there's a coming war over the "oil" (i.e. data), but it will be played out among the "information stewards" and those with enough technical foresight and business acumen to recognize the myriad shapes and forms value can take when data is repackaged and leveraged as information.

  • The statement "Twitter...can’t monetize that interface through traditional media monetization." is interesting to me.

    I agree - no way this is gonna happen via TRADITIONAL models. But someone will figure out a way to do it.
