July 22nd, 2006

The Fundamental Problem of Invalid (Fraudulent) Clicks

by Scott Karp


An NYU professor conducted an independent analysis of Google’s efforts to combat click fraud and found that, while Google’s efforts are “reasonable,” pay-per-click advertising “does not offer any ‘built-in’ fundamental protection mechanisms against the click fraud since it is very hard to specify which clicks are valid vs. invalid in general” and that any particular advertiser can be “hurt badly by fraudulent attacks.”

According to the official Google blog:

As part of the settlement in the click-fraud case Lane’s Gifts v. Google, we agreed with the plaintiffs to have an independent expert examine our detection methods, policies, practices, and procedures and make a determination of whether or not we had implemented reasonable measures to protect all of our advertisers.

What’s fascinating is that the Google blog crows about the report’s findings because Google’s efforts to combat click fraud were deemed “reasonable,” which probably has positive legal ramifications for Google — but the report (available here) is at the same time a damning indictment of pay-per-click advertising.

The report’s author, Dr. Alexander Tuzhilin, a professor of information systems at NYU (my alma mater), offers two possible solutions to what he calls “the fundamental problem of invalid (fraudulent) clicks”:

• The “trust us” approach of the search engines. The search engines can assure advertisers that they are doing everything possible to protect them against the click fraud. This is not easy because of the inherent conflict of interest between the two parties: the money from invalid clicks directly contribute to the bottom lines of the search engines. Nevertheless, it may be possible for the search engines to solve this trust problem by developing lasting relationships with the advertisers. However, the discussion of how this can be done lies outside of the scope of this report.

• Third-party auditors. Independent third-party vendors, who have no financial conflicts of interest, can work with advertisers and audit their clickstream files to detect invalid clicks.

These two approaches would still constitute only a partial solution to the Fundamental Problem because there is no conceptual definition of invalid clicks that can be operationalized.

Will Google invite independent auditors under their tent? Ha! They’d sooner put pop-up ads on Google.com, so it looks like the only alternative is the “trust us” method, rife as it is with conflicts of interest.

But the real indictment of pay-per-click and Google, which Tuzhilin ties to an inability to “operationalize a conceptual definition of invalid clicks,” lies in this Catch-22:

An operational definition cannot be fully disclosed to the general public because of the concerns that unethical users will take advantage of it, which may lead to a massive click fraud. However, if it is not disclosed, advertisers cannot verify or even dispute why they have been charged for certain clicks.

That plus the following are the coup de grace:

Finally, the measures (1) – (6) above are only statistical measures providing some evidence that Google’s filters work reasonably well. This does not mean, however, that any particular advertiser cannot be hurt badly by fraudulent attacks, given the evidence that Google filters “work.” Since Google has a very large number of advertisers, one particular bad incident will be lost in the overall statistics. Good performance measures indicative that filters work well only mean that there will be “relatively few” such bad cases. Therefore, any reports published in the business press about particular advertisers being hurt by particular fraudulent attacks do not mean that the phenomenon is widespread. One simply should not generalize such incidents to other cases and draw premature conclusions – we simply do not have evidence for or against this.

Translation — while it is not likely that a significant percentage of advertisers are being harmed by click fraud, it is entirely possible that some number of advertisers are being massively harmed. The lack of evidence cuts both ways — so advertiser beware!
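A rough sketch of that point, using made-up numbers (none of them come from the report, and the `Advertiser` fields are hypothetical): a network-wide rate of undetected fraud can look tiny even when one advertiser is being hit hard.

```python
from dataclasses import dataclass


@dataclass
class Advertiser:
    name: str
    clicks: int            # clicks the advertiser was billed for
    undetected_fraud: int  # fraudulent clicks the filters missed (hypothetical)


def fraud_rate(advertiser: Advertiser) -> float:
    """Share of an advertiser's billed clicks that were undetected fraud."""
    return advertiser.undetected_fraud / advertiser.clicks


# Illustrative assumption: 999 advertisers with almost no missed fraud,
# plus one advertiser under a sustained attack.
advertisers = [
    Advertiser(f"advertiser-{i}", clicks=10_000, undetected_fraud=10)
    for i in range(999)
]
advertisers.append(Advertiser("victim", clicks=10_000, undetected_fraud=6_000))

total_clicks = sum(a.clicks for a in advertisers)
total_fraud = sum(a.undetected_fraud for a in advertisers)
overall = total_fraud / total_clicks
worst = max(advertisers, key=fraud_rate)

print(f"network-wide undetected fraud rate: {overall:.2%}")
print(f"worst single advertiser ({worst.name}): {fraud_rate(worst):.0%}")
```

In this toy setup the aggregate rate stays well under 1% while the one attacked advertiser loses most of its budget, which is exactly the kind of case the overall statistics would bury.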

This report should dispel any doubt that cost-per-click needs to transition to cost-per-conversion. The real question is whether Google can leverage its scale to pull it off — or whether $6 billion+ in cost-per-click revenue will prove to be too great a liability — especially with Wall Street’s stratospheric expectations for the continued doubling of profits.


  • If the definition of invalid clicks is ones that don’t contribute to ROI, then the argument that some have put forth saying that click fraud improves ROI (because it lowers prices) doesn't make any sense.
  • Surely the definition of invalid clicks is ones that don't contribute to ROI. Competition to reduce clickfraud is a competitive advantage.
  • Actually, if any other engines or networks post the number of "invalid clicks," that doesn't necessarily show which system is more effective against fraud. It shows how good each one is at reporting what it considers to be "invalid clicks."

    If there were a universally accepted definition of "invalid click," one could make useful comparisons (as a percentage of total clicks), but that doesn't help you tell which clicks actually are invalid (or even fraudulent) but not marked as such (because they aren't caught by the filters, learning algorithms, etc).
  • BTW Google is now displaying the number of invalid clicks on Adsense reports. If Yahoo does the same then advertisers will be able to see which system is more effective against fraud and shift budget accordingly.

    http://adwords.blogspot.com/2006/07/estimating-invalid-clicks.html
  • Mark,

    Maybe, unless you're Google and you have unrivaled reach chaining advertisers to your network, and the revenue from fraud may exceed lost revenue from lost advertisers.
  • Scott,

    Don't you think that the companies offering CPC have an inbuilt incentive to stop click fraud? Surely ads taken on networks with higher click fraud will have a lower performance, causing advertisers to switch to networks with lower fraud.
  • john
    The report is a snow job. If you look at the click stream as a whole, it may appear that CF is "under control" but with no correlation to revenue (80/20 rule) it's deceiving. 80% of the fraud will happen on 20% of the advertisers; pull out the top 20% of KW earners and CF will go off the chart. Fraudsters aren't targeting "used husband" but are very busy in other areas.

    The expert basically looked at what Google told him to look at but missed the bigger picture.
  • It should be pointed out that even though CPC may not go away, it may become less attractive to all but the deepest-pocketed advertisers. The engines and networks will have to step up their efforts to contain click fraud (we now know they cannot stop it altogether). This will cost money, which will be passed on to the advertisers (because otherwise, it would come out of profits and cause valuations to drop). So the CPC advertisers may see declining ROI as time goes on.
  • I am actually a fan of Google and Adwords, Adsense, and a number of other products that they've ingeniously developed. My single largest concern with click fraud is that Google profits from it. Take all else away, and that's what you are left with. I do not believe that Google knowingly schemed to develop a solution where they would profit from criminal behavior. However, that's not an excuse now that they are aware that the fraudulent use exists.

    eBay is another giant that does not take responsibility for fraud on its site... yet they do not refund knowingly stolen commissions back to the buyer.

    In both cases, I do not understand why this isn't a simple prosecution of knowingly being in receipt of stolen goods. If I rented my neighbor a car and they used that car in the commission of a crime, wouldn't I be convicted?

    Google and eBay both know that the fraud exists, so aren't they knowingly:
    a. Providing a means for someone to commit fraud?
    b. Profiting from that fraud?

    Just curious.
    Doug
  • We recently ran a test on a couple of the web sites that we are
    involved in supporting. On one site that I'll use as an example,
    there were seven ads that were self-hosted on the site by the
    publisher during April, in addition to other ads on the site which were
    hosted by third parties (primarily AdSense and Yahoo Publisher). For
    the ads that were self-hosted, when someone clicked on one of the ads,
    it went through a redirect on the site so the publisher could log the
    click and count the clickthroughs.

    The self-hosted ads on these sites are not sold on a pay-per-click
    (PPC) or even a CPM basis... rather, the advertiser pays a fixed rate
    for their ad to be hosted on the site for a fixed period. Still, the
    publisher feels it is important to provide accurate reporting to the
    advertisers of their actual exposure and clickthrough results on
    their ads.

    During April 2006, we registered 3675 ad clicks on the 7 ads on the site.
    This site has approximately 100,000 visitors a month.

    Then the publisher made an attempt to filter out automated robots and
    spiders. One can easily identify many of them by their accessing of a
    robots.txt file or by the information contained in the browser ID
    that they supply to the access log.

    Then we looked through the clickthroughs remaining in the access log
    after this filtering. There remained a large number of clickthroughs
    in the log file that still looked suspicious, because they showed
    clickthroughs (sometimes multiple clickthroughs) of all 7 ads in a
    short period of time from the same IP address, and further
    examination shows that accesses to web pages from these IP addresses
    never download any of the images associated with those pages. These
    clickthroughs are almost certainly from robots or spiders as well,
    and their number was quite significant.

    In total, we found and removed hundreds of IPs and bots. After
    filtering out everything that looked very robot or spider-like, the
    number of ad clickthroughs for the month went from 3675 to 169.

    On a second very different web site (with about 10,000 visitors a
    month), going through this process reduced the number of ad
    clickthroughs for the month from 2181 to 210.

    It is quite possible that we removed a few real clicks by humans in
    this process, but I would guess the number of false removals is quite
    small. It is also even more probable that our process missed some of
    the automated bots and spiders. My guess is that on balance we are
    now fairly close to the real number of human clicks... possibly over
    reporting by no more than 20%.

    Similar filtering is necessary if one is to provide accurate
    reporting on page views, unique visitors, and average session times
    on a site.

    The process of identifying these bots and spiders is rather time
    consuming. For many of them, looking at a single hit in the log file
    does not provide any indication that the access is not by a human.
    Only by noticing that images are never downloaded and that all the
    ads were clicked over a short period of time were we able to identify
    them.

    I think that for fair reporting to advertisers, it is also important
    to identify and remove multiple clicks coming from the same IP
    address over a short period of time. We didn't see any patterns that
    looked like deliberate click fraud, but still it is clear that there
    were many instances where a human visitor clicks the same ad link two
    or three times over a short period of time. The numbers I reported
    above did not filter for this, and would likely be another 25%
    smaller if we did.

    The bottom line is that on the order of 90% to 97% of the ad clicks
    registered on the sites should not be reported as human clicks...
    unique human clicks represent only 3% to 10% of the number of ad
    clickthroughs in the log file.

    We also looked at clickthroughs for ads sent out in the email
    newsletter of one of these sites. Not surprisingly, we almost never
    see clicks that look like they are coming from robots or spiders for
    these ads. The issue of dealing with multiple clicks from the same
    reader however still is relevant.

    I'm also guessing that Web page ads which are generated from
    Javascript code (like Google AdSense and Yahoo Publisher ads) are
    probably not as likely to be "clicked on" by spiders and robots. Try
    this test... go to http://www.o-a.com and then look at the source
    code for the page in your browser. You can see the hyperlinks for
    all the self-hosted ads, but for the Yahoo Publisher and AdJungle
    hosted ads, you don't see the links in the HTML, all you see is the
    Javascript code. That means that a robot or spider isn't going to
    see those links and click on them either, unless it is also
    executing the Javascript code.

    Now go to Google and do a search on the phrase "online advertising"
    and again look at the source code for the resulting page. Here is the
    direct link to it:
    http://www.google.com/search?hl=en&lr=&q=%22online+advertising%22&btnG=Search

    The ads and search results on this page are not Javascript generated,
    so if a bot or spider follows the above hyperlink, it might SEE the
    links and potentially click on them. But also consider that this is
    entirely a user generated page based on my search. Unless I place
    that hyperlink onto the web somewhere, no spider will find it.

    Cliff Kurtzman
    Moderator
    Online Advertising Discussion List
    http://www.o-a.com
  • Google will never go CPA and ditch PPC completely.

    Why on earth would any company be stupid enough to give Google a complete breakdown of what keywords produce what ROI? Maybe Yahoo Personals should provide that info to Google so that Google can clone their personals section?
  • ted
    I am also looking forward to more CPA models, but in truth, at least in my business, I already have a measure of CPA given Google's reporting tools. I know exactly how much it costs me to sell products and acquire new users. Whether this number is 20% higher because of fraud (unlikely as I only advertise on the search network and not the content network) is annoying, but I'm still able to evaluate the value of the campaign based on CPA and not CPC. CPC is irrelevant to me as long as I'm making the extremely high margins that the business currently enjoys.
  • Scott,

    Nice analysis and just one more reason to distrust/hate Google, IMHO. There are more reasons every day.

    I think one form of click-fraud they can't stop at all is human-engineered click-fraud. 5 people using public library terminals to log in could create a very nice revenue stream. If they automate and bring in botnets, the numbers could rocket.

    All of my own efforts indicated Google Adwords conversions to be lower than what I'd expected, and I always attributed a portion of that to fraud.
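The log-filtering process Cliff Kurtzman describes in the comments above can be roughed out in code. This is only a sketch under stated assumptions, not his actual method: the `Hit` record, the `/ad/` redirect prefix, the image suffixes, the user-agent hints, and the 60-second window are all hypothetical stand-ins. The one rule taken from his comment is that an IP is flagged if it fetches robots.txt or announces itself as a bot, or if it clicks every ad within a short period while never downloading an image.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    ip: str
    timestamp: float  # seconds; any consistent clock works
    path: str
    user_agent: str


# Hypothetical conventions for this sketch (not from the original comment).
AD_PREFIX = "/ad/"                                   # self-hosted ad redirect
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")
BOT_AGENT_HINTS = ("bot", "spider", "crawler")


def clicked_all_ads_quickly(clicks, ad_count, window_seconds):
    """True if ad_count distinct ads were clicked within one time window."""
    for i, first in enumerate(clicks):
        seen = {c.path for c in clicks[i:]
                if c.timestamp - first.timestamp <= window_seconds}
        if len(seen) >= ad_count:
            return True
    return False


def suspicious_ips(hits, ad_count=7, window_seconds=60.0):
    """Return the set of IPs that look like robots or spiders."""
    by_ip = defaultdict(list)
    for hit in hits:
        by_ip[hit.ip].append(hit)

    flagged = set()
    for ip, ip_hits in by_ip.items():
        # Pass 1: self-identified robots (robots.txt access or browser ID).
        agents = " ".join(h.user_agent.lower() for h in ip_hits)
        if (any(h.path == "/robots.txt" for h in ip_hits)
                or any(hint in agents for hint in BOT_AGENT_HINTS)):
            flagged.add(ip)
            continue
        # Pass 2: never downloads images AND clicks every ad in a burst.
        loads_images = any(h.path.lower().endswith(IMAGE_SUFFIXES)
                           for h in ip_hits)
        clicks = sorted((h for h in ip_hits if h.path.startswith(AD_PREFIX)),
                        key=lambda h: h.timestamp)
        if not loads_images and clicked_all_ads_quickly(
                clicks, ad_count, window_seconds):
            flagged.add(ip)
    return flagged


def human_clicks(hits, **kwargs):
    """Ad clicks that remain after dropping every flagged IP."""
    flagged = suspicious_ips(hits, **kwargs)
    return [h for h in hits
            if h.path.startswith(AD_PREFIX) and h.ip not in flagged]


def retained_share(before, after):
    """Fraction of logged clicks that survive filtering."""
    return after / before


# Tiny synthetic log with three ads, for illustration only.
sample = [
    Hit("1.1.1.1", 0, "/robots.txt", "ExampleBot/1.0"),
    Hit("1.1.1.1", 1, "/ad/1", "ExampleBot/1.0"),
    Hit("2.2.2.2", 10, "/index.html", "Mozilla/4.0"),
    Hit("2.2.2.2", 11, "/ad/1", "Mozilla/4.0"),
    Hit("2.2.2.2", 12, "/ad/2", "Mozilla/4.0"),
    Hit("2.2.2.2", 13, "/ad/3", "Mozilla/4.0"),
    Hit("3.3.3.3", 20, "/index.html", "Mozilla/4.0"),
    Hit("3.3.3.3", 21, "/logo.gif", "Mozilla/4.0"),
    Hit("3.3.3.3", 40, "/ad/2", "Mozilla/4.0"),
]
print(sorted(suspicious_ips(sample, ad_count=3)))
print(len(human_clicks(sample, ad_count=3)))
print(f"{retained_share(3675, 169):.1%}, {retained_share(2181, 210):.1%}")
```

The last line applies `retained_share` to the before-and-after counts reported in the comment (3675 to 169, and 2181 to 210). A real version would also need the duplicate-click filtering he mentions, which this sketch leaves out.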