Like most website owners and operators, I'd like to think that my content attracts an audience that finds it informative or enjoyable in some way. I'd also like to think there is some chance of that audience growing over time. That's not because I'm trying to make a dollar (the site actually costs me money) but, I guess, simply because it's nice to feel appreciated. These motivations are hardly unusual and, of course, for some website owners the aim is to support an actual business that they and others rely on to make a living.
Whatever the motivation, there is a desire, even a need, to measure the number and type of visitors to a website. How else can we know whether we are presenting content that is truly of interest to others, or presenting it in a way that is satisfying, accessible or useful to them? Enter Google Analytics.
There are other analytics packages available, but Google Analytics is surely the 800-pound gorilla of the industry. There are several reasons for this. Firstly, Google is the dominant player in internet search; indeed, "google" is no longer just a noun but a verb as well. Secondly, it's free to use. And thirdly, it offers quite substantial insights into web user behaviour, or at least that's the claim! If and when such analytics services perform as claimed, they are a real help in making decisions about website design, content, functionality, marketing, etc. All of this, however, relies on the data being genuine.
Number of users, page views, time spent on site, new vs. returning visitors: useful information, and just the start of what Google Analytics promises.
How does it work?
It's worth pointing out at this stage how the data is collected. The tracking code embedded in each page sends a "hit" from the visitor's browser to Google's servers. The "analytics" you see are therefore a record of what Google's server received, not what your own web server handled. If this were what I'll call a "closed system", in other words one with no outside influences, things would be just fine. But... it's not a "closed system"!
Some time ago Google announced its "Measurement Protocol". This is an official way for any server or device to send data directly to Google Analytics, intended to support phone tracking solutions, point-of-sale systems, CRM (customer relationship management) systems and the like. The problem is that the protocol requires no authentication. A hit only has to carry a tracking ID, so the system is open to abuse. In practice, virtually anybody can send data to the Google Analytics servers.
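To make the "no authentication" point concrete, here is a minimal Python sketch of what a Universal Analytics Measurement Protocol pageview hit looked like. As I understand the v1 protocol, such hits went to https://www.google-analytics.com/collect using the parameter names shown (v, tid, cid, t, dp, dr). The helper name and every value below (tracking ID, client ID, page, referrer) are hypothetical placeholders. The sketch only builds the query string and sends nothing.

```python
from urllib.parse import urlencode


def build_pageview_payload(tracking_id: str, client_id: str,
                           page: str, referrer: str) -> str:
    """Encode a Measurement Protocol (v1) pageview hit as a query string."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # the property's tracking ID, e.g. UA-XXXXXXX-Y
        "cid": client_id,    # anonymous client ID: any value the sender chooses
        "t": "pageview",     # hit type
        "dp": page,          # document path
        "dr": referrer,      # document referrer, which appears in the reports
    }
    return urlencode(params)


# Hypothetical values. Note there is no password, API key or signature field.
payload = build_pageview_payload(
    "UA-0000000-1", "35009a79-1a05-49d7-b876-2b884d0f825b",
    "/index.html", "http://example.com/")
print(payload)
```

Nothing in the payload proves the sender ever loaded a page on your site. The only "credential" is the tracking ID, and that is visible in every page's source.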
Enter the Spammers.
Wherever there is a weakness in a system, there will be people who try to exploit it for their own ends. By knowing your Google Analytics tracking ID, a spammer can send bogus information to Google's server. Indeed, the spammer doesn't even need to know a specific ID, as most of them follow a particular format (UA-???????-??). A spammer can take that format and fill in the blanks at random, knowing that if enough combinations are tried he's sure to get a few right. If this is automated, thousands or even millions of IDs can quickly be generated and fed into the system.
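The format the spammers rely on can be sketched in Python. This is a minimal illustration under an assumption. A Universal Analytics tracking ID is taken to be "UA-", an account number, a hyphen, and a small property index (1 for an account's first property, 2 for its second, and so on). The regex and the parse_tracking_id helper are my own hypothetical illustration, not anything Google provides.

```python
import re

# Assumed shape of a Universal Analytics tracking ID:
# "UA-" + account number + "-" + property index.
UA_PATTERN = re.compile(r"UA-(\d+)-(\d+)")


def parse_tracking_id(tracking_id: str):
    """Return (account, property_index), or None if the ID doesn't fit the format."""
    match = UA_PATTERN.fullmatch(tracking_id)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_tracking_id("UA-1234567-1"))  # (1234567, 1)
print(parse_tracking_id("UA-1234567"))    # None
```

The property index is usually a small number such as 1 or 2, so the account number is effectively the only unknown. That is why filling in the blanks at random pays off for the spammer.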
What do spammers get out of this? Well, there are millions of Google Analytics users who, when they see a "referrer" apparently sending traffic to their sites, are curious to find out more about it. The bogus referrer's web address is displayed in their reports and so draws interest, and web traffic, to the spammer's own websites. In many cases these will be rubbish shopping sites offering fake Rolex watches, counterfeit Louis Vuitton handbags, plain-wrapped Viagra, etc. They may well carry malicious code or viruses as well.
As can be seen in the example above, virtually all the parameters can be spoofed: bounce rate, pages visited, time on site and so on. Of the 217 supposed referrals during the period, only a handful were legitimate.
What can the GA account holder do about it?
Two methods are commonly advocated on the internet, and neither is really effective. The first is to block referrers from accessing your site at all by amending the .htaccess file in the root directory of your domain. Sadly, this doesn't help. It assumes the referrers are hitting your web server and can therefore be locked out. As explained above, however, the referral spammer never accesses your site or server in the first place. The bogus "hit" lands directly on the Google Analytics server.
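Why a server-side block can't catch these hits can be shown with a toy model. This is a simplified Python sketch, not how Apache or Google actually work internally. The block list stands in for .htaccess rules, and the domain names are hypothetical. A genuine visit is recorded only if the server serves the page, which lets the tracking code run. A "ghost" hit never touches the server at all.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    referrer: str
    reached_web_server: bool  # False for "ghost" hits sent straight to Google


# Hypothetical referrer block list, standing in for .htaccess rules.
BLOCKED_REFERRERS = {"spam-example.com"}


def server_blocks(hit: Hit) -> bool:
    """A server-side rule can only act on requests the web server receives."""
    return hit.reached_web_server and hit.referrer in BLOCKED_REFERRERS


def recorded_by_analytics(hit: Hit) -> bool:
    """Blocked visits never load the tracking code; ghost hits are recorded regardless."""
    return not server_blocks(hit)


hits = [
    Hit("spam-example.com", reached_web_server=True),    # real bot visit: blocked
    Hit("spam-example.com", reached_web_server=False),   # ghost hit: still recorded
    Hit("friendly-blog.example", reached_web_server=True),
]
recorded = [h.referrer for h in hits if recorded_by_analytics(h)]
print(recorded)  # ['spam-example.com', 'friendly-blog.example']
```

The block list does its job on the one request it sees, yet the spam referrer still appears in the reports.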
The second approach is to use filtering through Google Analytics. Once the referral spam is observed, it is a simple procedure to filter that referrer out of any future analytics. Here is an example:
Sadly, there are two problems with this approach. Firstly, filters are not retrospective. Suppose you find that spammer XYZ sent you 100 "hits" on a given date, and then you filter him out. Those 100 "hits" still show up as legitimate because they weren't filtered at the time. Secondly, spammers change their names and referrer addresses all the time. You may have filtered out XYZ today, but the same clown will be calling himself ABC tomorrow. You can find yourself adding new filters virtually every day and getting nowhere, like a dog chasing its own tail.
In the example above it appears I've done well: I've filtered out 31 bogus referrals! The truth is that it had been a couple of days since I'd updated the filter list. In that time, numerous other spammers, or "ghost referrers", had popped up. Sadly, genuine referral traffic is only a small fraction of what's indicated.
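Both filtering problems can be modelled in a few lines. This is a simplified Python sketch, not Google's implementation. Hits are (day received, referrer) pairs with hypothetical referrer names. The filter is assumed to apply only to hits processed after it was created.

```python
# Hypothetical hits as (day received, referrer). The filter is created on
# day 3, and the spammer switches to a new name on day 5.
hits = [
    (1, "xyz-spam.example"),
    (2, "xyz-spam.example"),
    (4, "xyz-spam.example"),
    (5, "abc-spam.example"),  # same spammer, new name
]

FILTER_CREATED_DAY = 3                    # day the exclusion filter was added
EXCLUDED_REFERRERS = {"xyz-spam.example"}


def kept_in_reports(day: int, referrer: str) -> bool:
    """Model a filter that applies only to hits processed after it exists."""
    if day < FILTER_CREATED_DAY:
        return True  # processed before the filter existed, so it stays in the data
    return referrer not in EXCLUDED_REFERRERS


reported = [ref for day, ref in hits if kept_in_reports(day, ref)]
print(reported)  # ['xyz-spam.example', 'xyz-spam.example', 'abc-spam.example']
```

Only one of the four spam hits is removed. The two earlier hits survive because the filter isn't retrospective, and the renamed spammer slips straight through.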
What about "direct" traffic?
Just as referral traffic can be spoofed, it is now obvious that so-called "direct" traffic stats can be falsified as well. In the example below we can see the unrealistic referral numbers, but now look at the "Direct" traffic numbers. This figure has jumped significantly in a short time. Sadly, drilling down into it shows most of these "hits" to be nonsense as well.
Investigating the 121 "Direct" hits shows that almost all are bogus, as below:
Keep drilling and we find that a huge proportion of those 121 "hits" came from one obscure service provider.
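That drill-down amounts to tallying hits by service provider and checking what share the biggest one accounts for. Here is a minimal Python sketch, assuming each hit's service provider has been exported into a list. The provider names and counts are made up for illustration and are not my actual figures.

```python
from collections import Counter


def top_provider_share(providers):
    """Return the most common service provider and its share of all hits."""
    counts = Counter(providers)
    provider, count = counts.most_common(1)[0]
    return provider, count / len(providers)


# Hypothetical sample: ten "direct" hits, each tagged with its service provider.
sample = ["provider-x"] * 7 + ["isp-a", "isp-b", "isp-a"]
name, share = top_provider_share(sample)
print(f"{name}: {share:.0%} of direct hits")  # provider-x: 70% of direct hits
```

On a small site, genuine visitors tend to be spread across many providers. A single obscure provider accounting for most of the hits is a strong hint that the traffic is fake.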
So what can be done?
Well, from the end user's position, not a lot. Google are clearly aware of the issue, and have been for several years, but appear either unable or unwilling to fix their system. Some commentators question whether a long-term solution is even possible. The fundamental design of Google Analytics requires no authentication to send data to its servers.
There are other analytics systems available. However, from what I've seen, they either don't offer a great deal of information or they are paid (in some cases expensive) products.
If yours is a website with a huge following, then it may be worth persisting with Google Analytics while recognizing the issue. Suppose your (or your organization's) website gets 100,000 hits a month. You may well accept 1,000-2,000 bogus hits, roughly 1-2% of the total, as unlikely to skew the results heavily. However, if yours is a small website, the statistics can become so skewed as to be completely meaningless. That holds even if you do invest (waste) a great deal of your time trying to battle it. For me, I now only check my Google Analytics when I need a good laugh, and I'm debating whether to remove it from my site entirely. It may well be time to dump Google Analytics! ~KD
blog.analytics-toolkit.com: "Is a long-term solution even possible?"