Online website and advertising statistics are much more accurate than statistics for print and TV advertising, but they still have problems.


Let me start by saying that although I am talking about the inaccuracies of online advertising and website statistics, let there be no confusion: even in the worst situation, online statistics are much more accurate than TV viewership and print readership statistics. As a matter of fact, while TV and print rely on surveys, calculations and estimates to come up with what they call statistics, online statistics rely on actually counting events at either the server side or the browser side.


So here is the real problem: the count on the server side and the count on the browser side almost never match, although they should. The sad truth is that the Internet is a complex system, wonderful but complex to a degree most people rarely appreciate (well, if everybody did, we would be out of business). The first thing to understand is that the Internet as a system includes not just the ISP that provides you with connectivity, but also the servers of all the sites you visit, and even your computer (and your other Internet-connected devices) and the software on it that communicates with the Internet.


Now that we have a more comprehensive view of what the Internet is, we can better see where the problem with the numbers comes from. If you count visitors to your site at the server, then each page that gets requested from the server can be counted as a page view. Now consider this: what about when Google visits your site to index it? Your server sees a page view, but this page view is useless if you are counting people or advertising statistics. Luckily, most online statistics systems filter out such search engine and other automated visits. What these systems can rarely account for are the many issues that stem from the fact that the Internet is a free and unregulated world. For example, in many places where bandwidth is expensive, Internet service providers install large servers that cache web sites. This means that when you request a page, your ISP keeps a copy of it for a set period of time, and if another user of the same ISP requests the same page, the ISP gives them the saved copy. This saves the ISP international bandwidth and reduces the load on its systems in general. The problem is that the server of your site, the one counting who saw your site or your advertising, has seen only one visitor, while tens of people at this ISP have requested the same page and got it from the ISP's cache, not from the server that does the counting.
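To make the server-side approach concrete, here is a minimal sketch of counting page views from a server log while filtering out search engine visits. The log layout (the common "combined" access log format), the list of crawler markers and the sample lines are all assumptions made for illustration; real statistics systems use far longer crawler lists and more careful parsing.

```python
import re

# Assumed markers for crawler user agents; real systems keep far longer lists.
BOT_MARKERS = ("googlebot", "bingbot", "spider", "crawler")

# Assumed "combined" access-log layout; the last quoted field is the user agent.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def count_page_views(log_lines):
    """Return (human_views, bot_views) counted from server log lines."""
    human_views = 0
    bot_views = 0
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None or match.group("status") != "200":
            continue  # skip unparseable lines and unsuccessful responses
        agent = match.group("agent").lower()
        if any(marker in agent for marker in BOT_MARKERS):
            bot_views += 1
        else:
            human_views += 1
    return human_views, bot_views


# Hypothetical log lines: two people and one Google crawler visit.
sample_log = [
    '203.0.113.5 - - [01/Jan/2024:10:00:00 +0000] "GET /news HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"',
    '66.249.66.1 - - [01/Jan/2024:10:00:05 +0000] "GET /news HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [01/Jan/2024:10:00:09 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Macintosh) Safari/605.1.15"',
]

human_views, bot_views = count_page_views(sample_log)
print(human_views, bot_views)  # prints: 2 1
```

Note what this sketch cannot do: a page served from an ISP's cache never produces a line in the log at all, so no amount of filtering on the server can recover those views.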


To solve this, many systems moved to counting page views and ads on the browser side. They do this by putting a small piece of JavaScript code on their pages. This code does one important thing: when the browser loads the page, it communicates with the server and tells it that the page was viewed. This report cannot be served from a cache, and since search engine crawlers typically do not execute JavaScript, visits from Google to your web page will not be counted. Perfect! Not really. There is a problem here: the ability to run JavaScript is a browser setting that you can turn off. Some people do that, which makes true visits higher than counted visits. Many programs are also available that leave JavaScript on but block advertising and its counting in your browser. The bigger problem is that many Internet-enabled devices and mobile phones have limited JavaScript ability and thus may fail to execute these counts or execute them incorrectly. Luckily, the margin of error is gradually shrinking as mobile phones and tablets get closer to desktop browsers in terms of JavaScript ability. But in reality this improvement may be short-lived. The Internet is evolving at such a rapid pace that really nothing is guaranteed.
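To see why the two counts almost never match, here is a small toy model of the two counting methods described above. The Visit class, its fields and the mix of visits are all invented for illustration; this is a sketch of the reasoning, not how any particular statistics system works.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    is_crawler: bool = False         # automated visit such as a search engine
    served_from_cache: bool = False  # ISP cache answered; our server never saw it
    runs_javascript: bool = True     # False if scripts are disabled or blocked


def server_side_count(visits):
    # The server sees every request that reaches it, crawlers included.
    return sum(1 for v in visits if not v.served_from_cache)


def browser_side_count(visits):
    # The counting script reports only when a real browser executes it.
    return sum(1 for v in visits if v.runs_javascript and not v.is_crawler)


def true_human_views(visits):
    return sum(1 for v in visits if not v.is_crawler)


# Hypothetical mix of traffic for one page.
visits = (
    [Visit()] * 10                                         # ordinary visitors
    + [Visit(served_from_cache=True)] * 6                  # got the ISP's cached copy
    + [Visit(runs_javascript=False)] * 2                   # JavaScript turned off
    + [Visit(is_crawler=True, runs_javascript=False)] * 3  # search engine visits
)

print(server_side_count(visits))   # prints: 15
print(browser_side_count(visits))  # prints: 16
print(true_human_views(visits))    # prints: 18
```

In this made-up mix, neither method lands on the true figure of 18: the server misses the cached views and counts the crawlers (unless they are filtered out), while the browser-side script misses the visitors who do not run JavaScript.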


To sum up, let me put all this in perspective: the overall error in all these systems is minimal in comparison to other media such as print and TV, where statistics should really be called estimates. Also remember that you should not panic when two statistics systems give you different numbers. If you know how they calculate their numbers, you will know, at least in part, why they are different.

All this is really part of the true wisdom that perfection is not attainable in this world; only God is perfect, and all others have errors.


This may not be important to many business class sites, but for online magazines, portals and web sites that advertise online, it is important that they choose a web design and services company that understands such issues.

Just a quick example: we as a company manage an online site that has hundreds of thousands of pages. We were perplexed as to why the load on this site's server was much higher than expected from the statistics showing the number of visitors to the site. The statistics system used browser-based counting, and thus did not take search engine visits into consideration. Normally this is not an issue, but since this site has hundreds of thousands of pages, all of them dynamic and changing with the site's latest titles, Google sees all of it as content that needs to be re-indexed for freshness. So Google tries to index the whole site once every few days. With a few hundred thousand pages, Google was visiting nearly one hundred thousand pages per day, which is nearly as many pages as human visitors view per day on this site. Thus the traffic was double what we thought it was. If we had counted at the server, we would have caught this earlier.
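The arithmetic behind this example can be sketched in a few lines. The exact figures below (300,000 pages, a 3-day re-index cycle, 100,000 visitor page views per day) are assumed round numbers standing in for "a few hundred thousand pages", "every few days" and "nearly one hundred thousand"; they are not the site's real statistics.

```python
def crawler_pages_per_day(total_pages, recrawl_interval_days):
    """Pages a crawler requests per day if it re-indexes the whole site each interval."""
    return total_pages / recrawl_interval_days


def server_load_ratio(visitor_views_per_day, crawler_views_per_day):
    """Actual server traffic relative to what browser-based statistics report."""
    total = visitor_views_per_day + crawler_views_per_day
    return total / visitor_views_per_day


# Assumed round numbers for illustration only.
crawler_views = crawler_pages_per_day(total_pages=300_000, recrawl_interval_days=3)
ratio = server_load_ratio(visitor_views_per_day=100_000, crawler_views_per_day=crawler_views)

print(crawler_views)  # prints: 100000.0
print(ratio)          # prints: 2.0
```

With these assumed numbers the server handles twice the traffic that the browser-based statistics show, which matches the doubling we observed.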




Now, how do we reduce the load on this server to a better value? Well, we actually did that, but we are not telling. Some things you learn the hard way; that is what experience is all about.

