Lies, Damn Lies, and Logfiles

At the end of June, dbaron noticed a problem with the weblogs that had been happening regularly since about February. We were only rotating our logs daily, and the logfile was growing larger than 2GB. When that happened, the webserver just threw it away and started a new one. This meant that Webalizer was dramatically under-reporting the amount of traffic.
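As a rough illustration of the rotation problem, here is a minimal sketch of how you might estimate how many rotations a day are needed to keep each logfile under the ceiling. It assumes the 2GB limit is 2**31 - 1 bytes (a signed 32-bit file offset), which is a guess about the cause, and the 6 GB daily log volume is a made-up number, not a measurement from our server.

```python
# Assumption: the 2GB ceiling is 2**31 - 1 bytes (signed 32-bit file offset).
SIZE_LIMIT_BYTES = 2**31 - 1


def rotations_per_day(daily_log_bytes: int, limit: int = SIZE_LIMIT_BYTES) -> int:
    """Minimum number of equal-sized rotations per day that keeps every
    logfile at or below `limit` bytes."""
    if daily_log_bytes <= 0:
        return 1
    # Integer ceiling division avoids floating-point rounding at the boundary.
    return max(1, (daily_log_bytes + limit - 1) // limit)


# Hypothetical example: a day that produces 6 GB of log text.
example_rotations = rotations_per_day(6 * 10**9)
```

This treats traffic as spread evenly over the day. Real traffic has peaks, so a safer schedule would rotate more often than this minimum, or rotate on size rather than on time.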

You can see this clearly by looking at February’s daily usage stats (most of which are obviously truncated) or by comparing April’s hourly usage chart with July’s. The April one is clearly distorted, because most of the time, the logs from the early part of the day were getting thrown away.

So, what’s the real picture?

Well, the July stats are accurate. We currently do 30 million hits a day, amounting to about 90GB of data. For comparison, that’s about the same amount of traffic as Slashdot or ZDNet. And there’s only one direction our traffic numbers are going in…
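To put those daily totals in more familiar units, here is a quick back-of-the-envelope calculation. It assumes "GB" means decimal gigabytes (10**9 bytes) and gives averages over a full day, so peak rates will be higher.

```python
HITS_PER_DAY = 30_000_000
BYTES_PER_DAY = 90 * 10**9   # assumption: decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60

# Average size of a single response, in bytes.
avg_bytes_per_hit = BYTES_PER_DAY / HITS_PER_DAY

# Average request rate over the whole day.
avg_hits_per_second = HITS_PER_DAY / SECONDS_PER_DAY

# Average outbound bandwidth in megabits per second.
avg_megabits_per_second = BYTES_PER_DAY * 8 / SECONDS_PER_DAY / 10**6

print(f"{avg_bytes_per_hit:.0f} bytes per hit")
print(f"{avg_hits_per_second:.0f} hits per second")
print(f"{avg_megabits_per_second:.1f} Mbit/s")
```

That works out to roughly 3 KB per hit, about 347 hits a second, and a little over 8 Mbit/s sustained.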

4 thoughts on “Lies, Damn Lies, and Logfiles”

  1. Eric: I don’t have the ability to reconfigure Webalizer. File a bug in the Server Ops component in Bugzilla.

  2. Robert: The webserver machine is called “rheet”. It’s the same spec as mecha, as far as I know (the Foundation bought three or four exactly the same). These figures are for webserver traffic served by rheet only.