Blog Stats are Confusing - GETs, Views, User-Agents, Readers, Eyeballs
There's some discussion happening on an internal MSFT mailing list about blog statistics. I don't check my web statistics more than once a month, as I'm more interested in blog comments or what's going on in the forum. If I get a lot of comments on a post I feel good. I like to get discussions going and bounce ideas back and forth.
That said, some blogs at Microsoft track their statistics and need to know if a particular post or new theme brings in more readers. One particular blog (not mine) recently saw a 16x increase in "hits" which is probably a good thing. A discussion started, and here's part of an email I wrote with my ideas that I thought you might find interesting, Dear Reader. I've made a few [edits] to make things clearer.
I think it's killer, to be clear, so in no way do I want to take away from [that blog's] most excellent work, but the web stats [in this case] specifically "smell" wrong. Possibly a bot, spammer, something, but still, a 16x increase in web traffic [in a single] month feels exceptional. It's the ratios [of GETs to projected humans] that are confusing to me.
It'd be interesting to use some heuristics to turn the RSS Feed HTTP GETs into Unique users. For example, most RSS Readers poll, so one individual will hit your feed (in my experience) between 8 and 16 times a day, depending on their reader and how long their computer is on. Online readers are smarter than Smart Client readers like Outlook and FeedDemon. This usually means one has fewer readers than they think, if they are looking at GETs.
Additionally, online readers [usually] only hit once (here's how that works) [and rather] "tunnel" your subscriber numbers in the HTTP User Agent like "NewsGatorOnline/2.0+(http://www.newsgator.com;+250+subscribers)". Meaning, you might get one hit or 10 hits, but regardless they are representative of 250 individuals. This usually means one has more readers than they think, if they are looking at GETs.
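Here's a rough sketch of that heuristic in Python. It is not how FeedBurner or any stats package actually does it. The function names, the regular expression, and the default of 12 polls per reader per day (the midpoint of my 8 to 16 guess) are all assumptions for illustration, and it assumes the log covers a single feed for a single day.

```python
import re

# Assumed pattern: a number followed by "subscribers", separated by "+" or
# whitespace, as in "...;+250+subscribers)". Other aggregators may differ.
SUBSCRIBERS = re.compile(r"(\d+)[+\s]+subscribers?", re.IGNORECASE)


def tunneled_subscribers(user_agent):
    """Return the subscriber count tunneled in a User-Agent, or None."""
    match = SUBSCRIBERS.search(user_agent)
    return int(match.group(1)) if match else None


def estimate_readers(user_agents, polls_per_day=12):
    """Estimate humans from one day of feed GETs (one User-Agent per GET)."""
    online = {}       # aggregator product name -> largest count it reported
    polling_gets = 0  # GETs from clients that report no subscriber count
    for ua in user_agents:
        count = tunneled_subscribers(ua)
        if count is None:
            polling_gets += 1
        else:
            # Count each online aggregator once, however often it fetched.
            product = ua.split("/", 1)[0]
            online[product] = max(online.get(product, 0), count)
    # Polling clients: many GETs collapse into far fewer people.
    return sum(online.values()) + round(polling_gets / polls_per_day)


day = (
    ["NewsGatorOnline/2.0+(http://www.newsgator.com;+250+subscribers)"] * 3
    + ["FeedDemon/2.5"] * 24  # hypothetical desktop reader User-Agent
)
print(estimate_readers(day))  # 27 GETs in the log, estimated as 252 readers
```

Both corrections show up in the same log: the three NewsGatorOnline GETs count as 250 people, while the 24 desktop GETs shrink to about 2.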
Why do I mention this? I mention it because looking at HTTP GETs isn't representative of people, but of GETs. It took me a few years to figure this out, and I've been thrilled with the analysis work done by FeedBurner (my RSS Feed is hosted there, saving me over 400 gigs of bandwidth a month) to turn GETs into Humans.
Here's a real world example. FeedBurner says I have around 22,000 regular readers [as of today...it varies based on weekday/weekend]. That's aggregated across all News Readers.
My stats package shows about 50,000 page views a day or about 1.6 million a month. This varies, confirming [an earlier] comment about folks hanging around [a site] and reading stories, which is cool. However, if I look at "hits" I see 16.5 million. Of course, that's not [a useful stat], because that includes images, CSS, etc. Visits, on the other hand, are one individual hanging around for a period of time and reading. For example (these stats don't include RSS anywhere, including bandwidth):
Page Views - 1,596,548
Visits - 806,251
Hits - 16,500,422
Bandwidth (KB) - 209,759,564
For me, these stats make sense, because I have a readership of about 20,000 that show up every few days and hang out, representing [roughly] 50% of my traffic. The other 50% comes from Search Engines and [incoming] links from other blogs. So it's important that one distinguishes between hits, page views, and visitors, and tries to correlate those back to readership, IMHO.
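To see why "hits" overstates things, it helps to work out the ratios between those four numbers. This is just arithmetic on the figures above; the variable names are mine.

```python
# Monthly figures quoted above (RSS traffic excluded).
page_views = 1_596_548
visits = 806_251
hits = 16_500_422
bandwidth_kb = 209_759_564

hits_per_page_view = hits / page_views        # images, CSS, etc. per page
page_views_per_visit = page_views / visits    # pages read per sitting
kb_per_page_view = bandwidth_kb / page_views  # average weight of a page view

print(f"{hits_per_page_view:.1f} hits per page view")
print(f"{page_views_per_visit:.2f} page views per visit")
print(f"{kb_per_page_view:.0f} KB per page view")
```

Each page view drags along roughly ten hits, and a typical visit is about two pages, which is why the same month can be described as 16.5 million, 1.6 million, or 800 thousand depending on which number you quote.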
The question that we need Blog Stats to answer is that of readership. What does [a] 600,000 RSS hits number mean? 600k/30days is about 20k hits a day, so how often are these readers hitting the feed per day? Once we come up with a standard-ish formula, blogs could get a rough +/-30% idea of how many human eyeballs [are actually reading].
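As a starting point for that standard-ish formula, here is the back-of-the-envelope version using my 8 to 16 polls per day guess from earlier. The function name and the 30-day month are assumptions, and it deliberately ignores online aggregators that tunnel subscriber counts, so treat the result as a floor for polling readers only.

```python
def readers_from_feed_gets(monthly_gets, days=30, polls_low=8, polls_high=16):
    """Return a (low, high) estimate of polling readers from monthly feed GETs."""
    daily_gets = monthly_gets / days
    # More polls per reader means fewer readers behind the same GETs.
    return daily_gets / polls_high, daily_gets / polls_low


low, high = readers_from_feed_gets(600_000)
print(low, high)  # 1250.0 2500.0
```

So 600,000 RSS hits a month could be as few as 1,250 to 2,500 people if they all use polling readers, which is a very different story than "600k".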
Just my two cents, thoughts?
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
Because there are so many different blogging systems – you cannot accurately compare statistics.
I personally use Windows Live Spaces, which has next to NO statistics. So while I know I’ve had close to 4 million page views, I have little clue as to who is reading me or the number of RSS feeds etc.
Ignorance is bliss?
Blake Handler
Microsoft MVP
As for you, Scott, great analysis - I couldn't agree more. I love how FeedBurner helps give a much more accurate view of "readership", but even then it includes bots and such (typically falling under "other readers") so the number isn't entirely accurate.
'Course for little guys like me, you can bet I'm going to claim every one of those bots as a "reader" when I talk to my blogging friends! :)
On the up-side, if *everybody* used FeedBurner, we'd have a much better relative number to work with that could help you compare and contrast, albeit without a true absolute number still.
... and need to know if a particular post or new theme brings in more readers. One particular blog (not mine) recently saw a 16x increase in "hits" which is probably a good thing.
Do they have 16x more pictures with that new theme? q;-)
The gist of the article is to provide counts about the number of current subscribers, the number of new subscribers, the number of lost subscribers, and the number of regained subscribers. FeedBurner provides some of these stats, but they don't seem to measure them in the way that I would like them to be measured (it could just be that I am not familiar enough with the data yet).
Unfortunately I don't have any clever solution as to how to gather this information (Perhaps some variation of @Monsur's idea?).
FeedBurner is great & the analytics are really nice. It makes it easier for the user to subscribe too, like when using Google Reader for example (my preferred reader).
Thx 4 the info,
Catto
To me this is just another incarnation of what Web Developers have been dealing with since the early days, trying to take fairly abstract raw numbers and explain them in non-technical terms.
Anyway, that's my ramble (btw, can I have .1% of your traffic?) :)
But I think that is a testament to your knowledge and hard work that your readership is even more than 5. I have two kids, (both of which are about the same age as yours). I have a wife, a full time job, and I have barely enough time to write a good blog entry once a month... and by good, I mean more than 5 words. Keep it up Scott,
Is it possible to format the code to be Google Reader friendly? Looks like you have some nice <pre> tags on the webpage to format code, but when viewed as a feed, the code is almost unreadable.
This was an issue only in the older posts [Weekly source code 10 and earlier]. If you have fixed it since then, thanks.
Great post. You didn't answer the most obvious question though. What is a hit? A hit is registered for every downloadable object on a page. If you have a page with 10 images and a JavaScript link and a CSS link (living on the same server), you'll have 13 hits for that one page view (one of the hits is the page itself).
So, in the early/mid 90s, the term hit was used all the time at first, but then the terminology changed and people started to use page views or visitors, which better represented their actual traffic. As you mention, things have changed again with RSS readership. (In the 90s, some people were putting dozens or hundreds of tiny hidden images on their pages to bring their hit count up... obviously completely defeating the purpose, but it gave them bragging rights for people wondering what their 'hit' count was.) So, the term 'hit' is an old term that has always been unclear to most people.