Parsing my IIS Log Files with LogParser 2.2 to learn more about Blogs stats from NewsGator and NewsGatorOnline
LogParser is pretty slick. It's wicked hard to learn, in my opinion as I'm not very SQL-y, but it's still awesome. Here's the marketecture description:
Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory.
There's a unofficial support site at LogParser.com which makes me respect the tools usefulness even more. Not even unsupported Microsoft tool has it's own unsupported support site supported by MikeG.
It's not only cool because it's a super-psycho command-line tool, and we all know how I love those. But, it's also got a COM Interface so it programmable/scriptable as well.
So, long story short, I noticed recently that GregR over at NewsGator said that he's including subscriber statistics in the NewsGatorOnline User-Agent. Now, I'm not sure what I think of tunnelling information like that in such an odd place, but what do I know? The idea is clever as hell.
I decided this was a good time to learn how to use LogParser. I grabbed a 20-Meg IIS log file from my ISP from a random day last week and setup some queries. Here's what I did in about 5 minutes.
C:\Program Files\Log Parser 2.2>LogParser.exe -i:IISW3C "SELECT COUNT(cs(User-Agent)) AS Client FROM c:\temp\ex050120.log WHERE cs(User-Agent) LIKE 'NewsGatorOnline%' "
Client
------
134Statistics:
-----------
Elements processed: 90386
Elements output: 1
Execution time: 3.71 seconds
So, if I'm understanding this correctly, NewsGatorOnline's Aggregator hit me 134 times this particular day. What's my number of NewsGatorOnline Subscribers?
C:\Program Files\Log Parser 2.2
>LogParser.exe -i:IISW3C "SELECT DISTINCT cs(User-Agent) AS Client FROM c:\temp\ex050120.log WHERE cs(User-Agent) LIKE 'NewsGatorOnline%' "
Client
---------------------------------------------------------------
NewsGatorOnline/2.0+(http://www.newsgator.com;+2+subscribers)
NewsGatorOnline/2.0+(http://www.newsgator.com;+1+subscribers)
NewsGatorOnline/2.0+(http://www.newsgator.com;+250+subscribers)
NewsGatorOnline/2.0+(http://www.newsgator.com;+15+subscribers)
NewsGatorOnline/2.0+(http://www.newsgator.com;+249+subscribers)Statistics:
-----------
Elements processed: 90386
Elements output: 5
Execution time: 3.67 seconds
Now, I don't know if I'm doing this right, but I'm not sure what I'm being told here. Either I'm doing this query wrong or Greg's stats are off. I'd expect the number of subscribers to be constant or change +/- a few a day.
UPDATE: I realized this on the way home and GregR emailed me to remind me, each distinct RSS Url is tracked by NewsGator. So my ATOM Feed and my Category Specific Feeds each have their own statistics.
This is just NewsGatorOnline folks. I also run queries on actual NewsGator Add-In users for both version 1.0 and 2.0. (I'm sure there's a way to do it all in one query, but again I'm not SQL-y.
C:\Program Files\Log Parser 2.2>LogParser.exe -i:IISW3C "SELECT DISTINCT COUNT(c-ip) AS Client FROM c:\temp\ex050120.log WHERE cs(User-Agent) LIKE 'NewsGator/1%' "
Client
------
25C:\Program Files\Log Parser 2.2>LogParser.exe -i:IISW3C "SELECT DISTINCT COUNT(c-ip) AS Client FROM c:\temp\ex050120.log WHERE cs(User-Agent) LIKE 'NewsGator/2%' "
Client
------
4874
I thought this means I have around 25 distinct IPs using NewsGator 1.x who hit my site that day and about 4874 using NewsGator 2.0. But, that seems like WAY too many people subscribed to my site with just NewsGator. So, I changed the query:
C:\Program Files\Log Parser 2.2>LogParser.exe -i:IISW3C "SELECT COUNT(DISTINCT c-ip) AS Client FROM c:\temp\ex050120.log WHERE cs(User-Agent) LIKE 'NewsGator/2%' "
Client
------
385
THAT'S more like it. 385 distinct IPs using NewsGator 2.x hit my site this particular day. As an aside, there are actually 2 people using NewsGator 1.x. Cool. I need to spend more time understanding this tool. I believe I can ask it more complicated questions about my site than something like LiveStats, and I could set it up to email me charts or stats, or create an ASPX page for blog-specific reports.
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
About Newsletter
Brad Abrams posted this recently...
1. More than 120M copies of the .NET Framework were downloaded and installed (using either Microsoft downloads or Windows Update)
2. More than 85% of new consumer PCs sold in 2004 had the framework installed. More than 58% of business PC had the .NET Framework preinstalled or preloaded
3. Every new HP consumer imaging device (printer/scanner/camera) comes with managed application; Every new Microsoft Intellipoint mouse software CD ships with the .NET Framework
4. Between Q303 and Q204 there was more than 30% increase in the number of .NET Framework installations in the business segment.
5. Most of the installations of the .NET Framework are in Windows XP and Windows 2000 machines.
6. More than 95% of the people that run Microsoft Internet Explorer have version 5 or newer.
7. Windows 98 and Windows ME are less than third of the consumer market PCs
Do you have thoughts on what indicates .NET Framework installation trends?
1) 120M. Yet what % that is comparing to world-wide installed base of Windows machines? If I roughly prorate US market as 1/3 of total (US+EU+Asia) that means 40M copies installed in US. From 192M US internet users, 40M will make 21%.
2) Few years ago I read on shareware site that any extra MB of download cuts your download numbers by half. quite a concern to bundle any downloadable app with 23Mb .NET runtime.
3) HP and MSFT products bundling with run-time is very good. I think one of points Joel missed is that every new copy of Office and other main Microsoft products comes with run-time. Hopefully that push makes .NET 1.x higher then 20%. However his point are super valid for say .NET 2.0. Anybody who would develop rich client on 2.0 without ability to ride an elephant (i.e. OEM) to pre-installs runtime is commercial suicide.
4) I heard XP penetration rate is way slower then desired. Some ISV posts state XP is not even a blip on their download stat radars. Therefore .NET bundling with XP is not seeding the market that much.
5) Asia (which is big focus for us as online gaming company). Predominantly, hugely Win98 land. May be Win98 is 30% of world market, but so far it looks like 80% in Asia. Also worth noting that .NET *attempts* to work on Win98, but its not really working there. Beside runtime issue here we have partial platform compliance.
6) Browser hole. SP2 effectively killed ActiveX as means to put rich content in browsers - and for good reasons. However what is replacement? Logically it should be .NET Components. Yet I don’t see that world at large is turning to .NET Components at all yet. And obvious that not going to happen until runtime penetration reaches 90%+. What the point of doing rich website if people need 23Mb download even to see it? Obvious winner so far – Flash.
7) As it looks right now Joel is still right on target. Develop rich .NET client – limit yourself to tiny market segment right from the start. Go for the web and you can have hundreds of millions more customers.
To certain degree this is all speculation. The best way to really test is to do that data mining of some popular consumer site and see what real % of browsers have .NET headers.
What keeps concerning me after Joel post is why MSFT never published any stats or research on current world-wide .NET 1.x install base numbers. I bet they must have them internally (just mine microsoft.com logs). Yet if they are silent after all this years that makes me wonder….
Comments are closed.
logparser -i:IISW3C "SELECT SUBSTR(cs(User-Agent), 0, INDEX_OF(cs(User-Agent), '/')) as Agent, TO_INT(SUBSTR(cs(User-Agent), ADD(INDEX_OF(cs(User-Agent), ';+'), 2), INDEX_OF(SUBSTR(cs(User-Agent), ADD(INDEX_OF(cs(User-Agent), ';+'), 2)), '+'))) as Subscribers, MIN(TO_TIMESTAMP(date, time)) as FromDate from d:\logs\*.log WHERE cs(User-Agent) LIKE '%subscribers%)' GROUP BY cs(User-Agent) ORDER BY Agent, FromDate desc"