Dissecting Log Files
Web Analytics
Written by Lyris HQ Staff Writer   
Friday, 02 May 2008
Put away your magnifying glass and scalpel. All you'll need for dissecting your web server log files is your computer, a text editor and a little bit of analytical thought. In this article, we're going to take a look at the components of a web server log and discuss how analytics packages use these fields to provide meaningful data to you. We'll be concentrating on two of the most popular web servers: Microsoft's Internet Information Services (a.k.a. IIS) and the open source web server Apache.

Both of these web servers enable data to be served up to a user through an Internet browser. As a user browses a web site, most of their actions are logged to a file which is kept on the web server. These logs can then be fed into analytics packages like ClickTracks for analysis.

Web Log Fields


A web server doesn't discriminate: it logs every field it's configured to log, whether your analytics package needs that field or not. Let's take a look at the most common log file fields and explore what each field is used for.

  • Date and Time
  • Client IP Address
  • HTTP Method
  • Requested file and Query string
  • User Agent
  • Referrer
  • Status code
  • Cookie (preferable, but not required)
  • Virtual server name (required only for multi-domain logs)


First we'll dissect the IIS web server log file, since its layout is the simpler of the two.

I've loaded a sample IIS log into a simple text editor (Notepad or WordPad both work well). The image below is what you can expect to see in an IIS log file. IIS files are a bit easier to read: since they typically provide a header, all you have to do is line up each header column with its corresponding value.
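
If you'd rather let a script do the lining up, here is a minimal Python sketch of the idea. It assumes the log uses the W3C extended format with a `#Fields:` header line; the function name `parse_w3c_log` and the sample lines are invented for illustration and don't come from a real server.

```python
def parse_w3c_log(lines):
    """Pair each value on a log line with the column named in the header."""
    fields = []
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The header names the columns for the lines that follow it.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other header directives, such as #Software
        values = line.split()
        records.append(dict(zip(fields, values)))
    return records


# Hypothetical sample log, made up for this example.
SAMPLE_IIS_LOG = [
    "#Software: Microsoft Internet Information Services 6.0",
    "#Fields: date time c-ip cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)",
    "2008-05-02 10:15:32 192.0.2.10 GET /index.html - 200 Mozilla/4.0+(compatible;+MSIE+7.0)",
]

records = parse_w3c_log(SAMPLE_IIS_LOG)
print(records[0]["cs-uri-stem"], records[0]["sc-status"])
```

Because the column names are read from the header rather than hard-coded, the same sketch keeps working if you switch fields on or off later.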


IIS Log Format










Now let's look at the slightly more complicated Apache log file format.
Apache log files are a bit trickier to parse, because there's no header line in the file. Compare your own Apache log to the diagram below to get an idea of what's what.
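
Since there's no header to lean on, a parser has to know the field order in advance. The Python sketch below assumes the widely used "combined" layout (host, ident, user, time, request, status, bytes, referrer, user agent); the pattern, the function name `parse_combined_line` and the sample line are illustrative assumptions, so adjust them to match your own LogFormat.

```python
import re

# Assumed field order of the common "combined" layout:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_combined_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    match = COMBINED_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Apache logs the method, URL and protocol together in one request field.
    parts = entry["request"].split(" ")
    if len(parts) == 3:
        entry["method"], entry["url"], entry["protocol"] = parts
    return entry


# Hypothetical log line, made up for this example.
SAMPLE_LINE = (
    '192.0.2.10 - - [02/May/2008:10:15:32 -0700] '
    '"GET /index.html?src=ad HTTP/1.1" 200 2326 '
    '"http://www.example.com/start.html" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0.0.14"'
)

entry = parse_combined_line(SAMPLE_LINE)
print(entry["method"], entry["url"], entry["status"])
```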


Apache Log Format









Now that you have a clear understanding of what fields your web server is logging and what those fields look like in IIS and Apache, let's get to the real question: What does it all mean? Let's examine the most important log file fields, one by one.

Date and Time


This is the field that stores the date and time a particular object (like an image) was requested. This field is crucial to building a visitor session and finding metrics like Time on Site.
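
To see how timestamps turn into sessions and a Time on Site figure, consider the rough Python sketch below. It assumes Apache-style timestamps and a 30-minute inactivity cutoff between sessions; the cutoff, the function names and the sample hits are assumptions chosen for illustration, and your analytics package may use different rules.

```python
from datetime import datetime, timedelta

# Assumed cutoff: a gap longer than this starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_apache_time(text):
    """Parse an Apache-style timestamp such as 02/May/2008:10:15:32 -0700."""
    return datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")


def split_into_sessions(timestamps, timeout=SESSION_TIMEOUT):
    """Group one visitor's hits into sessions separated by long gaps."""
    sessions = []
    for stamp in sorted(timestamps):
        if sessions and stamp - sessions[-1][-1] <= timeout:
            sessions[-1].append(stamp)
        else:
            sessions.append([stamp])
    return sessions


def time_on_site(session):
    """Time between the first and last logged hit of a session."""
    return session[-1] - session[0]


# Hypothetical hits from a single visitor.
hits = [parse_apache_time(t) for t in (
    "02/May/2008:10:15:32 -0700",
    "02/May/2008:10:17:02 -0700",
    "02/May/2008:14:00:00 -0700",
)]

sessions = split_into_sessions(hits)
durations = [time_on_site(s) for s in sessions]
print(len(sessions), durations)
```

Note that a session with a single hit gets a duration of zero here, because the log has no record of when the visitor actually left.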

Client IP Address


This is the IP address of the machine that accessed your web site. IP addresses aren't necessarily unique to any one visitor, since most visitors surf the web via a dynamic IP address provided by their ISP rather than their own dedicated static IP. Even so, the IP address can still be useful in partitioning the log file into visitor sessions.

You may also notice a Server IP field in your log files. This is the field that logs the IP address of the web server machine that served a particular web site. This field can stand in for the Virtual Host field when each web site on the server has its own IP address (see below for the definition of virtual host).

HTTP Method


This is the field that stores how the object was requested. There are several possible values for the HTTP Method, but the two most common are GET and POST. Not every web server logs the method as a separate field. For example, IIS logs it separately, but Apache combines the request and the HTTP Method into one field.

Requested file and Query string


This is the field that logs the object (file) that's being requested. In many cases, the object doesn't get requested by itself. In fact, it's very common for an object to be requested with query parameters appended to it. These parameters are logged as a part of the Query String.

Once again, depending on your particular web server, these two fields can be separate or together. For example, in IIS these two fields are logged separately and in Apache they're logged together in one field.

The Query String is a very important (and sometimes mandatory) field. Not only does it store the URL parameters on your site, making it possible to track dynamic sites, but it also stores the tracking parameters that you use in your PPC landing URLs. Tracking parameters are crucial to distinguishing between organic and PPC traffic.
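
As a small illustration of how a tracking parameter separates PPC from organic traffic, here is a Python sketch. The parameter name `src` and the values in `PPC_VALUES` are hypothetical; substitute whatever you actually append to your PPC landing URLs. For simplicity, the sketch treats any request without a recognized tag as organic.

```python
from urllib.parse import parse_qs

TRACKING_PARAM = "src"            # hypothetical tracking parameter name
PPC_VALUES = {"ppc", "adwords"}   # hypothetical values used in ad URLs


def classify_traffic(query_string):
    """Label a request as 'ppc' or 'organic' based on its query string."""
    params = parse_qs(query_string)
    values = params.get(TRACKING_PARAM, [])
    if any(value.lower() in PPC_VALUES for value in values):
        return "ppc"
    # No recognized tracking tag: treated as organic in this sketch.
    return "organic"


print(classify_traffic("id=42&src=ppc"))
print(classify_traffic("id=42"))
```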

User Agent


The user agent is also known as the client signature (and nope, this isn't the visitor's John Hancock!). This is the field that logs the browser signature of the client that accesses a web site. For example, Netscape and Firefox browsers will have the string "Mozilla" in their User Agents. Internet Explorer browsers will have the string "MSIE" in their User Agents. Robots and spiders also have their own user agent signatures: Google's spider will have the string "googlebot" in its signature.
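
Signature matching of this kind usually comes down to simple substring checks. The Python sketch below assumes Internet Explorer identifies itself with "MSIE" and Google's spider with "Googlebot"; the labels and the function name `classify_user_agent` are made up for illustration, and a real analytics package uses a far longer list of signatures.

```python
def classify_user_agent(user_agent):
    """Return a rough label for a user agent string."""
    ua = user_agent.lower()
    # Check the most specific signatures first, because the generic
    # "Mozilla" token also shows up in many other user agents.
    if "googlebot" in ua:
        return "robot"
    if "msie" in ua:
        return "internet explorer"
    if "firefox" in ua:
        return "firefox"
    if "mozilla" in ua:
        return "other mozilla-compatible"
    return "unknown"


print(classify_user_agent(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
))
print(classify_user_agent("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"))
```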

Referrer


This is the field that logs the web page from which the visitor arrived. The referrer field can show you search engines, affiliates and even advertisements. But that's not all—we also can discover the keyword that was used when the visitor searched in a search engine and came across your site.
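
Pulling the keyword out of a search engine referrer is mostly a matter of reading the right query parameter. This Python sketch assumes Google puts the search phrase in `q` and Yahoo in `p`; the lookup table `SEARCH_PARAMS` and the function name are illustrative, and other engines will need their own entries.

```python
from urllib.parse import urlparse, parse_qs

# Assumed mapping from a host fragment to its search-phrase parameter.
SEARCH_PARAMS = {"google.": "q", "yahoo.": "p"}


def extract_search_keyword(referrer):
    """Return the search phrase from a referrer URL, or None if absent."""
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    for marker, param in SEARCH_PARAMS.items():
        if marker in host:
            values = parse_qs(parsed.query).get(param)
            if values:
                return values[0]
    return None


print(extract_search_keyword(
    "http://www.google.com/search?q=web+log+analysis&hl=en"
))
```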

The referrer field is also very important in building a visitor session. It's almost impossible to build an accurate visitor session if we don't know where the visitor came from; the referrer field lets you, in essence, 'follow' a visitor from one page to the next.

Status code


This is the field that stores whether the requested action was successful or not. There are several possible values for this field; here are a few of the more common:

  • 200 level = Status OK. Action completed successfully
  • 300 level = Upon requesting a particular file, the visitor is redirected to request another file
  • 400 level = Client error, such as 404 (File not found)
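
The levels listed above map naturally onto a range check. Here is a brief Python sketch; the labels and the function name `describe_status` are invented for this example, and anything outside the three levels discussed here is simply lumped into "other".

```python
def describe_status(code):
    """Map a status code (string or int) to one of the levels above."""
    code = int(code)
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    return "other"


for sample in ("200", "302", "404"):
    print(sample, describe_status(sample))
```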


Cookie (preferable, but not required)


This field is optional but very beneficial. If you place cookies on your visitors' machines, the cookies will be logged in this field, making them available to be used in your reporting. The presence of a persistent cookie lets you accurately track unique and return visitors. Persistent cookies can also help in tracking your latent conversions from your ads. Plus, with the introduction of cookies, you're now able to store additional information that isn't available through standard web server logs, and then report on this information later. The moral of the story? Just say yes to cookies.
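
For a sense of how a logged cookie becomes a visitor identifier, here is a short Python sketch. It assumes the cookie field holds a standard `name=value; name=value` string and that your site sets a persistent cookie called `visitor_id`; that cookie name, like the function name, is hypothetical.

```python
from http.cookies import SimpleCookie

VISITOR_COOKIE = "visitor_id"  # hypothetical persistent cookie name


def visitor_id_from_cookie(cookie_field):
    """Return the visitor ID stored in a logged cookie field, if any."""
    if not cookie_field or cookie_field == "-":
        return None  # a lone dash means no cookie was logged
    jar = SimpleCookie()
    jar.load(cookie_field)
    morsel = jar.get(VISITOR_COOKIE)
    return None if morsel is None else morsel.value


print(visitor_id_from_cookie("visitor_id=abc123; theme=dark"))
```

Counting distinct IDs then gives unique visitors, and an ID that shows up again on a later day marks a return visitor.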

Virtual server name (required only for multi-domain logs)


Typically, each web site will have its own set of log files. A multi-domain log file is a log file where the requests for multiple web sites are logged to one log file. So, in this case, there has to be something in every log file line that ties it to a particular web site. The Virtual Host field does this, and makes it possible to accurately filter the log file entries by web site.
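
Filtering a multi-domain log by web site can be sketched in a few lines of Python. This example assumes the virtual host is the first space-separated field on every line; the function name and the sample lines are hypothetical.

```python
from collections import defaultdict


def group_by_virtual_host(lines):
    """Split multi-domain log lines into one list per virtual host."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: the virtual host is the first field on the line.
        host, _, rest = line.partition(" ")
        groups[host.lower()].append(rest)
    return dict(groups)


# Hypothetical multi-domain log lines, shortened for readability.
SAMPLE_LINES = [
    'www.example.com 192.0.2.10 - - [02/May/2008:10:15:32 -0700] "GET / HTTP/1.1" 200 2326',
    'shop.example.com 192.0.2.11 - - [02/May/2008:10:15:40 -0700] "GET /cart HTTP/1.1" 200 512',
    'www.example.com 192.0.2.12 - - [02/May/2008:10:16:01 -0700] "GET /about HTTP/1.1" 200 1044',
]

by_site = group_by_virtual_host(SAMPLE_LINES)
print({site: len(entries) for site, entries in by_site.items()})
```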

What if one of my fields is missing? How do I make all this happen?


After reading this article, you may notice that some of the fields we described aren't showing up in your log files. In that case, you just need to turn the fields 'on' by making changes to your web server. Depending on your setup, you may need to contact your hosting company to get this done or you can make changes yourself if you have control of the server.

Apache: Look for a file called httpd.conf. Open this file for editing and look for the section on logging. You'll notice a logging string that corresponds to the format you see in the log file. If you use the following string, it should record and report on all the important fields.

LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{cookie}i\"" combined

Then save the file, and restart Apache.

IIS: Tweaks to IIS log file formats need to be made in the Internet Services Manager, which is typically found under Windows Control Panel > Administrative Tools.

Right-click on the web site in question and select Properties. Then move to the section on logging. You can simply check or uncheck the fields that you need. Then, just like with Apache, restart the IIS service to apply the changes.

Happy tracking!
