Hololog: Holistic Web server logfile analysis.
Hololog is a holistic logfile analysis tool. That is, it is designed to give you an overview of who is using your Web site, and also to let you drill down to see individual browser sessions. You can learn how people are reaching your Web site, which pages they looked at, which pages are most popular, which are accounting for the most bandwidth, and more.
You can use Hololog to improve your Web site. If people are searching for something and finding one of your Web pages, consider adding links on that page to something that might help them more, or even adding more content in that area. If people are reaching the right index page but not finding the content, make the links to the content more prominent. If people look at some of your pages, go back to a search engine, and then return to a different page, consider adding a direct link between the pages.
In other words, by looking at people's behaviour you can make a Web site that's easier to use and more effective.
Hololog was originally written by Liam Quin over a period of several years. Several other people have contributed ideas, including Dr Ian Graham, Laurie Harper, Mark Loeser, and others.
Use the summary page to see who has been getting the most use out of your Web site. You will quickly get a feel for whether most people look at only one or two pages and then wander off.
There is a row for each Internet address from which your site was visited. This means that a large corporation with a firewall will probably only get a single entry. The columns shown in each row are as follows.
At the end of the summary is a total that includes internal hits, that is, files fetched from computers in the same Internet domain as the Web server. There is also a form so you can change the options.
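To make "internal" concrete, here is a minimal Python sketch of the domain test, assuming hosts are logged by name rather than by numeric address. The helper is_internal and the domain value holoweb.net are illustrative assumptions, not part of Hololog itself.

```python
def is_internal(host: str, server_domain: str) -> bool:
    """Return True if host is in the same Internet domain as the Web server.

    Assumes host names (not raw IP addresses) are logged; server_domain
    is a hypothetical setting such as "holoweb.net".
    """
    host = host.lower().rstrip(".")
    domain = server_domain.lower().rstrip(".")
    # Match the domain itself or any name ending in ".domain", so that
    # "notholoweb.net" is not mistaken for part of "holoweb.net".
    return host == domain or host.endswith("." + domain)


hosts = ["dirk.holoweb.net", "crawler.example.com", "holoweb.net", "notholoweb.net"]
internal = [h for h in hosts if is_internal(h, "holoweb.net")]
external = [h for h in hosts if not is_internal(h, "holoweb.net")]
print(internal)  # ['dirk.holoweb.net', 'holoweb.net']
print(external)  # ['crawler.example.com', 'notholoweb.net']
```

Checking for a leading dot, rather than a bare suffix match, is the design choice that keeps unrelated domains with a similar ending out of the internal total.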
The Summary view is generated by the status.cgi CGI script; it understands the following options:
This option joins together all hosts from each domain, so that, for example, dirk.holoweb.net and mail.holoweb.net both appear in a single entry as *.holoweb.net and their hits are counted together.
Set join-similar=yes in the URL to enable this.
This is the single day to include in the summary.
Set start-day, start-month and start-year in the URI, e.g. start-day=22;start-month=Nov;start-year=2002
If showall is set, these dates are ignored.
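The two status.cgi options above can be illustrated with a short Python sketch. It assumes a hypothetical join-similar rule of dropping the first label of a host name (so dirk.holoweb.net becomes *.holoweb.net, as in the example), and a hypothetical install location for status.cgi; Hololog's actual joining rule may differ.

```python
import ipaddress
from collections import Counter
from urllib.parse import quote


def join_key(host: str) -> str:
    """Hypothetical join-similar rule: replace the first label with '*'.

    Raw IP addresses and two-label names are left unchanged.
    """
    try:
        ipaddress.ip_address(host)
        return host  # a numeric address has no domain to join on
    except ValueError:
        pass
    labels = host.split(".")
    if len(labels) <= 2:
        return host
    return "*." + ".".join(labels[1:])


def join_similar(hits_by_host: dict) -> Counter:
    """Add together the hits of every host that maps to the same entry."""
    joined = Counter()
    for host, hits in hits_by_host.items():
        joined[join_key(host)] += hits
    return joined


def status_url(base: str, options: dict) -> str:
    """Build a status.cgi URL with ';' between options, as in the example."""
    return base + "?" + ";".join(f"{k}={quote(v)}" for k, v in options.items())


joined = join_similar({"dirk.holoweb.net": 12, "mail.holoweb.net": 3, "192.0.2.7": 5})
print(joined["*.holoweb.net"])  # 15

url = status_url(
    "http://www.example.com/cgi-bin/status.cgi",  # hypothetical location
    {"join-similar": "yes", "start-day": "22", "start-month": "Nov", "start-year": "2002"},
)
print(url)
```

Adding showall to the options dictionary would, per the description above, make status.cgi ignore the three date fields.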
This is the most useful of all the reports. The CGI script nph-sortbyip.cgi produces this report, and can be configured extensively.
Each entry starts with the host name of a computer that contacted your Web server. This is followed by a count of the number of files downloaded and their total size in bytes, kilobytes, megabytes or gigabytes.
If thumbnail images were downloaded, they are listed separately rather than being given their own entries later on. This reduces clutter in the summary.
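A minimal sketch of how such a per-host heading line could be computed, assuming the Web server writes Apache's Common Log Format and that sizes use 1024-byte units. It is not Hololog's own code, and it ignores the thumbnail handling.

```python
import re
from collections import defaultdict

# Assumed Common Log Format line, e.g.
# dirk.holoweb.net - - [22/Nov/2002:10:15:32 +0000] "GET /~liam/ HTTP/1.0" 200 2326
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)')


def human_size(n: int) -> str:
    """Show a byte count in bytes, KB, MB or GB (1024-based units assumed)."""
    if n < 1024:
        return f"{n} bytes"
    size = float(n)
    for unit in ("KB", "MB", "GB"):
        size /= 1024
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"


def per_host_totals(lines):
    """Return {host: (files fetched, total bytes)}; unparseable lines are skipped."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue
        host, nbytes = m.group(1), m.group(5)
        totals[host][0] += 1
        totals[host][1] += 0 if nbytes == "-" else int(nbytes)  # "-" means no body
    return {host: tuple(t) for host, t in totals.items()}


log = [
    'dirk.holoweb.net - - [22/Nov/2002:10:15:32 +0000] "GET /~liam/ HTTP/1.0" 200 2326',
    'dirk.holoweb.net - - [22/Nov/2002:10:15:33 +0000] "GET /~liam/big.png HTTP/1.0" 200 1048576',
    '192.0.2.7 - - [22/Nov/2002:10:16:01 +0000] "GET /missing HTTP/1.0" 404 512',
]
for host, (files, nbytes) in per_host_totals(log).items():
    print(f"{host}: {files} files, {human_size(nbytes)}")
```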
After the heading line you'll see details of each file fetched from your server. Each file is (in Web terminology) a representation of a resource, but the partial URI logged (e.g. /~liam/) is whatever the Web server wrote in the log file.
The hits are numbered as they are read from the log file. Since some Web servers (including Apache) run multiple threads writing to the same log file, the entries are not always in the order you might expect, so the CGI script sorts them by date.
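The re-sorting step can be sketched in Python as follows, assuming Apache-style timestamps with English month abbreviations (parsed under Python's default C locale). The sample hits are invented for illustration.

```python
from datetime import datetime


def log_time(stamp: str) -> datetime:
    """Parse an Apache-style timestamp such as 22/Nov/2002:10:15:32 +0000."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


# (hit number, timestamp) pairs in the order they were read from the log;
# two server threads wrote hits 1 and 2 out of order.
hits = [
    (1, "22/Nov/2002:10:15:33 +0000"),
    (2, "22/Nov/2002:10:15:32 +0000"),
    (3, "22/Nov/2002:10:15:40 +0000"),
]

# sorted() is stable, so hits logged in the same second keep their numbering order.
ordered = sorted(hits, key=lambda hit: log_time(hit[1]))
print([number for number, _ in ordered])  # [2, 1, 3]
```

Keeping the original hit numbers alongside the sorted entries lets you see where the log file's order and the time order disagree.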
After the number is the URI that was fetched, with the http:// and the name of your Web server removed (simply to reduce clutter). This URI may be coloured differently depending on whether or not you've visited that page in this browser before (see the vlink configuration option).
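A minimal Python sketch of that clean-up. The helper name display_uri and the server name www.example.com are hypothetical; the sketch only strips the prefix when it is followed by a path, so a different host whose name merely starts with yours is left alone.

```python
def display_uri(uri: str, server_name: str) -> str:
    """Drop a leading http:// plus this server's own name, leaving the path.

    Anything else (already a bare path, or another site's URL) is returned unchanged.
    """
    prefix = "http://" + server_name.lower()
    if uri.lower().startswith(prefix):
        rest = uri[len(prefix):]
        # Only strip a complete host name: what follows must be a path or nothing.
        if rest == "" or rest.startswith("/"):
            return rest or "/"
    return uri


print(display_uri("http://www.example.com/~liam/", "www.example.com"))      # /~liam/
print(display_uri("/~liam/", "www.example.com"))                            # /~liam/
print(display_uri("http://www.example.com.evil.test/x", "www.example.com"))  # unchanged
```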
After the URI come a date, an HTTP status code such as 404 for file not found, and a byte count. Note that when the file is not found, the byte count is the size of your error page.
Some entries are in a slightly different format, showing only the total fetch count for this period rather than individual entries. These summary entries are intended for spiders such as Inktomi or Google that fetch every Web page they can find on your site in order to add it to their indexes.
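How spiders get only a count can be sketched in Python. This assumes spiders are recognized by a substring of the User-Agent field (available when the server writes the combined log format); the marker list, the helper name split_spiders and the sample hosts are all hypothetical, and Hololog may identify spiders differently, for example through config.xml.

```python
from collections import Counter

# Hypothetical markers: "googlebot" for Google, "slurp" for Inktomi's crawler.
SPIDER_MARKERS = ("googlebot", "slurp")


def split_spiders(hits):
    """hits: iterable of (host, user_agent, uri) tuples.

    Returns a Counter of fetches per spider host, plus the non-spider hits
    that should still be listed one by one.
    """
    spider_counts = Counter()
    detailed = []
    for host, agent, uri in hits:
        if any(marker in agent.lower() for marker in SPIDER_MARKERS):
            spider_counts[host] += 1
        else:
            detailed.append((host, uri))
    return spider_counts, detailed


hits = [
    ("crawler.example.com", "Googlebot/2.1", "/a.html"),
    ("crawler.example.com", "Googlebot/2.1", "/b.html"),
    ("dirk.holoweb.net", "Mozilla/5.0", "/~liam/"),
]
counts, detailed = split_spiders(hits)
for host, total in counts.items():
    print(f"{host}: total fetch count {total}")
```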
Note: The nph-sortbyip.cgi script is very configurable. In most cases the defaults should work; if they don't, edit the file config.xml as described in the instructions in that file.
The following options are accepted as CGI parameters:
Set startdate to the first day you want to include in the report. For example, startdate=22/Nov/2005 would start the report with the first line in the log file matching that date. If there are no matching lines you won't get a report, even if the log file contains entries after that date.
If you set enddate=23/Nov/2005 then the report will stop at the first line that matches this date. The default is for it to be unset, so that the rest of the log file is processed, starting with the first match of startdate.
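The startdate/enddate behaviour described above can be sketched as a Python generator. Two assumptions are labelled here and in the code: a line "matches" a date when its bracketed timestamp begins with that date, and the first enddate line is itself excluded (the text does not say which).

```python
from typing import Iterable, Iterator, Optional


def date_window(lines: Iterable[str], startdate: str,
                enddate: Optional[str] = None) -> Iterator[str]:
    """Yield log lines from the first one stamped with startdate up to the
    first later one stamped with enddate (assumed to be excluded)."""
    started = False
    for line in lines:
        if not started:
            # Assumed matching rule: the timestamp field starts "[dd/Mon/yyyy:".
            if f"[{startdate}:" not in line:
                continue  # still looking for the first startdate match
            started = True
        elif enddate is not None and f"[{enddate}:" in line:
            return  # stop at the first enddate match
        yield line


log = [
    'a - - [21/Nov/2005:23:59:59 +0000] "GET / HTTP/1.0" 200 10',
    'b - - [22/Nov/2005:00:00:01 +0000] "GET / HTTP/1.0" 200 10',
    'c - - [22/Nov/2005:12:00:00 +0000] "GET / HTTP/1.0" 200 10',
    'd - - [23/Nov/2005:00:00:02 +0000] "GET / HTTP/1.0" 200 10',
]
print([line.split()[0] for line in date_window(log, "22/Nov/2005", "23/Nov/2005")])  # ['b', 'c']
```

Note that date_window(log, "20/Nov/2005") yields nothing even though later lines exist, which mirrors the "no matching lines, no report" behaviour.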
The file config.xml must be in the same directory as the CGI scripts. If you are running Mandrake Linux you may find that the default configuration works for you without changes.
The file is in XML, and must be well-formed. If you have the libxml package installed, you can run xmllint config.xml | wc to check that the file is OK: any errors are reported on your terminal, while wc keeps the file itself from scrolling past. Do this whenever you edit the file by hand.
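If xmllint isn't available, Python's standard library can make a comparable well-formedness check. This is a sketch, not a Hololog tool; the sample documents are invented, and, like the xmllint command above, it does not validate against any schema.

```python
import io
import sys
import xml.etree.ElementTree as ET


def well_formed(source) -> bool:
    """Return True if source (a file name or open file) parses as XML.

    This checks well-formedness only; it does not validate against a DTD.
    """
    try:
        ET.parse(source)
    except ET.ParseError as err:
        print(f"not well-formed: {err}", file=sys.stderr)
        return False
    return True


print(well_formed(io.StringIO("<config><item/></config>")))  # True
print(well_formed(io.StringIO("<config><item></config>")))   # False
```

To check the real file, call well_formed("config.xml"); a missing file raises an OSError rather than returning False.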
This version of Hololog does not include a GUI for editing the configuration, sorry; it's high on the to-do list. For now, use your favourite text editor and check the file afterwards.
Here is a sample: