Logalizer installation instructions

Introduction

Logalizer is a Perl script that can be executed as a CGI, directly from the command line, or as a cron job for automatic periodic logfile analysis. You must have Perl installed, along with the module HTML::Template, which you can get from CPAN.
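
Both prerequisites can be checked from a shell before you go any further. The following is a minimal sketch: the function name have_perl_module is made up for this illustration and is not part of logalizer. It relies only on perl's standard -M switch, which loads a module and fails if the module cannot be found.

```shell
#!/bin/sh
# have_perl_module is a hypothetical helper, not part of logalizer.
# It returns status 0 if perl can load the named module, non-zero otherwise
# (including the case where perl itself is not installed).
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module HTML::Template; then
    echo "HTML::Template is installed"
else
    echo "HTML::Template is missing, install it from CPAN"
fi
```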

Installation

Installing logalizer is not a hard task by any means. Simply copy the entire directory tree of your logalizer distribution into a directory below your Apache document root that has CGI scripts enabled (this is normally something like /srv/www/cgi-bin or ~/public_html/cgi-bin). For information on how to enable CGI scripts, check your Apache documentation.
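
As a rough sketch, the copy step could look like the function below. The name install_logalizer and the target subdirectory name logalizer are assumptions made for this example, as is the final step of marking logalizer.cgi executable, which Apache's CGI handling normally requires.

```shell
#!/bin/sh
# install_logalizer is a hypothetical helper for illustration only.
# Usage: install_logalizer DISTRIBUTION_DIR CGI_DIR
install_logalizer() {
    src=$1 cgidir=$2
    mkdir -p "$cgidir/logalizer" || return 1
    # "src/." copies the contents of the distribution, not the directory itself
    cp -R "$src/." "$cgidir/logalizer/" || return 1
    # Make the CGI script executable wherever it sits in the tree
    find "$cgidir/logalizer" -name logalizer.cgi -exec chmod 755 {} +
}

# Example (adjust both paths to your system):
# install_logalizer ./logalizer /srv/www/cgi-bin
```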

For automatic logfile analysis we also recommend configuring cron to call logalizer in analysis mode periodically. The following entry in your crontab makes logalizer read the latest log records every 2 hours:

# weblog analyzer
0 */2 * * * apache:apache /usr/local/bin/logalizer.sh

You'll want to adjust this interval to the traffic on your system, but you should run it at least once a day so that logalizer doesn't miss records after the Apache logfile rotates.

The script used here is a very simple shell script that could look similar to this:

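#!/bin/sh
# Run logalizer in analysis mode and append its output to a log file.
LOGALIZERDIR=/srv/www/cgi-bin/logalizer
LOGFILE=/var/log/logalizer-cron.log
# Stop if the directory is missing rather than running in the wrong place.
cd "$LOGALIZERDIR" || exit 1
./logalizer.cgi cmd=Update >> "$LOGFILE" 2>&1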

Configuration

Logalizer comes with a set of configuration files in the config directory of your distribution. The only configuration file you need to worry about is logalizer.conf. We will go step by step through logalizer.conf, but it's all fairly straightforward, really.

General information

  • Comments in the configuration file begin with a hash sign (#)
  • Whitespace is ignored
  • The general form of a config entry looks like
    • Key = "Value"
  • Individual entries are delimited by a newline
  • We give standard values for the entries here which you will have to change to the correct paths on your system.
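
To make the format rules above concrete, here is a minimal sketch of a shell function that looks up one entry in such a file. The name conf_get is hypothetical. Logalizer itself reads logalizer.conf in Perl, and its parser may differ in details that this sketch does not cover.

```shell
#!/bin/sh
# conf_get is a hypothetical helper for illustration; it is not part of
# logalizer. It skips comment lines, ignores surrounding whitespace and
# prints the value of KEY with the optional double quotes removed.
# Usage: conf_get KEY FILE
conf_get() {
    key=$1 file=$2
    ws='[[:space:]]*'
    sed -n -e "/^${ws}#/d" \
           -e "s/^${ws}${key}${ws}=${ws}\"\{0,1\}\([^\"]*\)\"\{0,1\}${ws}\$/\1/p" "$file"
}

# Try it on a throwaway file written in the documented format.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# sample entries
DataDir = "/tmp/stats/data"
  VerboseLevel = 3
EOF
conf_get DataDir "$conf"        # prints /tmp/stats/data
conf_get VerboseLevel "$conf"   # prints 3
rm -f "$conf"
```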

Configuration instructions

DataDir = "/tmp/stats/data"
This tells logalizer where to store temporary files created when the script is run. This should normally be in a subdirectory of /tmp.
OutputDir = "/srv/www/htdocs/stats"
This is where logalizer creates the HTML reports from your log files. This directory has to be accessible, of course, by logalizer and by yourself. CGI scripts are run with rather limited rights, thus the OutputDir should have sufficient permissions. Usually mode 0755 will do. Note that you should not create the OutputDir within your CGI directory.
Please make sure to copy the file VisitTreePaneApplet.jar from the root directory of your logalizer distribution into this directory.
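
The preparation of the OutputDir can be sketched as follows. The function name prepare_output_dir is made up for this example, and it assumes the directory ends up owned by the user the CGI runs as, so that mode 0755 still lets logalizer write its reports there.

```shell
#!/bin/sh
# prepare_output_dir is a hypothetical helper for illustration only.
# It creates the report directory, sets mode 0755 and copies the applet.
# Usage: prepare_output_dir DISTRIBUTION_ROOT OUTPUT_DIR
prepare_output_dir() {
    dist=$1 out=$2
    mkdir -p "$out" || return 1
    chmod 0755 "$out" || return 1
    cp "$dist/VisitTreePaneApplet.jar" "$out/" || return 1
}

# Example with the paths used in this document (adjust to your system):
# prepare_output_dir /srv/www/cgi-bin/logalizer /srv/www/htdocs/stats
```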
OutputURL = "http://localhost/stats"
The URL to said OutputDir. If /srv/www/htdocs is your Apache document root on your system, the OutputDir above can be accessed by this URL.
ScriptURL = "http://localhost/cgi-bin/logalizer/src/logalizer.cgi"
The URL at which the logalizer script can be accessed. The same rules as for OutputURL apply: if logalizer.cgi can be found at /srv/www/cgi-bin/logalizer/src/logalizer.cgi, the URL given above will work.
TemplateDir = "/srv/www/cgi-bin/logalizer/templates"
The path to the template files, which logalizer uses to generate the web interface and the HTML reports.
LogFile = "/srv/www/log/access.log"
BackupLogFile = "/srv/www/log/access.log.0.gz"
The path to your Apache log file. If CGIs are run as the apache user you should have no problems accessing this. Do note that logalizer remembers where it stopped reading records in that file, so it will not create wrong statistics if you run logalizer often on the same logfile. If run regularly at least once a day, logalizer also notices when a logfile has been rotated, and if you provide an entry for BackupLogFile it will get the remaining records from the logfile that has just been rotated (and possibly compressed) and renamed to BackupLogFile. This way you will not lose any log records even if you have no direct control over the log rotation process.
LogDir = "/srv/www/cgi-bin/logalizer/log"
The path where logalizer will create its own log file. This has nothing to do with your Apache log file. Logalizer will write errors, verbose information and the like to this file. You can control how much information the program produces by setting the verbosity level (see below).
BaseURL = "http://localhost"
The URL to your document root. This will normally be the URL given above if you host the site on your own computer or the ordinary URL to your site (e.g. http://www.mysite.net) if you have your site online. This is used to be able to link to relative URLs in the statistics reports.
SelfReferer = "^\w+://(\w+\.)?localhost"
This is a regular expression to filter out self referers. Don't worry if you don't understand this; it's simply a pattern the script uses to recognize referers from your own site. Setting this correctly is important if you want reliable numbers on where your users are coming from.
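
If you want to see what the default pattern matches, you can try it against a few sample referers. The sketch below uses grep -E as a stand-in for Perl's regular expression engine, assuming your grep understands \w (GNU and BSD grep do). The function name is_self_referer and the sample URLs are invented for this example.

```shell
#!/bin/sh
# is_self_referer is a hypothetical helper; logalizer applies the pattern
# in Perl, grep -E is only used here to try the pattern out.
SELF_REFERER='^\w+://(\w+\.)?localhost'

is_self_referer() {
    printf '%s\n' "$1" | grep -Eq "$SELF_REFERER"
}

for ref in http://localhost/index.html http://www.localhost/a.html http://www.example.org/; do
    if is_self_referer "$ref"; then
        echo "self:     $ref"
    else
        echo "external: $ref"
    fi
done
```

The first two URLs are reported as self referers and the third as external, because the pattern accepts any scheme followed by localhost with at most one leading host label.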
VerboseLevel = 3
This tells logalizer how verbose it should be. 3 is a sensible value which tells the program to emit warnings and more serious messages.
Valid values are:
  • 0 - Everything beginning from trace messages
  • 1 - Everything beginning from debug messages
  • 2 - Everything beginning from info messages
  • 3 - Everything beginning from warnings
  • 4 - Everything beginning from errors
  • 5 - Only fatal errors
LimitRecordsPerRun = 0
Here you can set how many records logalizer should process every time the script is run. You will normally want to have 0 here to tell logalizer not to limit records. Only on a heavily loaded system might you want to be a good neighbour and not hog the CPU by processing too many records at once.
OffsetHours = 0
Here you can tell logalizer how many hours should be added to GMT before analysis. Most servers use GMT as their system time, in which case you should leave it at 0.
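
The arithmetic behind this setting is simple. As a sketch, assuming timestamps are handled as seconds since the epoch (how logalizer represents them internally is not stated here), the shift looks like this. The function name apply_offset is hypothetical.

```shell
#!/bin/sh
# apply_offset is a hypothetical helper, not logalizer's own code.
# Usage: apply_offset EPOCH_SECONDS OFFSET_HOURS -> prints shifted seconds
apply_offset() {
    echo $(( $1 + $2 * 3600 ))
}

apply_offset 1000000000 0   # prints 1000000000 (timestamp unchanged)
apply_offset 1000000000 2   # prints 1000007200 (two hours later)
```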
VisitTimeout = 30
This line controls how a visit is defined. Due to the nature of websites it is not possible to say when a user left the site and thus ended a visit. Logalizer tries to follow the traces of a user using the IP address as identifier. An IP address is not really a unique identifier for a single person, but it is good enough within a limited time interval. VisitTimeout defines the number of minutes an IP address must have been idle on the analyzed website before its next request is considered the start of a new visit. If you set this too low you will get many "false positives" and see far higher visit numbers than you actually had. If you set it too high you might coalesce two separate visits of one or more real people into a single visit and will not see how popular your site actually is.
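
The effect of the timeout can be illustrated with a small sketch. This is not logalizer's actual algorithm, only the idea described above: a request starts a new visit if its IP address is new or has been idle for longer than the timeout. The function name count_visits and the input format (one "IP EPOCH_SECONDS" pair per line, in chronological order) are assumptions made for this example.

```shell
#!/bin/sh
# count_visits is a hypothetical sketch of timeout-based visit counting,
# not logalizer's actual implementation.
# Reads "IP EPOCH_SECONDS" lines in chronological order on stdin.
# Usage: count_visits TIMEOUT_MINUTES
count_visits() {
    awk -v timeout="$1" '
        {
            # A request opens a new visit if the IP is new or has been
            # idle for longer than the timeout.
            if (!($1 in last) || $2 - last[$1] > timeout * 60) visits++
            last[$1] = $2
        }
        END { print visits + 0 }
    '
}

# Three requests from one IP; the third comes 40 minutes after the second.
printf '%s\n' '10.0.0.1 0' '10.0.0.1 600' '10.0.0.1 3000' | count_visits 30   # prints 2
printf '%s\n' '10.0.0.1 0' '10.0.0.1 600' '10.0.0.1 3000' | count_visits 60   # prints 1
```

With a timeout of 30 minutes the 40-minute gap splits the requests into two visits, while a timeout of 60 minutes merges them into one.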
VisitNoiseLevel = 2
With this setting you can control the granularity of the visit paths analysis.