Logalizer is a Perl script that can be run as a CGI script, directly from the
command line, or as a cron job for automatic periodic logfile analysis.
It requires Perl and the HTML::Template module, which you can get
from CPAN.
Installing logalizer is not, by any means, a hard task. Simply copy
the entire directory tree of your logalizer distribution into a directory
for which your Apache server has CGI scripts enabled (this is normally
something like /srv/www/cgi-bin or ~/public_html/cgi-bin).
For details on enabling CGI scripts, see your Apache documentation.
For automatic logfile analysis we also recommend configuring cron to call
logalizer in analysis mode periodically. The following entry in your crontab
would make logalizer read the latest log records every 2 hours:
You'll want to adjust this interval to the traffic on your system, but you
should run logalizer at least once a day so that it doesn't miss records when
the Apache logfile is rotated.
The script used here is just a very simple shell script that could look similar to this:
Logalizer comes with a set of configuration files in the config directory of
your distribution. The only configuration file you need to worry about is
logalizer.conf. We will go through logalizer.conf step by step; it's all
fairly straightforward, really.
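Every entry in logalizer.conf has the form Key = "value" or Key = number. To make that format concrete, here is a minimal Python sketch of a parser for such lines. It is not logalizer's own parser: the regular expression, the skipping of `#` comment lines, and the conversion of bare digits to integers are all assumptions.

```python
import re

# Assumed shape of one logalizer.conf line: a key, "=", then either a
# double-quoted string or a bare value such as a number.
LINE_RE = re.compile(r'^\s*(\w+)\s*=\s*(?:"([^"]*)"|(\S+))\s*$')


def parse_conf(text):
    """Return a dict of settings parsed from logalizer.conf-style text."""
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no setting.
        if not stripped or stripped.startswith("#"):
            continue
        m = LINE_RE.match(stripped)
        if m is None:
            raise ValueError(f"cannot parse line: {line!r}")
        key, quoted, bare = m.groups()
        if quoted is not None:
            settings[key] = quoted
        else:
            settings[key] = int(bare) if bare.isdigit() else bare
    return settings
```

Quoted values are kept verbatim, so a regular expression such as the SelfReferer pattern survives unchanged.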
DataDir = "/tmp/stats/data" |
This tells logalizer where to store temporary files created when the script is run.
This should normally be in a subdirectory of /tmp.
|
OutputDir = "/srv/www/htdocs/stats" |
This is where logalizer creates the HTML reports from your log files.
This directory has to be accessible, of course, both by logalizer and by
yourself. CGI scripts run with rather limited rights, so the OutputDir
needs suitable permissions: it must be writable by the user the CGI scripts
run as and readable by the web server. Usually a mode of 0755 on a directory
owned by the CGI user will do. Note that you should not create the OutputDir
within your CGI directory.
Please make sure to copy the file VisitTreePaneApplet.jar from the root
directory of your logalizer distribution into this directory.
|
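These requirements can be checked before the first run with a small script like the following Python sketch. `check_output_dir` is a hypothetical helper, not part of logalizer. It should be run as the user your CGI scripts run as, and it assumes the web server reads the reports as an "other" user, which is why it looks for the r-x bits that a 0755 mode provides.

```python
import os
import stat


def check_output_dir(path, jar_name="VisitTreePaneApplet.jar"):
    """Return a list of problems with a candidate OutputDir (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    # The current user (ideally the CGI user) must be able to create reports.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Assumed: the web server needs read and search permission as "other".
    needed = stat.S_IROTH | stat.S_IXOTH
    if (mode & needed) != needed:
        problems.append(f"{path} has mode {oct(mode)}; others need r-x (as in 0755)")
    if not os.path.isfile(os.path.join(path, jar_name)):
        problems.append(f"{jar_name} is missing from {path}")
    return problems
```

For example, `check_output_dir("/srv/www/htdocs/stats")` returns an empty list once the directory is set up as described above.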
OutputURL = "http://localhost/stats" |
The URL of the OutputDir. If /srv/www/htdocs is the Apache document root on your system,
the OutputDir above can be accessed at this URL.
|
ScriptURL = "http://localhost/cgi-bin/logalizer/src/logalizer.cgi" |
The URL at which the logalizer script can be accessed. The same rules
as for OutputURL apply: if /srv/www/cgi-bin is served as /cgi-bin/ and
logalizer.cgi can be found at /srv/www/cgi-bin/logalizer/src/logalizer.cgi,
the URL given above will work.
|
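Both OutputURL and ScriptURL follow from how your web server maps directories to URL paths. The Python sketch below makes that mapping explicit. `URL_MAP` and `path_to_url` are hypothetical and assume the example layout used here: the document root /srv/www/htdocs served at /, and /srv/www/cgi-bin served at /cgi-bin/ (for instance via an Apache ScriptAlias). Your own Apache configuration decides the real mapping.

```python
from urllib.parse import quote

# Hypothetical directory-to-URL mapping matching the example paths above.
# More specific prefixes must come first.
URL_MAP = [
    ("/srv/www/cgi-bin/", "/cgi-bin/"),
    ("/srv/www/htdocs/", "/"),
]


def path_to_url(path, base_url="http://localhost", url_map=URL_MAP):
    """Translate a filesystem path into the URL it is served at."""
    # Add a trailing slash so a directory matches its own prefix.
    normalized = path.rstrip("/") + "/"
    for fs_prefix, url_prefix in url_map:
        if normalized.startswith(fs_prefix):
            rest = normalized[len(fs_prefix):].rstrip("/")
            return base_url.rstrip("/") + url_prefix + quote(rest)
    raise ValueError(f"{path} is not below any served directory")
```

With this layout, the example OutputDir maps to http://localhost/stats and the script path maps to the ScriptURL shown above.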
TemplateDir = "/srv/www/cgi-bin/logalizer/templates" |
The path to the template files, which logalizer uses to generate the web interface and the
HTML reports.
|
LogFile = "/srv/www/log/access.log" |
BackupLogFile = "/srv/www/log/access.log.0.gz"
The paths to your Apache log file and to its most recently rotated copy. If CGIs
are run as the apache user you should have no problems accessing these. Note that
logalizer remembers where it stopped reading records in LogFile, so running it
often on the same logfile will not distort the statistics. If run regularly,
at least once a day, logalizer also notices when a logfile has been rotated,
and if you provide an entry for BackupLogFile it will read the remaining records
from the logfile that has just been rotated (and possibly compressed) and renamed
to BackupLogFile. This way you will not lose any log records even if you have
no direct control over the log rotation process.
|
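The behaviour described above (resume from a saved position, detect rotation, finish the rotated copy) can be illustrated with a generic Python sketch. It is not logalizer's implementation: the `state` dictionary stands in for whatever logalizer keeps in its DataDir, and the rotation test (a new inode, or a file shorter than the saved offset) is an assumed heuristic.

```python
import gzip
import os


def open_log(path):
    """Open a plain or gzip-compressed log file for binary reading."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def read_new_records(log_file, backup_log_file, state):
    """Return complete log lines not seen before and update `state` in place.

    `state` is a hypothetical stand-in for the position kept between runs:
    the inode of the log file and the byte offset reached last time.
    """
    lines = []
    st = os.stat(log_file)
    offset = state.get("offset", 0)
    # Assumed heuristic: a new inode, or a file shorter than the saved offset,
    # means the log was rotated (renamed, or copied and truncated).
    rotated = "inode" in state and (
        st.st_ino != state["inode"] or st.st_size < offset
    )
    if rotated:
        # Finish the rotated file from where we stopped, if a backup exists.
        # gzip files support seeking in read mode (in uncompressed bytes).
        if backup_log_file and os.path.exists(backup_log_file):
            with open_log(backup_log_file) as f:
                f.seek(offset)
                lines.extend(f.read().splitlines())
        offset = 0
    with open(log_file, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a half-written last line is read next time.
    end = data.rfind(b"\n") + 1
    lines.extend(data[:end].splitlines())
    state["inode"] = st.st_ino
    state["offset"] = offset + end
    return lines
```

Under this heuristic, records are lost when more than one rotation happens between runs (the older records end up in a backup other than BackupLogFile), or when a copy-and-truncate rotation is followed by the new log outgrowing the old offset before the next run. Running the analysis often makes both cases unlikely, which matches the advice to run logalizer at least once a day.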
LogDir = "/srv/www/cgi-bin/logalizer/log" |
The path where logalizer will create its own log file. This has nothing
to do with your Apache log file. Logalizer will write errors, verbose
information and the like to this file. You can control how much
information the program produces by setting the verbosity level
(see below).
|
BaseURL = "http://localhost" |
The URL of your document root. This will normally be the URL given above
if you host the site on your own computer, or the ordinary URL of your
site (e.g. http://www.mysite.net) if your site is online. Logalizer uses
it to turn the relative URLs from your log file into links in the statistics reports.
|
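Turning a logged request path into a link is essentially a URL join against BaseURL. A Python sketch using the standard `urllib.parse.urljoin`; `link_for_record` and its request-line regular expression are hypothetical and only cover GET, POST and HEAD requests in Apache's common/combined log format.

```python
import re
from urllib.parse import urljoin

# Quoted request field of a common/combined log line,
# e.g. "GET /index.html HTTP/1.1". Only a few methods are covered.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*"')


def link_for_record(line, base_url="http://localhost"):
    """Return an absolute URL for the path requested in one log line, or None."""
    m = REQUEST_RE.search(line)
    if m is None:
        return None
    # Logged paths start with "/", so urljoin keeps the scheme and host of
    # base_url and takes the whole path from the record.
    return urljoin(base_url.rstrip("/") + "/", m.group(1))
```

For an online site you would pass your real BaseURL, for example `link_for_record(line, "http://www.mysite.net")`.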
SelfReferer = "^\w+://(\w+\.)?localhost" |
This is a regular expression to filter out self referers. Don't worry
if you don't understand it; it's simply a pattern the script uses to
recognize referers from your own site (for an online site, replace localhost
with your own domain). Setting it correctly is important if you want
reliable numbers on where your visitors are coming from.
|
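Python's `re` module interprets this particular pattern the same way Perl does, so a short Python sketch shows what it accepts. `is_self_referer` and `STRICT_EXAMPLE` (using the mysite.net domain from the BaseURL example) are hypothetical. Note that the example pattern is anchored only at the start, so a foreign host such as localhost.example.org also matches; requiring a port, a path or the end of the string after the host avoids that. The optional `(\w+\.)?` group also allows at most one subdomain label.

```python
import re

# The example SelfReferer pattern (anchored at the start only).
SELF_REFERER = re.compile(r"^\w+://(\w+\.)?localhost")

# Hypothetical stricter variant for a site at mysite.net: the host must be
# followed by ":" (port), "/" (path) or the end of the string.
STRICT_EXAMPLE = re.compile(r"^\w+://(\w+\.)?mysite\.net(?:[:/]|$)")


def is_self_referer(referer, pattern=SELF_REFERER):
    """True if the referer URL matches the self-referer pattern."""
    return pattern.match(referer) is not None


print(is_self_referer("http://localhost/index.html"))               # True
print(is_self_referer("http://www.google.com/search?q=logalizer"))  # False
print(is_self_referer("http://localhost.example.org/"))             # True (over-match)
print(is_self_referer("http://www.mysite.net.evil.com/", STRICT_EXAMPLE))  # False
```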
VerboseLevel = 3 |
This tells logalizer how verbose it should be. 3 is a sensible value which tells the
program to emit warnings and more serious messages.
Valid values are:
- 0 - trace messages and everything more severe
- 1 - debug messages and everything more severe
- 2 - info messages and everything more severe
- 3 - warnings and everything more severe
- 4 - errors and fatal errors
- 5 - only fatal errors
|
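The level table amounts to a simple threshold: a message is written when its severity is at least VerboseLevel. A Python sketch under that reading; `Level`, `should_log` and `emit` are hypothetical names, and logalizer's own logging code may be organised differently.

```python
from enum import IntEnum


class Level(IntEnum):
    """Message severities, numbered like the VerboseLevel values."""
    TRACE = 0
    DEBUG = 1
    INFO = 2
    WARNING = 3
    ERROR = 4
    FATAL = 5


def should_log(message_level, verbose_level):
    """A message is emitted if it is at least as severe as VerboseLevel."""
    return message_level >= verbose_level


def emit(message_level, text, verbose_level, out):
    """Append a formatted line to `out` (a list standing in for the log file)."""
    if should_log(message_level, verbose_level):
        out.append(f"[{message_level.name}] {text}")
```

With VerboseLevel = 3, `emit(Level.ERROR, ...)` writes a line while `emit(Level.DEBUG, ...)` is silently dropped.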
LimitRecordsPerRun = 0 |
Here you can set how many records logalizer should process every time
the script is run. You will normally want to have 0 here to tell
logalizer not to limit records. Only on a heavily loaded system might you
want to be a good neighbour and not hog the CPU by processing too many
records at once.
|
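A per-run limit where 0 means "no limit" can be sketched in Python with `itertools.islice`. `take_batch` is hypothetical; the sketch assumes that records not taken in one run are simply left for the next, which fits logalizer remembering where it stopped reading.

```python
from itertools import islice


def take_batch(records, limit):
    """Return up to `limit` records from the iterator `records`.

    A limit of 0 means "no limit", mirroring LimitRecordsPerRun = 0.
    """
    if limit < 0:
        raise ValueError("limit must be >= 0")
    if limit == 0:
        return list(records)
    # islice stops after `limit` items without pulling any further ones,
    # so the remaining records stay available for a later run.
    return list(islice(records, limit))
```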
OffsetHours = 0 |
Here you can tell logalizer how many hours should be added to GMT time
before analysis. Most servers use GMT as their system time, in which
case you should leave it at 0.
|
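Apache timestamps such as `10/Oct/2000:13:55:36 -0700` carry their own zone offset. Assuming logalizer first normalizes each timestamp to GMT and then adds OffsetHours, the conversion looks like this Python sketch; `local_report_time` is a hypothetical helper, not logalizer code.

```python
from datetime import datetime, timedelta, timezone


def local_report_time(apache_ts, offset_hours=0):
    """Convert an Apache log timestamp to GMT, then shift it by OffsetHours.

    Returns a naive datetime in the shifted time. Assumes the default "C"
    locale so that %b parses English month abbreviations.
    """
    ts = datetime.strptime(apache_ts, "%d/%b/%Y:%H:%M:%S %z")
    gmt = ts.astimezone(timezone.utc)
    return (gmt + timedelta(hours=offset_hours)).replace(tzinfo=None)
```

For example, `local_report_time("10/Oct/2000:13:55:36 -0700")` yields 20:55:36 GMT, and with an offset of 2 the same record is counted at 22:55:36.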
VisitTimeout = 30 |
This line controls how a visit is defined. Due to the nature of websites
it is not possible to say when a user left the site and thus ended their visit.
Logalizer tries to follow a user's trail using their IP address as identifier.
IP addresses do not really identify a single person, but they are
good enough within a limited time interval. VisitTimeout is the number of
minutes an IP address must stay inactive on the analyzed website before its next
request is counted as the start of a new visit. If you set this too short you will get many
"false positives" and see far higher visit numbers than you actually had. If you set
it too high you might merge separate visits of one or more real people into
a single visit and underestimate how popular your site actually is.
|
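This visit definition is a form of timeout-based sessionization. The Python sketch below splits (IP, timestamp, path) hits into visits. `split_visits` is a hypothetical helper, and treating a gap exactly equal to the timeout as part of the same visit is an assumption; logalizer's own boundary rule may differ.

```python
from datetime import datetime, timedelta


def split_visits(hits, visit_timeout_minutes=30):
    """Group (ip, timestamp, path) hits into visits.

    Returns a list of (ip, [paths...]) in order of each visit's first hit.
    A new visit starts when an IP has been idle longer than the timeout.
    """
    timeout = timedelta(minutes=visit_timeout_minutes)
    open_visit = {}  # ip -> list of paths of that IP's current visit
    last_seen = {}   # ip -> timestamp of that IP's previous hit
    visits = []
    for ip, ts, path in sorted(hits, key=lambda hit: hit[1]):
        if ip not in last_seen or ts - last_seen[ip] > timeout:
            open_visit[ip] = []
            visits.append((ip, open_visit[ip]))
        open_visit[ip].append(path)
        last_seen[ip] = ts
    return visits
```

Lowering `visit_timeout_minutes` can only split visits further, which is the "too many visits" effect described above; raising it merges them.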
VisitNoiseLevel = 2 |
With this setting you can control the granularity of the visit paths analysis.
|