Manual

Running configure with its default options installs IRStats into /usr/local. For testing purposes it's recommended that you work in /usr/local. These instructions refer to locations as if you had run configure as:

./configure --prefix=/

If you didn't install IRStats into the root of your system, you will need to prepend your installation prefix (for example /usr/local/irstats) to any path referred to here.

Contents

  1. Configuration
    1. irstats.cgi
    2. irstats.conf
      1. Available options
    3. Configuring Apache to use IRStats
  2. Running IRStats
    1. Importing Log File Data
      1. EPrints Accesslog Tables
      2. EPrints Apache Log Files
      3. DSpace Apache Log Files
    2. Looking up IP Addresses
    3. Exporting Metadata
      1. Quick Start
      2. EPrints 2/3
    4. Importing Metadata
  3. Updating the IRStats Database

Configuration

To configure IRStats you should only need to modify two files: the library paths in the main executable /var/www/cgi-bin/irstats.cgi, and the configuration file /etc/irstats.conf.

irstats.cgi

You may need to modify irstats.cgi depending on where you installed ChartDirector/awstats. If any of these paths are incorrect you will get a Perl error on running irstats.cgi saying that a module could not be found in the Perl path.

#!/usr/bin/perl -w

##############################################################################
### Configuration ###
##############################################################################

# IRStats' Perl modules
use lib "/usr/local/irstats/var/www/irstats/lib";

# awstats modules
use lib "/var/www/awstats/lib";

# ChartDirector
use lib "/usr/local/irstats/usr/lib/ChartDirector";

# EPrints (if using EPrints)
use lib "/opt/eprints2/perl_lib";

use IRStats;

# The path to IRStats' configuration file
$IRStats::Configuration::FILE = "/usr/local/irstats/etc/irstats.conf";
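
A quick way to catch a wrong path before Perl does is to check that every directory named in a use lib line actually exists. The following is a minimal sketch and not part of IRStats: check_lib_paths is a hypothetical helper name, and it assumes each path is written on its own line in double quotes, as in the listing above.

```shell
# check_lib_paths FILE
# Print "missing: DIR" for every directory named in a `use lib "DIR";`
# line of FILE that does not exist. Returns non-zero if any are missing.
# (Hypothetical helper for illustration; not shipped with IRStats.)
check_lib_paths() {
    # Extract the quoted path from each `use lib "...";` line.
    sed -n 's/^[[:space:]]*use lib "\([^"]*\)";.*$/\1/p' "$1" | {
        missing=0
        while IFS= read -r dir; do
            if [ ! -d "$dir" ]; then
                echo "missing: $dir"
                missing=1
            fi
        done
        # The status of this test becomes the function's return status.
        [ "$missing" -eq 0 ]
    }
}
```

For example, check_lib_paths /var/www/cgi-bin/irstats.cgi prints one "missing:" line per absent directory, and prints nothing if all of them exist.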

irstats.conf

IRStats' behaviour is controlled by its configuration file. Options are specified by an option name/value pair:

#Set configuration for irstats with this file

# the repository type (used by update_table)
# one of eprints2, eprints3, dspace, apacheeprints
repository_type = eprints3
# repository_type = eprints2

# the eprints repository to extract from
repository = soton

# sets to allow filtering by
set_ids = internal_group, subjects, creators

Multiple values can be specified by providing a comma-separated list:

name = value1, value2, value3

Values are unquoted (don't put quote marks around values).
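
If you need to read these options from your own scripts (for example to find out which repository_type is configured), the format is simple enough to handle with sed. The following is a minimal sketch based only on the rules described above; conf_get and conf_list are hypothetical helper names, and IRStats' own parser may treat edge cases differently.

```shell
# conf_get FILE NAME
# Print the value of option NAME from FILE, with surrounding whitespace
# removed. Commented-out lines are ignored; if NAME is set more than once
# the last setting is printed. Prints nothing if NAME is unset or empty.
# (Hypothetical helper; NAME must not contain regex metacharacters.)
conf_get() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$/\1/p" "$1" | tail -n 1
}

# conf_list FILE NAME
# Print each item of a comma-separated option on its own line.
conf_list() {
    conf_get "$1" "$2" | tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}
```

With the example file above, conf_get /etc/irstats.conf repository_type prints eprints3 and conf_list /etc/irstats.conf set_ids prints internal_group, subjects and creators on separate lines.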

Available options

all_dashboard
Views to show on the dashboard for all eprints. You need to define a list of Views for every set_id by specifying the set_id followed by _dashboard.
cache_path
The directory used to store temporary cache files (must be writeable by the Web server).
database_column_table_prefix
The prefix to use for column tables.
database_driver
The database driver to use - must be mysql for MySQL or Pg for Postgres.
database_eprints_access_log_table
The name of the EPrints access log table to read from.
database_id_columns
The database columns to map to auxiliary tables (more efficient storage). You will probably want to leave this as requester_organisation, requester_host, referrer_scope, search_engine, search_terms, referring_entity_id.
database_main_stats_table
The main table to use for IRStats.
database_name
The database name to use.
database_password
Password to connect to the database with.
database_port
Port to connect to (optional).
database_server
The database host name (localhost for local).
database_set_table_citation_suffix
Suffix to append to set citation tables.
database_set_table_code_suffix
Suffix to append to set code tables.
database_set_table_prefix
The prefix to use for set tables.
database_user
User name to connect to the database with.
database_table_prefix
The prefix to use for IRStats tables.
dns_cache_file
The file to store the DNS lookup cache in.
dspace_handle
The major part of the DSpace handle (handle/XXXX/1223).
geo_ip_country_file
The Geo IP country database file.
geo_ip_org_file
The Geo IP organisation database file.
id_parameters
The parameters that are used to uniquely identify a view.
max_cache_age
Maximum time to keep a cache file for in seconds.
referrer_scope_{1,2,3,4,no_referrer}
Labels for the different referrer scopes.
repeats_filter_file
The file to store the repeated requests in.
repeats_filter_timeout
The minimum time that must elapse between requests for the same eprint from the same IP address in seconds.
repository
The repository name.
repository_type
The repository software (controls where IRStats reads usage data from). One of 'eprints2', 'eprints3', 'dspace' or 'apacheeprints'.
repository_url
The base URL of the repository.
root
The root of the IRStats directory - this value gets prefixed to any path value that is relative.
set_ids
A list of fields that eprints can be grouped by. 'eprint' is implicitly a member of set_ids and does not need to be specified. For GNU EPrints simply specify the field name to use.

To use one field for the set names and another for the set values (likely with compound fields), append "_id_field" to the field name and set it to the field that should actually be used for the set. For example, if you have a compound field creators_name and an identity field creators_id, provide the option creators_name_id_field = creators_id.

set_member_full_citations_file
The CSV file containing full citations for members of set_ids.
set_member_short_citations_file
The CSV file containing short citations for members of set_ids.
set_member_codes_file
The CSV file containing code mappings - maps from a user specified string into the internally used identifier.
set_member_urls_file
The CSV file containing URLs for members of set_ids.
set_membership_file
The CSV file containing set memberships.
set_phrases_file
The CSV file containing phrases for various IRStats things.
static_path
The directory used to store IRStats' statically served files (the style sheets and thumbnails). The /graph directory below this must be writeable by the Web server.
static_url
The relative URL of the static_path directory. Leave blank if IRStats is not installed in a sub-directory.
update_lock_filename
Filename to use for preventing multiple instances of the IRStats process running.
view_path
The directory containing the IRStats View modules.

Configuring Apache to use IRStats

IRStats requires its CGI script (irstats.cgi) and its static files to be made available from the Web server.

On most systems the CGI script can be made available by copying it to /var/www/cgi-bin, in which case it can be accessed at http://yourhost/cgi-bin/irstats.cgi. By default IRStats expects to find its static files at http://yourhost/irstats/, which can be achieved by copying the static files to /var/www/html.

If you want to locate IRStats somewhere else please consult your Web server's documentation.

# Example using mod_perl and the PerlSetVar to serve multiple
# repositories from a single CGI script
<VirtualHost *:80>
        ServerName irstats.citebase.org
        DocumentRoot /usr/local/irstats/htdocs
        <Directory /usr/local/irstats/htdocs>
                AllowOverride None
                Options None
                Order allow,deny
                Allow from all
        </Directory>
        PerlRequire /usr/local/irstats/cgi-bin/irstats.cgi
        <Location /irstats-cadair>
                SetHandler perl-script
                PerlSetVar IRStats_Config_File /usr/local/irstats/etc/irstats_cadair.conf
                PerlHandler IRStats::GUI
        </Location>
</VirtualHost>

Running IRStats

IRStats uses a single executable - irstats.cgi - for all processes. To get help on using irstats.cgi from the command line look at the man page:

./var/www/cgi-bin/irstats.cgi --man

Importing Log File Data

IRStats supports importing log data from EPrints accesslog tables, EPrints Apache log files and DSpace log files.

By default the import scripts only output on error. To get more feedback use the --verbose option (repeat for more verbosity).

Depending on the type of repository you will need to configure IRStats differently, as explained in the next sections.

EPrints Accesslog Tables

Make sure the database_eprints_access_log_table option is set correctly in the configuration file. Set the repository_type option to 'eprints2' for EPrints 2 or to 'eprints3' for EPrints 3.

Execute the irstats script as follows:

./var/www/cgi-bin/irstats.cgi update_table

EPrints Apache Log Files

Set the repository_type option to 'apacheeprints'.

Execute the irstats script as follows:

./var/www/cgi-bin/irstats.cgi update_table < /var/log/httpd/access_log

If you have multiple log files you can run update_table for each log file. You must run update_table only once for each log file, otherwise you will get repeated data.

DSpace Apache Log Files

Set the repository_type option to 'dspace'.

Execute the irstats script as follows:

./var/www/cgi-bin/irstats.cgi update_table < /var/log/httpd/access_log

If you have multiple log files you can run update_table for each log file. You must run update_table only once for each log file, otherwise you will get repeated data.
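
Because each Apache log file must be imported exactly once (this applies to both the 'apacheeprints' and 'dspace' repository types), it can help to keep a record of the files already imported. The following is a minimal sketch of one way to do that. The ledger file, the IMPORTED_LIST variable and the import_logs_once function are assumptions made for illustration and are not IRStats features; only the update_table command comes from this manual.

```shell
# Path to the IRStats executable; override IRSTATS to suit your install.
: "${IRSTATS:=./var/www/cgi-bin/irstats.cgi}"
# Hypothetical ledger of log files already imported (not part of IRStats).
: "${IMPORTED_LIST:=./imported_logs.txt}"

# import_logs_once LOG...
# Run update_table once for each log file, skipping files already recorded
# in the ledger. A file is recorded only after a successful import.
import_logs_once() {
    touch "$IMPORTED_LIST"
    for log in "$@"; do
        # Skip any file whose exact path is already in the ledger.
        if grep -Fxq -- "$log" "$IMPORTED_LIST"; then
            echo "skipping $log (already imported)"
            continue
        fi
        if "$IRSTATS" update_table < "$log"; then
            printf '%s\n' "$log" >> "$IMPORTED_LIST"
        else
            echo "import failed for $log" >&2
            return 1
        fi
    done
}
```

The ledger matches on the path exactly as given, so always pass log files using the same (preferably absolute) path. A file that has been renamed by log rotation will look new to this sketch.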

Looking up IP Addresses

To get the TopTenAcademies view working you must first run the IP lookup script:

./var/www/cgi-bin/irstats.cgi convert_ip_to_host

Depending on the number of lookups to do and your DNS server performance this may take some time to run.

Exporting Metadata

By default, data files are written to /data. You probably want to modify this in the irstats.conf file to point to somewhere more convenient.

Quick Start

IRStats includes two tools for extracting basic metadata from EPrints and DSpace: extract_generic_eprints and extract_generic_dspace. Set the repository_url in the configuration file. For DSpace set the dspace_handle. For EPrints execute:

./var/www/cgi-bin/irstats.cgi extract_generic_eprints

Or for DSpace:

./var/www/cgi-bin/irstats.cgi extract_generic_dspace

If successful, this will generate the CSV data files ready for importing. These tools grab the title from the abstract page for every eprint that has been downloaded. They do not provide any set data.

EPrints 2/3

Set the repository name with repository and the fields you want to use as sets in set_ids. Then execute:

./var/www/cgi-bin/irstats.cgi extract_metadata_from_archive

Importing Metadata

If you have a complete set of CSV data files you can import that data into IRStats by executing:

./var/www/cgi-bin/irstats.cgi import_metadata

You can repeat this step without having to re-import usage data.

Updating the IRStats Database

(Work in progress)

If you are using Apache log files, when you rotate the log files run the following:

./var/www/cgi-bin/irstats.cgi update_table < /path/to/rotated/log
./var/www/cgi-bin/irstats.cgi convert_ip_to_host

If you are reading from an EPrints access log table you can run the following periodically. It is very efficient and does nothing if there are no new log entries:

./var/www/cgi-bin/irstats.cgi update_table

You will also want to update the metadata periodically. IRStats doesn't currently support incremental updates to metadata, so every update processes the full metadata set and the time it takes depends heavily on the size of your database.
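
The periodic steps in this section can be wrapped in small shell functions and called from your log rotation or scheduling setup. The following is a minimal sketch: the function names are hypothetical, the commands are the ones described in this manual, and the metadata refresh assumes an EPrints 2/3 repository using extract_metadata_from_archive.

```shell
# Path to the IRStats executable; override IRSTATS to suit your install.
: "${IRSTATS:=./var/www/cgi-bin/irstats.cgi}"

# update_from_rotated_log LOG
# Import one rotated Apache log, then resolve any new IP addresses.
# The IP lookup is skipped if the import fails.
update_from_rotated_log() {
    "$IRSTATS" update_table < "$1" &&
    "$IRSTATS" convert_ip_to_host
}

# refresh_metadata
# Re-export the metadata from the repository and re-import it. This is a
# full refresh each time, since incremental updates are not supported.
refresh_metadata() {
    "$IRSTATS" extract_metadata_from_archive &&
    "$IRSTATS" import_metadata
}
```

How often to call refresh_metadata is a judgment call: because it is a full refresh, a large repository may only warrant running it nightly or weekly.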