Manual
The default options to configure will install IRStats into /usr/local. For testing purposes it's recommended that you work in /usr/local. These instructions refer to the locations as if you had run configure as:
./configure --prefix=/
If you didn't install IRStats into the root of your system you will need to prefix /usr/local/irstats to any paths referred to here.
Configuration
To configure IRStats you should only need to modify two files - the paths in the main executable /var/www/cgi-bin/irstats.cgi and the configuration file /etc/irstats.conf.
irstats.cgi
You may need to modify irstats.cgi depending on where you installed ChartDirector/awstats. If any of these paths are incorrect you will get a Perl error on running irstats.cgi saying that a module could not be found in the Perl path.
#!/usr/bin/perl -w
##############################################################################
### Configuration ###
##############################################################################
# IRStats' perl modules
use lib "/usr/local/irstats/var/www/irstats/lib";
# awstats modules
use lib "/var/www/awstats/lib";
# ChartDirector
use lib "/usr/local/irstats/usr/lib/ChartDirector";
# EPrints (if using EPrints)
use lib "/opt/eprints2/perl_lib";

use IRStats;

# The path to IRStats' configuration file
$IRStats::Configuration::FILE = "/usr/local/irstats/etc/irstats.conf";
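Before loading irstats.cgi in a browser, it can help to check from the command line that Perl can find each module. The following is a minimal sketch using Perl's standard -I (add a library directory) and -M (load a module) switches. The function name check_perl_module is hypothetical and not part of IRStats.

```shell
#!/bin/sh
# check_perl_module MODULE [LIBDIR...]
# Hypothetical helper (not part of IRStats): succeeds if perl can load
# MODULE when each LIBDIR is added to the module search path.
check_perl_module() {
    module=$1
    shift
    # Turn every remaining argument DIR into -IDIR, preserving spaces.
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" "-I$1"
        shift
        n=$((n - 1))
    done
    # "-e 1" is an empty program, so only the module load is exercised.
    perl "$@" -M"$module" -e 1
}
```

For example, check_perl_module IRStats /usr/local/irstats/var/www/irstats/lib returns a non-zero status and prints Perl's "Can't locate ..." error if the directory is wrong.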
irstats.conf
IRStats' behaviour is controlled by its configuration file. Options are specified by an option name/value pair:
#Set configuration for irstats with this file

# the repository type (used by update_table)
# one of eprints2, eprints3, dspace, apacheeprints
repository_type = eprints3
# repository_type = eprints2

# the eprints repository to extract from
repository = soton

# sets to allow filtering by
set_ids = internal_group, subjects, creators
Multiple values can be specified by providing a comma-separated list:
name = value1, value2, value3
Values are unquoted (don't put quote marks around values).
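The format is simple enough to read from a shell script, which can be handy for checking what value an option currently has. This is a minimal sketch that assumes the file follows exactly the rules described here ("name = value" lines, "#" comment lines, comma-separated lists). The function name conf_get is hypothetical and not part of IRStats, and IRStats' own parser may treat edge cases differently.

```shell
#!/bin/sh
# conf_get FILE NAME
# Hypothetical helper (not part of IRStats): print each value of option
# NAME found in FILE on its own line.
conf_get() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                    # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                  # not a name = value line
            key = substr($0, 1, eq - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the option name
            if (key != name) next
            # split the value on commas and trim each element
            n = split(substr($0, eq + 1), parts, ",")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", parts[i])
                if (parts[i] != "") print parts[i]
            }
        }
    ' "$1"
}
```

With the example file above, conf_get /etc/irstats.conf set_ids would print internal_group, subjects and creators on separate lines.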
Available options
- all_dashboard
- Views to show on the dashboard for all eprints. You need to define a list of Views for every set_id by specifying the set_id followed by _dashboard.
- cache_path
- The directory used to store temporary cache files (must be writeable by the Web server).
- database_column_table_prefix
- The prefix to use for column tables.
- database_driver
- The database driver to use - must be mysql for MySQL or Pg for Postgres.
- database_eprints_access_log_table
- The name of the EPrints access log table to read from.
- database_id_columns
- The database columns to map to auxiliary tables (more efficient storage). You will probably want to leave this as requester_organisation, requester_host, referrer_scope, search_engine, search_terms, referring_entity_id.
- database_main_stats_table
- The main table to use for IRStats.
- database_name
- The database name to use.
- database_password
- Password to connect to the database with.
- database_port
- Port to connect to (optional).
- database_server
- The database host name (localhost for local).
- database_set_table_citation_suffix
- Suffix to append to set citation tables.
- database_set_table_code_suffix
- Suffix to append to set code tables.
- database_set_table_prefix
- The prefix to use for set tables.
- database_user
- User name to connect to the database with.
- database_table_prefix
- The prefix to use for IRStats tables.
- dns_cache_file
- The file to store the DNS lookup cache in.
- dspace_handle
- The major part of the DSpace handle (handle/XXXX/1223).
- geo_ip_country_file
- The Geo IP country database file.
- geo_ip_org_file
- The Geo IP organisation database file.
- id_parameters
- The parameters that are used to uniquely identify a view.
- max_cache_age
- Maximum time to keep a cache file for, in seconds.
- referrer_scope_{1,2,3,4,no_referrer}
- Labels for the different referrer scopes.
- repeats_filter_file
- The file to store the repeated requests in.
- repeats_filter_timeout
- The minimum time that must elapse between requests for the same eprint from the same IP address in seconds.
- repository
- The repository name.
- repository_type
- The repository software (controls where IRStats reads usage data from). One of 'eprints2', 'eprints3', 'dspace' or 'apacheeprints'.
- repository_url
- The base URL of the repository.
- root
- The root of the IRStats directory - this value gets prefixed to any path value that is relative.
- set_ids
- A list of fields that eprints can be grouped by. 'eprint' is implicitly a member of set_ids and does not need to be specified. For GNU EPrints simply specify the field name to use.
To use one field for the set names and another for the set values (likely with compound fields), append "_id_field" to the field name and set it to the field to actually use for the set. For example, if you have a compound field creators_name and an identity field creators_id, add the option creators_name_id_field = creators_id.
- set_member_full_citations_file
- The CSV file containing full citations for members of set_ids.
- set_member_short_citations_file
- The CSV file containing short citations for members of set_ids.
- set_member_codes_file
- The CSV file containing code mappings - maps from a user specified string into the internally used identifier.
- set_member_urls_file
- The CSV file containing URLs for members of set_ids.
- set_membership_file
- The CSV file containing set memberships.
- set_phrases_file
- The CSV file containing phrases for various IRStats things.
- static_path
- The directory used to store IRStats statically served files (the style sheets and thumbnails). The /graph directory below this must be writeable by the Web server.
- static_url
- The relative URL of the static_path directory. Leave blank if IRStats is not installed in a sub-directory.
- update_lock_filename
- Filename to use for preventing multiple instances of the IRStats process running.
- view_path
- The directory containing the IRStats View modules.
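The effect of the repeats_filter_timeout option listed above is easiest to see with a small illustration. The sketch below is not IRStats' implementation. It assumes a hypothetical input of one request per line in the form "epoch_seconds ip_address eprint_id", sorted by time, and it assumes the timeout is measured from the most recent request seen for the same IP address and eprint. The function name filter_repeats is likewise hypothetical.

```shell
#!/bin/sh
# filter_repeats TIMEOUT
# Illustration only (not IRStats code). Reads lines of the hypothetical form
#   epoch_seconds ip_address eprint_id
# from standard input and prints only the requests that should be counted.
filter_repeats() {
    awk -v timeout="$1" '
        {
            key = $2 SUBSEP $3                 # one entry per (IP, eprint)
            # count the request if this pair is new or enough time has passed
            if (!(key in last) || $1 - last[key] >= timeout) print
            last[key] = $1                     # assumption: any request restarts the timer
        }
    '
}
```

With a timeout of 3600, a second request for eprint 42 from the same address 50 seconds after the first is dropped, while a request from a different address at the same moment is kept.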
Configuring Apache to use IRStats
IRStats requires its CGI script (irstats.cgi) and its static files to be made available from the Web server.
On most systems the CGI script can be made available by copying it to /var/www/cgi-bin, in which case it can be accessed from http://yourhost/cgi-bin/irstats.cgi. By default IRStats expects to find its static files at http://yourhost/irstats/, which can be achieved by copying the static files to /var/www/html.
If you want to locate IRStats somewhere else please consult your Web server's documentation.
# Example using mod_perl and the PerlSetVar to serve multiple
# repositories from a single CGI script
<VirtualHost *:80>
ServerName irstats.citebase.org
DocumentRoot /usr/local/irstats/htdocs
<Directory /usr/local/irstats/htdocs>
AllowOverride None
Options None
Order allow,deny
Allow from all
</Directory>
PerlRequire /usr/local/irstats/cgi-bin/irstats.cgi
<Location /irstats-cadair>
SetHandler perl-script
PerlSetVar IRStats_Config_File /usr/local/irstats/etc/irstats_cadair.conf
PerlHandler IRStats::GUI
</Location>
</VirtualHost>
Running IRStats
IRStats uses a single executable - irstats.cgi - for all processes. To get help on using irstats.cgi from the command line look at the man page:
./var/www/cgi-bin/irstats.cgi --man
Importing Log File Data
IRStats supports importing log data from EPrints accesslog tables, EPrints Apache log files and DSpace log files.
By default the import scripts only output on error. To get more feedback use the --verbose option (repeat for more verbosity).
Depending on the type of repository you will need to configure IRStats differently, as explained in the next sections.
EPrints Accesslog Tables
Make sure the database_eprints_access_log_table option is set correctly in the configuration file. For EPrints 2 set the repository_type option to 'eprints2' or for EPrints 3 set the repository_type option to 'eprints3'.
Execute the irstats script as follows:
./var/www/cgi-bin/irstats.cgi update_table
EPrints Apache Log Files
Set the repository_type option to 'apacheeprints'.
Execute the irstats script as follows:
./var/www/cgi-bin/irstats.cgi update_table < /var/log/httpd/access_log
If you have multiple log files you can run update_table for each log file. You must run update_table only once for each log file, otherwise you will get repeated data.
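One way to avoid importing the same file twice is to record each file after a successful import. This is a minimal sketch, assuming irstats.cgi exits with a non-zero status on failure (not confirmed by this manual). The function name import_logs_once, the IRSTATS variable and the state file are hypothetical bookkeeping, not IRStats features. Files are recognised by path only, so a log that is renamed after import would be imported again.

```shell
#!/bin/sh
# Path to irstats.cgi; override by setting IRSTATS in the environment.
IRSTATS=${IRSTATS:-./var/www/cgi-bin/irstats.cgi}

# import_logs_once STATE_FILE LOG...
# Hypothetical helper: feed each LOG to update_table at most once.
# STATE_FILE lists the paths that have already been imported.
import_logs_once() {
    state=$1
    shift
    touch "$state"
    for log in "$@"; do
        # -x whole-line match, -F fixed string: compare paths literally
        if grep -qxF -- "$log" "$state"; then
            echo "skipping $log (already imported)" >&2
            continue
        fi
        # record the file only if the import succeeded
        "$IRSTATS" update_table < "$log" && printf '%s\n' "$log" >> "$state"
    done
}
```

A call such as import_logs_once /var/tmp/irstats_imported.txt /var/log/httpd/access_log.1 /var/log/httpd/access_log.2 can then be repeated safely (both paths are examples only).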
DSpace Apache Log Files
Set the repository_type option to 'dspace'.
Execute the irstats script as follows:
./var/www/cgi-bin/irstats.cgi update_table < /var/log/httpd/access_log
If you have multiple log files you can run update_table for each log file. You must run update_table only once for each log file, otherwise you will get repeated data.
Looking up IP Addresses
To get the TopTenAcademies view working you must first run the IP lookup script:
./var/www/cgi-bin/irstats.cgi convert_ip_to_host
Depending on the number of lookups to do and your DNS server performance this may take some time to run.
Exporting Metadata
By default, data files are written to /data. You probably want to modify this in the irstats.conf file to point to somewhere more convenient.
Quick Start
IRStats includes two tools for extracting basic metadata from EPrints and DSpace: extract_generic_eprints and extract_generic_dspace. Set the repository_url in the configuration file. For DSpace set the dspace_handle. For EPrints execute:
./var/www/cgi-bin/irstats.cgi extract_generic_eprints
Or DSpace:
./var/www/cgi-bin/irstats.cgi extract_generic_dspace
If successful, this will generate the CSV data files ready for importing. These tools grab the title from the abstract page for every eprint that has been downloaded. They do not provide any set data.
EPrints 2/3
Set the repository name with repository and the fields you want to use as sets in set_ids. Then execute:
./var/www/cgi-bin/irstats.cgi extract_metadata_from_archive
Importing Metadata
If you have a complete set of CSV data files you can import that data into IRStats by executing:
./var/www/cgi-bin/irstats.cgi import_metadata
You can repeat this step without having to re-import usage data.
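The export and import steps are typically run back to back. This is a minimal sketch, assuming irstats.cgi exits with a non-zero status on failure (not confirmed by this manual). The function name refresh_metadata and the IRSTATS variable are hypothetical.

```shell
#!/bin/sh
# Path to irstats.cgi; override by setting IRSTATS in the environment.
IRSTATS=${IRSTATS:-./var/www/cgi-bin/irstats.cgi}

# refresh_metadata EXTRACT_COMMAND
# Hypothetical helper: run one of the export commands described above,
# then import the resulting CSV files only if the export succeeded.
refresh_metadata() {
    case "$1" in
        extract_generic_eprints|extract_generic_dspace|extract_metadata_from_archive) ;;
        *) echo "unknown extract command: $1" >&2; return 2 ;;
    esac
    "$IRSTATS" "$1" || return 1
    "$IRSTATS" import_metadata
}
```

For an EPrints repository with sets configured this would be called as refresh_metadata extract_metadata_from_archive.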
Updating the IRStats Database
(Work in progress)
If you are using Apache log files, when you rotate the log files run the following:
./var/www/cgi-bin/irstats.cgi update_table < /path/to/rotated/log
./var/www/cgi-bin/irstats.cgi convert_ip_to_host
If you are reading from an EPrints access log table you can periodically run (note this is very efficient and won't do anything if there haven't been any new log entries):
./var/www/cgi-bin/irstats.cgi update_table
You will also want to periodically update the metadata. IRStats doesn't currently support incremental updates to metadata, so how long each update takes, and therefore how often you can run it, will depend heavily on the size of your database.
