EPrints Logging Facility

This page tracks suggestions and requested features for a logging and analysis utility that will be added to EPrints.

The mod_perl handlers are in EPrints::Apache::LogHandler.

See Tasmania's implementation: http://eprints.comp.utas.edu.au:81/archive/00000221/

Purpose

The data will be put to several uses. Researchers will use it to trace interested parties, set up collaborations and monitor for plagiarism; repository managers will use it for advocacy work and to monitor how effectively their own repository is operating in comparison to others.

Desirable Features

  • Origin of accesses (by country, domain, institution, etc.)
  • Timing of accesses (date of access, time-series access data, cumulative access data)
  • Ability to filter out home-institution accesses
  • The provision of usage data for both metadata and full-text of articles
  • The provision of usage data for both repository and publisher copies of articles
  • (Search Terms ???)

Analysis

  • Usage of individual articles over time
  • Usage of articles from a department or research group over time
  • Usage of articles by subject area (for example, UCL assigns subject classifiers to all articles in its repository, relating articles to departments or research groups)
  • Usage of articles in comparison to the usage of other articles in the same repository (cumulative)
  • Usage by keyword
  • An alerting facility that draws attention to unusual usage
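The page does not say what counts as "unusual usage", so the following is only one possible reading: flag a day whose access count is well above the average of the preceding days. It is a minimal sketch in Python (not the Perl that EPrints itself uses); the function name `unusual_days`, the 7-day window and the threshold of 3 standard deviations are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def unusual_days(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count is far above the preceding window."""
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        baseline = mean(history)
        spread = pstdev(history)
        # Use a minimum spread of 1 so that a perfectly flat history
        # does not flag every small change (assumed tuning choice).
        limit = baseline + threshold * max(spread, 1.0)
        if daily_counts[i] > limit:
            flagged.append(i)
    return flagged
```

Given one count per day for an eprint, the returned indices could drive an alert to the repository manager. A real facility would probably also want to detect unusually low usage, which this sketch ignores.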

On the EPrint Abstract Page

  • Total downloads of this eprint: last-week, last-month, total (single graph?)
  • Total downloads by country of origin
  • Total downloads by institution of origin
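The last-week, last-month and total figures above could be derived from the access dates of a single eprint roughly as follows. This is a hedged sketch in Python rather than EPrints' own Perl; `download_totals` is a hypothetical name, and treating "last week" as 7 days and "last month" as 30 days is an assumption.

```python
from datetime import date, timedelta


def download_totals(access_dates, today):
    """Count accesses in the last 7 days, the last 30 days, and overall."""
    week_start = today - timedelta(days=7)    # assumed definition of "last week"
    month_start = today - timedelta(days=30)  # assumed definition of "last month"
    totals = {"last_week": 0, "last_month": 0, "total": 0}
    for day in access_dates:
        totals["total"] += 1
        if day > month_start:
            totals["last_month"] += 1
        if day > week_start:
            totals["last_week"] += 1
    return totals
```

The same loop could be keyed by country or institution to produce the per-origin totals.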

Repository Summary

Implementation

Data Handling

  • Filter out 'local' accesses
  • Filter out robots (Web crawlers + abusive)

New Report Page(s)

  • IR downloads: last-week, last-month, total
  • IR downloads by country of origin
  • Number of users (sessions)?
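As a rough illustration of the report figures above, the sketch below summarises a list of accesses by country and counts distinct sessions. It is written in Python for brevity; the function name and the (country, session id) record shape are assumptions, not part of EPrints.

```python
from collections import Counter


def repository_summary(records):
    """Summarise a list of (country, session_id) pairs for the whole repository."""
    by_country = Counter(country for country, _session in records)
    sessions = {session for _country, session in records}
    return {
        "downloads": len(records),
        "by_country": dict(by_country),
        "sessions": len(sessions),
    }
```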

Fields

Cache totals in the database, but store the individual access records in a binary file on disk?

Binary file needs to contain ... eprint id, country id, institution id, session id, source ip, access type

Types = abstract/fulltext #?
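One way to lay out such a record is as a fixed-width structure, sketched here in Python with the standard `struct` module. The field widths, the byte order, the restriction to IPv4 and the numeric codes for the two access types are all assumptions; the page only names the fields.

```python
import ipaddress
import struct

# Assumed layout, network byte order, no padding:
#   eprint id (uint32), country id (uint16), institution id (uint32),
#   session id (uint64), source IPv4 address (4 bytes), access type (uint8)
RECORD = struct.Struct("!IHIQ4sB")

# Hypothetical numeric codes for the two access types named above.
ACCESS_TYPES = {"abstract": 0, "fulltext": 1}
ACCESS_NAMES = {code: name for name, code in ACCESS_TYPES.items()}


def pack_access(eprint_id, country_id, institution_id, session_id,
                source_ip, access_type):
    """Encode one access as a fixed-width binary record."""
    ip_bytes = ipaddress.IPv4Address(source_ip).packed
    return RECORD.pack(eprint_id, country_id, institution_id, session_id,
                       ip_bytes, ACCESS_TYPES[access_type])


def unpack_access(data):
    """Decode a record produced by pack_access back into a tuple."""
    (eprint_id, country_id, institution_id, session_id,
     ip_bytes, type_code) = RECORD.unpack(data)
    return (eprint_id, country_id, institution_id, session_id,
            str(ipaddress.IPv4Address(ip_bytes)), ACCESS_NAMES[type_code])
```

With this layout every record has the same length (`RECORD.size` bytes), so the file can be appended to cheaply and the n-th record located by offset without an index.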

Unless explicitly stated otherwise all content © University of Southampton 2007.