EPrints Logging Facility
This page will track suggestions/requested features related to a logging and analysis utility that will be added to EPrints.
mod_perl handlers are in EPrints::Apache::LogHandler
See Tasmania's implementation: http://eprints.comp.utas.edu.au:81/archive/00000221/
Purpose
Usage data will serve two groups. Researchers will use it to trace interested parties, set up collaborations and monitor for plagiarism; repository managers will use it for advocacy work and to monitor how effectively their own repository is operating in comparison to others.
Desirable Features
- Origin of accesses (by country, domain, institution, etc)
- Timing of accesses (date of access, time-series access data, cumulative access data)
- Ability to filter out home-institution accesses
- The provision of usage data for both metadata and full-text of articles
- The provision of usage data for both repository and publisher copies of articles
- Search terms (?)
Analysis
- Usage of individual articles over time
- Usage of articles from a department or research group over time
- Usage of articles by subject area (for example, UCL assigns subject classifiers to all articles in its repository, relating articles to departments or research groups)
- Usage of articles in comparison to the usage of other articles in the same repository (cumulative)
- Usage by keyword
- An alerting facility that draws attention to unusual usage
On the EPrint Abstract Page
- Total downloads of this eprint: last-week, last-month, total (single graph?)
- Total downloads by country of origin
- Total downloads by institution of origin
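The last-week / last-month / total figures listed above could be derived from a list of access timestamps roughly as follows. This is only a sketch in Python for illustration (the real handlers are Perl); the function name `download_totals` is hypothetical, and "last month" is assumed to mean the previous 30 days.

```python
from datetime import datetime, timedelta

def download_totals(access_times, now):
    """Count downloads in the last week, the last month, and overall.

    access_times: iterable of datetime objects, one per download.
    Assumption: 'last month' means the previous 30 days.
    """
    week_ago = now - timedelta(days=7)
    month_ago = now - timedelta(days=30)
    totals = {"last_week": 0, "last_month": 0, "total": 0}
    for t in access_times:
        totals["total"] += 1
        if t >= month_ago:
            totals["last_month"] += 1
        if t >= week_ago:
            totals["last_week"] += 1
    return totals
```

The same loop could key its counters by country or institution id to produce the per-origin totals.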
Repository Summary
Implementation
Data Handling
- Filter out 'local' accesses
- Filter out robots (Web crawlers + abusive)
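One possible shape for the two filters above, sketched in Python for illustration (the real handlers are Perl). The network range, the robot markers and all function names are assumptions made for this example, not part of EPrints. Matching on the User-Agent string only catches crawlers that identify themselves; abusive clients would need something more, such as rate-based detection.

```python
import ipaddress

# Hypothetical configuration: address ranges of the home institution and
# User-Agent substrings treated as robots. Both are illustrative values.
LOCAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
ROBOT_MARKERS = ("bot", "crawler", "spider")

def is_local(source_ip):
    """True if the address falls inside any configured local network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in LOCAL_NETWORKS)

def is_robot(user_agent):
    """True if the User-Agent contains a known robot marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in ROBOT_MARKERS)

def keep_access(source_ip, user_agent):
    """True if the access should be counted in usage statistics."""
    return not is_local(source_ip) and not is_robot(user_agent)
```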
New Report Page(s)
- IR downloads: last-week, last-month, total
- IR downloads by country of origin
- Number of users (sessions)?
Fields
Cache the totals in the database, but store the raw access data in a binary file on disk?
Binary file needs to contain ... eprint id, country id, institution id, session id, source ip, access type
Types = abstract/fulltext #?
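A fixed-width record holding the fields listed above might look like this. It is a sketch in Python for illustration only (the real handlers are Perl): the field widths, the byte order, the IPv4-only address and the numeric codes for the access types are all assumptions.

```python
import socket
import struct

# Assumed layout, network byte order, no padding (21 bytes per record):
# eprint id, country id, institution id, session id as unsigned 32-bit ints,
# source IP as 4 raw bytes (IPv4 only), access type as 1 byte.
RECORD = struct.Struct("!IIII4sB")

# Hypothetical numeric codes for the two access types.
ACCESS_TYPES = {"abstract": 0, "fulltext": 1}
ACCESS_NAMES = {code: name for name, code in ACCESS_TYPES.items()}

def pack_access(eprint_id, country_id, institution_id, session_id,
                source_ip, access_type):
    """Encode one access as a fixed-width binary record."""
    return RECORD.pack(eprint_id, country_id, institution_id, session_id,
                       socket.inet_aton(source_ip), ACCESS_TYPES[access_type])

def unpack_access(data):
    """Decode a record produced by pack_access."""
    eprint_id, country_id, institution_id, session_id, ip, kind = RECORD.unpack(data)
    return (eprint_id, country_id, institution_id, session_id,
            socket.inet_ntoa(ip), ACCESS_NAMES[kind])
```

Fixed-width records allow the file to be appended to cheaply and scanned without an index. The field list above has no timestamp, so time-series reports would presumably need one added.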
