EZProxy pros and cons
What RISE has demonstrated to us is that using EZProxy proxy server logfiles as the source of your recommendations has some major limitations (in comparison with OpenURL data at least). In part this is due to limitations in the data itself, but mostly it is due to the way that we, at the OU, are using EZProxy.
The first limitation relates to how we use EZProxy, and particularly how we use it now that we have implemented the EBSCO Discovery Solution. At the Open University most of our students use our services off-campus, so we push every electronic resource we can through EZProxy. So when we came to define the project, EZProxy seemed like a good place to draw our recommendations from, as it covered the largest share of our overall traffic.
Now, at the time when we defined the project we were using a federated search system and were just switching to a discovery system. With federated search, each of the search targets appeared individually within the EZProxy logfiles with its own URL, so an analysis of the logfiles would show which search target was supplying your content. But when we switched over to the discovery solution, we decided to put that through EZProxy as well. So most of our searches now go to EBSCO, which pulls the full text of the article from the content supplier. Consequently, as far as our EZProxy logfile is concerned, all it sees is a request to EBSCO, not to the final content provider.
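To make the effect concrete, here is a minimal sketch of the kind of logfile analysis described above: counting requests per destination host. It assumes an NCSA-style log line with the full URL in the request field; your EZProxy LogFormat may differ. The hostnames, the sample lines and the target_hosts helper are all made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed line shape: host ident user [time] "METHOD URL PROTOCOL" status bytes
LOG_LINE = re.compile(r'"(?:GET|POST) (?P<url>\S+) [^"]*"')

def target_hosts(lines):
    """Count requests per destination hostname found in the request URL."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        host = urlsplit(match.group("url")).hostname
        if host:
            counts[host] += 1
    return counts

# Placeholder sample lines, not real log data
sample = [
    '10.0.0.1 - user1 [01/Jul/2011:10:00:00 +0100] '
    '"GET http://discovery.example.com:80/login.aspx?direct=true HTTP/1.1" 200 5120',
    '10.0.0.2 - user2 [01/Jul/2011:10:00:05 +0100] '
    '"GET http://discovery.example.com:80/login.aspx?direct=true HTTP/1.1" 200 4096',
    '10.0.0.3 - user3 [01/Jul/2011:10:00:09 +0100] '
    '"GET http://publisher.example.org:80/article/123 HTTP/1.1" 200 2048',
]
print(target_hosts(sample).most_common())
```

With everything routed through a discovery service, a count like this would be dominated by the single discovery host, which is exactly why it stops telling you who supplied the content.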
As far as recommendations are concerned that isn’t a major issue, but it does mean that analysing the logfiles to extract useful usage data may not work for us, so we need to test it to be sure.
EZProxy and article level metadata
On the plus side, having the EBSCO Discovery Solution API has meant that we are at least able to do something that addresses a major limitation of the EZProxy logfile data. Generally there is very little bibliographic metadata within the logfile (certainly in comparison with OpenURL logfiles). To be able to display sensible recommendations you need to show some descriptive element to help users understand what is being recommended. As a minimum you would want to show an article title, and ideally you would want to show a journal title, date and maybe authors and a DOI.
Your EZProxy logfile data already has a URL you can use to link to the content but some form of bibliographic description is essential as otherwise users cannot choose which recommendations are relevant.
Now, displaying an article title for your recommendations when you don’t have that data in your original logfile requires some post-processing. In the case of RISE, because the majority of our logfile data relates to EBSCO, we can use the EBSCO Discovery Solution API to retrieve some basic metadata about the article, such as the DOI or article title.
But this starts to raise some complications, especially if your end-game is to be able to openly release your search data (more later). Under our license terms we aren’t permitted to store that data within the RISE database. Now, we already have an internal record ID, so in theory we could pull the article title in real-time using the API and display it within the RISE interface. However, with API response times typically being 3-4 seconds, it isn’t practicable to send up to a dozen API calls just to populate a single page of recommendations and results.
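A rough back-of-the-envelope calculation shows why. This sketch assumes the calls are made one after another, which is a simplification; the sequential_wait helper is purely illustrative.

```python
def sequential_wait(calls, seconds_per_call):
    """Total wait if each API call only starts once the previous one returns."""
    return calls * seconds_per_call

best = sequential_wait(12, 3)   # a dozen calls at 3 seconds each
worst = sequential_wait(12, 4)  # a dozen calls at 4 seconds each
print(f"{best}-{worst} seconds per page")
```

Even at the low end, that is over half a minute before a single page of recommendations could be fully populated.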
So at the moment we’ve ended up using the EDS metadata as a key to retrieve data from Crossref, which we are licensed to store locally. Fortunately we have found quite a high overlap between the data sources, so we have been able to get data for most of our recommendations. So article level metadata, where you can get it from and what you can do with it, seems to be a major issue.
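As an illustration of the kind of lookup involved, here is a minimal sketch that uses a DOI as the key and picks out the fields you would want to display. It assumes the public Crossref REST API (api.crossref.org/works/{doi}) and its JSON response shape, and is not necessarily the route RISE itself used. The crossref_url and describe helpers and the sample record are made up for illustration, and no network call is made here.

```python
import json
from urllib.parse import quote

CROSSREF_WORKS = "https://api.crossref.org/works/"

def crossref_url(doi):
    """Build the Crossref REST API URL for a single DOI."""
    return CROSSREF_WORKS + quote(doi, safe="/")

def describe(work_json):
    """Pull display fields out of a Crossref 'works' response body."""
    message = json.loads(work_json)["message"]
    titles = message.get("title") or [""]
    journals = message.get("container-title") or [""]
    date_parts = message.get("issued", {}).get("date-parts") or []
    year = date_parts[0][0] if date_parts and date_parts[0] else None
    authors = [
        " ".join(filter(None, [a.get("given"), a.get("family")]))
        for a in message.get("author", [])
    ]
    return {
        "doi": message.get("DOI"),
        "title": titles[0],
        "journal": journals[0],
        "year": year,
        "authors": authors,
    }

# Made-up record in the shape of a Crossref response, for illustration only
sample_response = json.dumps({
    "status": "ok",
    "message": {
        "DOI": "10.1000/xyz123",
        "title": ["An Example Article"],
        "container-title": ["Journal of Examples"],
        "issued": {"date-parts": [[2011, 7]]},
        "author": [{"given": "Ada", "family": "Example"}],
    },
})

print(crossref_url("10.1000/xyz123"))
print(describe(sample_response))
```

The dictionary that comes back is the sort of record you would store locally against a recommendation, provided your licence allows it.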
Open article level metadata
There do, however, seem to be some differences of opinion between providers of article level metadata (although in the case of aggregators it may be that they themselves are licensing it rather than creating it) and Rights and Legal experts over exactly what you can and cannot do with article level metadata. The open questions are whether, given that this data is essentially a statement of fact, it is possible to restrict what can be done with it, and whether extracting selected data into another database is allowable or not.
Certainly for RISE it brings in added complications. We’ve pretty much run out of time to do much more. We can think of a couple of alternative approaches, using OpenURL data from EDINA or data from Mendeley, that might allow us to match data to the RISE recommendations in a way that would allow the full dataset to be openly released. But realistically that may not be achievable by the time the project ends this month. At the moment we are left with potentially being able to release the EZProxy data without bibliographic data, and that may be of limited value. But we will get as far as we can.