Linking and Persistence
I’m currently working on a deliverable which relates to the provision of ‘persistent links’ to resources. This is part of that report and I’d be interested in feedback. As well as the text I’ve inlcuded a specific question at the end – I’d be very interested in responses:
When providing links to online resources it is clearly desirable that the links will work over long periods of time. However, it is common for resources to be identified and located by multiple URLs over time. This creates a challenge when forming a reference to an online resource.
This report will not attempt to cover all aspects of persistent identifiers, which are well covered elsewhere, particularly by Emma Tonkin’s 2008 article on the topic in Ariadne . However, it will consider approaches discussed within the TELSTAR project.
Digital Object Identifiers (DOIs)
A DOI name “provides a means of persistently identifying a piece of intellectual property on a digital network and associating it with related current data in a structured extensible way.” (from http://www.doi.org/faq.html#1)
On the web, a given DOI can be ‘resolved’ via a DOI System proxy server – the most commonly used being http://dx.doi.org. A DOI can be resolved by appending the DOI to the proxy server URL. For example:
DOI Name: doi:10.10.123/456
URL for resolution: http://dx.doi.org/10.10.123/456
In the majority of cases such a URL will resolve to the full text of the resource on a publishers website. However, there are examples of a DOI resolving to other services – such as a page listing a number of different URLs for the identified resource when it is available through multiple routes.
DOIs are being widely adopted to identify journal articles with a smaller amount of use to identify books, book chapters and other types of resource (see http://www.crossref.org/06members/53status.html for a breakdown of the different resources being identified by DOIs). The DOI has become part of some commonly used Citation styles such as APA .
Linking to online versions of articles using the DOI has a major drawback. Because the standard behaviour of DOI resolution services is to link to the ‘publisher’ version of the paper, it does not take into account the ‘appropriate copy’ problem . In brief the ‘appropriate copy’ problem is the issue that there may exist a number of different routes to a resource, but typically members of an institution will only be able to use a subset of the overall routes, depending on institutional subscriptions and services. It was the ‘appropriate copy’ problem that led to the development of the OpenURL standard.
PURLs (Persistent URLs)
A PURL is “an address on the World Wide Web that points to other Web resources. If a Web resource changes location (and hence URL), a PURL pointing to it can be updated.” (from http://purl.oclc.org/docs/faq.html#toc1.5)
PURLs were created in recognition that web resources can change location (and so URL) . A PURL can be assigned to a web resource and if the web resource changes location the PURL can be updated to point to the new location (URL) for the resource.
PURLs can be created through the use of appropriate software, either by hosting the software or by using a public PURL server such as that hosted by OCLC.
OpenURLs
Unlike DOIs and PURLs, OpenURLs are not specifically persistent identifiers for a resource. The OpenURL framework standard (ANSI/NISO Z39.88) enables the creation of applications that transfer packages of information over a network. The only significant implementation of the standard is to transfer metadata related to bibliographic resources.
OpenURL has seen widespread adoption by University Libraries in combination with ‘OpenURL resolver’ software. This ‘resolver’ software typically uses the metadata available from an OpenURL (transported over http) and provides a link to the ‘appropriate copy’ based on the library’s subscription information.
OpenURLs are also commonly used by ejournal platforms to enable inbound links to specific resources (typically journal articles).
As the metadata related to a publication tends to be persistent over time OpenURLs can be seen as ‘persistent’ in one sense. However, OpenURLs in themselves simply provide a transport mechanism for metadata, and how they are ‘resolved’ and what they resolve to depends on the resolver software and the information available to that resolver. This means the result of resolving an OpenURL can change over time.
Managed URLs
It is possible to enable ‘persistence’ of links to online resources by introducing and managing a level of redirection. Using a ‘managed’ URL which in turn redirects to the location of the resource it is possible to then use the managed URL in place of the current location of the resource. If the resource is moved the managed URL can be updated to point at the new location of the resource.
The Open University currently uses a number of different types of Managed URLs depending on the type of resource being linked to. These mechanisms are described below in the section on the “Current Linking strategy at the Open University”.
[the following paragraphs are not part of the report, but conclude with some questions which I’m looking for answers to, so comments would be welcome]
An example of a ‘managed URL’ at the Open University is the use of a system called ROUTES. ROUTES is an implementation of the Index+ software from System Simulation.
This is used to give a ‘managed URL’ to freely available web resources. When a resource is added to ROUTES, its URL is recorded in the record. For example see the ROUTES record for the BBC Homepage.
Once a resource has been added to ROUTES, a ROUTES URL is used in place of the resource primary URL in Open University course material. This ROUTES URL results in a http status 302 being returned (i.e. a redirect) to the resources primary URL as recorded in ROUTES. Then, if the resource moves in the future the ROUTES record can be updated, but the ROUTES URL being used in OU course material does not change. For example:
- Resource: BBC Homepage
- Primary URL: http://www.bbc.co.uk
- ROUTES Record: http://routes.open.ac.uk/ixbin/hixclient.exe?_IXDB_=routes&_IXSPFX_=f&submit-button=summary&$+with+res_id+is+res9377
- ROUTES URL: http://routes.open.ac.uk/ixbin/hixclient.exe?_IXDB_=routes&_IXSPFX_=g&submit-button=summary&$+with+res_id+is+res9377
So, my questions are
- Can we talk about ROUTES URLs as PURLs, or are there important differences between what the PURL software is doing and what ROUTES does?
- If so, what are these differences?
- Does the more generic term ‘managed URL’ fit the bill?
Jan 26th 2010 • 17:01
by Richard
ROUTES uses the techniques employed by a PURL by using an intermediate system to manage the link (ala SFX).
The ‘persistence’ of the service is a bit more difficult and you’d have to question the persistence of individual rather than collective solutions to the problem.