Activity data cookbook
The RISE team had a really good session yesterday working with Mark van Harmelen from Hedtek to go through and develop a series of ‘recipes’ for the Activity Data ‘cookbook’. These recipes describe the processes involved in the software that we have created to handle activity data. Over the course of three hours we managed to get five processes down on paper, covering the three types of recommendations we are making in RISE, the processes for parsing the EZProxy log files and the process we use to get course information into MyRecommendations.
We actually found it to be quite a good way of describing and documenting what the project software is doing. It seemed to be a bit easier to do than we’d expected and it was certainly a useful discipline to have to explain to someone from outside the project how things worked.
It also provided a useful challenge to us as it uncovered at least one issue that we need to do some more thinking about. This relates to how we handle courses in MyRecommendations and specifically what we do when users change courses. We currently take a feed from our systems that tells us which courses a student is studying (and for the OU, where students study a module at a time, that can be several modules). Talking through things yesterday made it clearer that we need to be able to store the historical course data, so that the system keeps the link between the course the user was studying at the time and the resources they viewed. So we need to come up with a solution that doesn’t overwrite the courses a student was taking last year with the courses they are taking this year.
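One possible shape for that solution, as a rough sketch rather than what RISE actually does, is to keep an append-only list of enrolments with start and end dates instead of a single "current courses" value per student. The class and method names (`Enrolment`, `EnrolmentHistory`, `apply_feed`, `courses_on`) and the course codes are made up for illustration, and the feed is assumed to give the full set of a student's current courses each time it runs.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Enrolment:
    student_id: str
    course_code: str
    start: date
    end: Optional[date] = None  # None means the student is still on the course


class EnrolmentHistory:
    """Hypothetical append-only record of which courses each student has studied."""

    def __init__(self) -> None:
        self._rows: list[Enrolment] = []

    def apply_feed(self, student_id: str, current_courses: set[str], today: date) -> None:
        """Merge the latest feed for one student without overwriting old rows."""
        open_rows = [r for r in self._rows
                     if r.student_id == student_id and r.end is None]
        # Close enrolments that have dropped out of the feed.
        for row in open_rows:
            if row.course_code not in current_courses:
                row.end = today
        # Open a new row for each course that is new in the feed.
        already_open = {r.course_code for r in open_rows}
        for code in sorted(current_courses - already_open):
            self._rows.append(Enrolment(student_id, code, today))

    def courses_on(self, student_id: str, when: date) -> set[str]:
        """Courses the student was studying on the given date."""
        return {r.course_code for r in self._rows
                if r.student_id == student_id
                and r.start <= when
                and (r.end is None or when < r.end)}


# Made-up example: one student changes courses between two runs of the feed.
history = EnrolmentHistory()
history.apply_feed("s1", {"A100", "B200"}, date(2010, 10, 1))
history.apply_feed("s1", {"C300"}, date(2011, 10, 1))
print(history.courses_on("s1", date(2011, 3, 1)))   # last year's courses are still there
print(history.courses_on("s1", date(2011, 11, 1)))  # this year's course
```

With something like this, a resource view can be linked to whatever `courses_on` returns for the date of the view, so last year's views stay attached to last year's courses.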
It still leaves us having to think carefully about how we handle searches for users who are studying multiple courses at a time, but we already knew that. That may well be something that is unique to the OU anyway. Our current thinking is that the number of students studying widely different courses is likely to be low; it is more likely that they would be studying related courses. In any case a student might well search for things entirely unrelated to their course. So our default would be to associate the searches with all the student’s courses and rely on the relevance ranking process to make sure that unrelated articles don’t appear high in the recommendations list. If testing finds this to be a problem then we could look at the approach Dave Pattern is suggesting in LIDAP and use a threshold to discount one-off relationships.
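To make that default a bit more concrete, here is a small sketch under some assumptions: each view is credited to every course the student is on, a simple count stands in for the real relevance ranking, and a threshold drops course and resource pairs seen only once. The function name `recommend`, the `min_count` parameter and the sample data are illustrative only and not taken from RISE or LIDAP.

```python
from collections import Counter, defaultdict


def recommend(views, courses_by_student, min_count=2):
    """Rank resources per course from (student_id, resource_id) pairs.

    Every view is credited to all of the student's courses; pairs seen
    fewer than min_count times are treated as one-offs and discarded.
    """
    counts = Counter()
    for student, resource in views:
        for course in courses_by_student.get(student, ()):
            counts[(course, resource)] += 1

    ranked = defaultdict(list)
    for (course, resource), n in counts.items():
        if n >= min_count:
            ranked[course].append((resource, n))

    # Most-viewed first; ties broken by resource id so the order is stable.
    for course in ranked:
        ranked[course].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(ranked)


# Made-up data: s1 studies two courses, so each of s1's views counts for both.
courses = {"s1": ["A100", "B200"], "s2": ["A100"], "s3": ["A100"]}
views = [("s1", "r1"), ("s2", "r1"), ("s3", "r2"), ("s1", "r3")]
print(recommend(views, courses))
```

In the made-up data only r1 is viewed by two students on A100, so it is the only pair that survives a threshold of two; the single B200 links created by s1 are dropped as one-offs.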
Data release formats
It was also a good opportunity to talk to Mark about XML data formats for the data we hope to release openly. Mark wrote the original MOSAIC project data collection guide which outlined an XML format for user activity data. For RISE we’ve revisited this format and tweaked it a bit to handle the e-resource data that we’re concerned with. There are a few things we would need to change about the course data and the resource information descriptions. Mark offered the really valuable insight that we only really needed to be able to provide user, resource and search term data. We didn’t need to make explicit recommendations within the data we released as people could use the data to build their own. That’s been really helpful and we are revising the draft format and plan to post it on here in the near future and talk to people about it.
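As a very rough illustration of a release that carries only user, resource and search term data, the sketch below writes a few events out as XML using Python's standard library. The element names (`activityData`, `event`, `user`, `resource`, `searchTerm`) are placeholders invented for this example; they are not the MOSAIC format or the RISE draft.

```python
import xml.etree.ElementTree as ET


def build_release(events):
    """Serialise events (dicts with user, resource, search_term) to an XML string."""
    root = ET.Element("activityData")  # placeholder element names throughout
    for ev in events:
        el = ET.SubElement(root, "event")
        ET.SubElement(el, "user").text = ev["user"]            # opaque user identifier
        ET.SubElement(el, "resource").text = ev["resource"]    # resource identifier
        ET.SubElement(el, "searchTerm").text = ev["search_term"]
    return ET.tostring(root, encoding="unicode")


# Made-up sample event; no recommendations are included in the output.
sample = [{"user": "u001", "resource": "res-42", "search_term": "climate change"}]
print(build_release(sample))
```

Anyone picking up a file like this could derive their own recommendations from the three fields, which is the point Mark was making.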