The Memento Tracer framework introduces a new collaborative approach to capture web publications for archival purposes. It is inspired by existing capture approaches yet aims for a new balance between the scale at which capturing can be conducted and the quality of the snapshots that result.
To archive the essence of a web publication, a range of web resources needs to be captured, and capturing those resources is often not trivial.
Memento Tracer is inspired by all these approaches but aims to strike a new balance between the scale of capturing and the quality of resulting captures.
The Memento Tracer framework consists of:
class ID or XPath. Since all pages of the same class are based on the same template, the resulting Traces apply across all pages of the class rather than to this specific page only. Currently, in addition to recording simple mouse clicks, the extension can record, with a single interaction by the curator, repeated clicks (e.g., navigating through all slides of a presentation) and clicks on all links in a certain user interface component. For example, below is a Trace that results from the curator indicating that the "next slide" button should be clicked repeatedly. Note that the Trace also indicates the URL pattern to which the Trace applies, and provenance information including the resource on which the Trace was created and the user agent used to create it. When the layout and/or affordances for a particular class of web publications change, a new Trace has to be recorded to ensure that captures maintain their high quality.
{
  "portal_url_match": "(slideshare.net)\/([^\/]+)\/([^\/]+)",
  "actions": [
    {
      "action_order": "1",
      "value": "div.j-next-btn.arrow-right",
      "type": "CSSSelector",
      "action": "repeated_click",
      "repeat_until": {
        "condition": "changes",
        "type": "resource_url"
      }
    },
    {
      "action_order": "2",
      "value": "div.notranslate.transcript.add-padding-right.j-transcript a",
      "type": "CSSSelector",
      "action": "click"
    }
  ],
  "resource_url": "https://www.slideshare.net/hvdsomp/creating-pockets-of-persistence",
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3417.0 Safari/537.36"
}
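To illustrate how a capture tool might consume such a Trace, here is a minimal Python sketch. It is not the Memento Tracer implementation: the function names `trace_applies` and `ordered_actions` are hypothetical, and the sketch assumes that `portal_url_match` is meant to be used as a regular expression searched against a candidate URL and that `action_order` gives the execution order.

```python
import json
import re

# The Trace from the text, embedded so the sketch is self-contained.
# A raw string keeps the JSON "\/" escapes intact for json.loads.
TRACE_JSON = r"""
{
  "portal_url_match": "(slideshare.net)\/([^\/]+)\/([^\/]+)",
  "actions": [
    {"action_order": "1",
     "value": "div.j-next-btn.arrow-right",
     "type": "CSSSelector",
     "action": "repeated_click",
     "repeat_until": {"condition": "changes", "type": "resource_url"}},
    {"action_order": "2",
     "value": "div.notranslate.transcript.add-padding-right.j-transcript a",
     "type": "CSSSelector",
     "action": "click"}
  ],
  "resource_url": "https://www.slideshare.net/hvdsomp/creating-pockets-of-persistence"
}
"""


def trace_applies(trace: dict, url: str) -> bool:
    """Assumption: the URL pattern is a regex searched anywhere in the URL."""
    return re.search(trace["portal_url_match"], url) is not None


def ordered_actions(trace: dict) -> list:
    """Return actions sorted numerically; action_order is stored as a string."""
    return sorted(trace["actions"], key=lambda a: int(a["action_order"]))


trace = json.loads(TRACE_JSON)

if trace_applies(trace, trace["resource_url"]):
    for action in ordered_actions(trace):
        print(action["action_order"], action["action"], action["value"])
```

Sorting on `int(action_order)` rather than on the raw string matters once a Trace has ten or more actions, since `"10"` sorts before `"2"` as text.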
Once a Trace is successfully recorded, the curator uploads it into a shared community repository. This can, for example, be done by means of a pull request to a GitHub repository, which is subsequently evaluated by the maintainers of the repository. The repository is organized so that Traces for specific classes of pages, and Traces by specific curators, can be located quickly. Since the perspective on what constitutes the essence of a web publication may differ from one curator to the next, the repository supports multiple Traces for a specific class of pages, each of which can be unambiguously identified. Also, since the layout of pages evolves over time, Traces will need updating, which makes version support by the repository essential.
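The repository requirements described above (multiple Traces per class of pages, lookup by curator, unambiguous identification, versioning) can be sketched as a small in-memory data structure. This is purely illustrative: the actual repository is described as Git-based, and the class `TraceRepository`, its methods, and the example names `slideshare`, `alice`, and `bob` are assumptions made for this sketch.

```python
from collections import defaultdict


class TraceRepository:
    """Hypothetical index keyed by (page class, curator), one list entry per version."""

    def __init__(self):
        self._store = defaultdict(list)

    def add(self, page_class: str, curator: str, trace: dict) -> int:
        """Store a new version of a Trace and return its 1-based version number."""
        versions = self._store[(page_class, curator)]
        versions.append(trace)
        return len(versions)

    def latest(self, page_class: str, curator: str):
        """Return the most recent Trace for this class and curator, or None."""
        versions = self._store.get((page_class, curator))
        return versions[-1] if versions else None

    def curators_for(self, page_class: str) -> list:
        """List the curators who have contributed a Trace for a class of pages."""
        return sorted(c for (p, c) in self._store if p == page_class)


repo = TraceRepository()
repo.add("slideshare", "alice", {"actions": []})
v2 = repo.add("slideshare", "alice", {"actions": [{"action": "click"}]})
repo.add("slideshare", "bob", {"actions": []})
print(v2, repo.curators_for("slideshare"))
```

In this sketch a Trace is identified by the triple of page class, curator, and version number, which is one way to satisfy the requirement that each Trace be unambiguously identifiable.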
It is hard to say when Memento Tracer will be ready for a test ride, let alone for prime time. The components are currently experimental but we are making promising progress. The process of recording Traces and capturing web publications on the basis of these Traces has been demonstrated successfully for publications in a range of portals. But there also remain challenges that we are investigating, including: