It is of great interest to researchers and scholars in many disciplines (particularly those working on cultural heritage projects) to study parallel passages (i.e., identical or similar pieces of text describing the same thing) in digital text archives. Although a few software tools exist for this purpose, they are restricted to a specific domain (e.g., the Bible) or a specific language (e.g., Hebrew). In this paper, we present in detail how we built a digital infrastructure that facilitates the search and discovery of parallel passages for any domain in any language. It is at the core of our Samtla (Search And Mining Tools with Linguistic Analysis) system, designed in collaboration with historians and linguists. The system has already been used to support research on five large text corpora that span a number of different domains and languages. The key to such a domain-independent and language-independent digital infrastructure is a novel combination of a character-based $n$-gram language model, a space-optimised suffix tree, and generalised edit distance. A comprehensive evaluation through crowd-sourcing shows that the effectiveness of our system's search functionality is on par with human-level performance.
The attraction of optical playback of audio disc records was described by Brock-Nannestad in 2001. Several different approaches have been demonstrated to work, but in most cases the playback quality is worse than with mechanical playback. The Saphir approach used in this work relies on a specially designed colour illuminator that exploits the reflective properties of the disc material to highlight subtle changes in the orientation of the groove walls, even at the highest frequencies (20 kHz). A standard colour camera is used to collect rings of pictures from the disc. The audio signal is extracted from the collected pictures automatically, under control of the user. The process is slow (several hours per disc) but has a wide range of operation on recorded and printed discs, from the earliest Berliner recordings to recent vinyl records, and its strength lies in decoding direct-recording lacquer discs. An Elementary Shortest Path Solver with a reward (negative cost) on the number of turns is used to reconnect all the sub-tracks obtained, so that the correct playback order can be reconstructed with limited human intervention. We describe the approach, present the main advantages and drawbacks of the method, and show that it can be used to play back even extremely damaged (broken, de-laminated...) records.
At historic open-air museums, many of the objects under investigation are buildings and landscapes that can tell multiple, overlapping narratives: they were built and altered over the course of years by different peoples and groups who used them for varying purposes. In this paper, we address this challenge by proposing the use of interactive maps to orient visitors in time, in space, and in both time and space together. We conducted a series of collaborative design workshops to elicit recommendations. From the analysis of the transcripts, we identified four design elements and two functionalities that could be used for these purposes. We then conducted a study at an open-air museum to compare the extent to which these design elements and functionalities (and a prototype that integrates them) allow visitors to orient themselves in time and space, and to notice change over time.