It is of great interest to researchers and scholars in many disciplines (particularly those working on cultural heritage projects) to study parallel passages (i.e., identical or similar pieces of text describing the same thing) in digital text archives. Although a few software tools exist for this purpose, they are restricted to a specific domain (e.g., the Bible) or a specific language (e.g., Hebrew). In this paper, we present in detail how we built a digital infrastructure that can facilitate the search and discovery of parallel passages for any domain in any language. It is at the core of our Samtla (Search And Mining Tools with Linguistic Analysis) system, designed in collaboration with historians and linguists. The system has already been used to support research on five large text corpora spanning a number of different domains and languages. The key to such a domain-independent and language-independent digital infrastructure is a novel combination of a character-based $n$-gram language model, a space-optimised suffix tree, and a generalised edit distance. A comprehensive crowd-sourced evaluation shows that the effectiveness of our system's search functionality is on par with human-level performance.
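To make the "generalised edit distance" ingredient concrete, here is a minimal sketch of a weighted Levenshtein distance in which substitution, insertion and deletion costs are supplied as functions. This is a generic illustration of the technique, not Samtla's actual cost scheme, which the abstract does not specify; the function name and the `case_tolerant_sub` cost are hypothetical choices made for this example.

```python
from typing import Callable


def generalised_edit_distance(
    a: str,
    b: str,
    sub_cost: Callable[[str, str], float] = lambda x, y: 0.0 if x == y else 1.0,
    ins_cost: Callable[[str], float] = lambda c: 1.0,
    del_cost: Callable[[str], float] = lambda c: 1.0,
) -> float:
    """Weighted edit distance via dynamic programming (two rolling rows)."""
    n = len(b)
    # Before processing row i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = [0.0] * (n + 1)
    for j in range(1, n + 1):
        prev[j] = prev[j - 1] + ins_cost(b[j - 1])
    for i in range(1, len(a) + 1):
        # cur[0]: delete the first i characters of a.
        cur = [prev[0] + del_cost(a[i - 1])] + [0.0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + del_cost(a[i - 1]),  # delete a[i-1]
                cur[j - 1] + ins_cost(b[j - 1]),  # insert b[j-1]
                prev[j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitute or match
            )
        prev = cur
    return prev[n]


def case_tolerant_sub(x: str, y: str) -> float:
    """Hypothetical cost: case-only differences are much cheaper than other substitutions."""
    if x == y:
        return 0.0
    return 0.1 if x.lower() == y.lower() else 1.0


if __name__ == "__main__":
    print(generalised_edit_distance("kitten", "sitting"))
    print(generalised_edit_distance("Moses", "moses", sub_cost=case_tolerant_sub))
```

With unit costs this reduces to the classic Levenshtein distance (3 for "kitten" versus "sitting"); passing a custom cost function is what lets near-variants, such as spelling or orthographic differences, count as close matches.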
The appeal of optical playback of audio disc records was described by Brock-Nannestad in 2001. Several different approaches have been demonstrated to work, but in most cases the playback quality is worse than that of mechanical playback. The Saphir approach adopted in the presented work employs a specifically designed colour illuminator that exploits the reflective properties of the disc material to highlight subtle changes in the orientation of the groove walls, even at the highest frequencies (20 kHz). A standard colour camera is used to collect rings of pictures from the disc. The audio signal is extracted from the collected pictures automatically, under the control of the user. The process is slow (several hours per disc) but has a wide range of operation on recorded and printed discs, from the earliest Berliner recordings to recent vinyl records, and its strength lies in decoding direct-recording lacquer discs. An Elementary Shortest Path Solver with a reward (negative cost) on the number of turns is used to reconnect all the sub-tracks obtained, making it possible to reconstruct the correct playback order with limited human intervention. We describe the approach, present the main advantages and drawbacks of the method, and show that it can be used to play back even extremely damaged (broken, delaminated, etc.) records.
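The abstract does not give the graph formulation of the sub-track reconnection, so the following is only a toy sketch under loudly stated assumptions. Each sub-track is a node carrying a hypothetical number of groove turns, each edge carries a hypothetical connection cost, and a path's cost is the sum of its edge costs minus a reward times the turns it visits. Because the reward makes costs negative, the search is restricted to elementary paths (no node visited twice), which keeps the objective bounded. The exhaustive depth-first search below takes exponential time and is not the Saphir solver. It only shows what the objective rewards.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph: node -> list of (successor, connection cost).
EDGES: Dict[str, List[Tuple[str, float]]] = {
    "S": [("A", 1.0), ("C", 4.0)],
    "A": [("B", 1.0), ("C", 2.0)],
    "B": [("C", 1.0), ("E", 1.0)],
    "C": [("E", 1.0)],
}
# Hypothetical number of turns contained in each sub-track.
TURNS: Dict[str, int] = {"S": 0, "A": 3, "B": 2, "C": 5, "E": 0}


def best_elementary_path(
    edges: Dict[str, List[Tuple[str, float]]],
    turns: Dict[str, int],
    start: str,
    end: str,
    reward: float,
) -> Tuple[float, List[str]]:
    """Exhaustive DFS over elementary start->end paths.

    cost = sum(edge costs) - reward * sum(turns of visited nodes).
    """
    best_cost = float("inf")
    best_path: List[str] = []

    def dfs(node: str, path: List[str], visited: Set[str], cost: float) -> None:
        nonlocal best_cost, best_path
        if node == end:
            if cost < best_cost:
                best_cost, best_path = cost, list(path)
            return
        for nxt, w in edges.get(node, []):
            if nxt in visited:  # elementary: never revisit a node
                continue
            visited.add(nxt)
            path.append(nxt)
            dfs(nxt, path, visited, cost + w - reward * turns.get(nxt, 0))
            path.pop()
            visited.remove(nxt)

    dfs(start, [start], {start}, -reward * turns.get(start, 0))
    return best_cost, best_path


if __name__ == "__main__":
    print(best_elementary_path(EDGES, TURNS, "S", "E", reward=1.0))
    print(best_elementary_path(EDGES, TURNS, "S", "E", reward=0.0))
```

On this toy graph, a zero reward picks the cheapest route S, A, B, E and skips sub-track C. With a reward of 1 per turn, the solver instead prefers S, A, B, C, E, which reconnects every sub-track. This is the intuition behind rewarding the number of turns.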
Munsell Soil Charts are a very common tool used by archaeologists for the color specification task. The charts are usually employed directly at cultural heritage sites to identify the color of soils and collected artifacts. However, they are designed for specifying color through the subjective visual perception of users, a time-consuming and error-prone procedure. Two users may well estimate different Munsell notations for the same specimen, as colors are not perceived uniformly by different people. Hence, the estimation process should be repeated several times, and by more than one expert user, to be considered reliable. In this work, we employ our framework ARCA (Automatic Recognition of Color for Archaeology), specifically designed to provide an objective, deterministic, fast, and automatic method for Munsell estimation. ARCA is a valuable asset for archaeologists because it defines a smooth pipeline for affordable Munsell notation estimation: image acquisition of specimens with general-purpose digital cameras in an uncontrolled environment, manual sampling of specimen images in the ARCA desktop application, automatic Munsell color specification, and report generation. We further assess our method with improved color tolerance validations and evaluations, introducing a comparison between the $\Delta E_{00}$, $\Delta E_{76}$, $\Delta L^*$, $\Delta a^*$, and $\Delta b^*$ differences. One of the main contributions of this paper is the extension of our former dataset ARCA108. We gathered two additional sets of images, obtaining a new dataset consisting of pictures of the 2000 and 2009 editions of the Munsell Soil Charts plus images from a real test case with 16 pottery shards. The new dataset contains 56,160 samples and 328 images, hence its name, ARCA328. Experimental results are reported to investigate which configuration is best suited to the acquisition phase.
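The simplest of the color-difference measures compared above is the CIE 1976 difference, $\Delta E_{76} = \sqrt{(\Delta L^*)^2 + (\Delta a^*)^2 + (\Delta b^*)^2}$, the Euclidean distance in CIELAB. The sketch below computes it and picks the closest reference chip for a sample. It assumes colors are already expressed as CIELAB triples. The chip names and Lab values are hypothetical placeholders, not real Munsell chart data. ARCA's actual matching procedure, and its use of $\Delta E_{00}$, may differ.

```python
import math
from typing import Dict, Tuple

Lab = Tuple[float, float, float]  # (L*, a*, b*)

# Hypothetical reference chips with made-up CIELAB values (not real Munsell data).
REFERENCES: Dict[str, Lab] = {
    "chip_A": (50.0, 0.0, 0.0),
    "chip_B": (60.0, 10.0, 10.0),
}


def delta_e76(c1: Lab, c2: Lab) -> float:
    """CIE 1976 color difference: Euclidean distance between two CIELAB colors."""
    return math.dist(c1, c2)


def nearest_reference(sample: Lab, references: Dict[str, Lab]) -> Tuple[str, float]:
    """Return the reference name with the smallest Delta E76, and that distance."""
    name = min(references, key=lambda k: delta_e76(sample, references[k]))
    return name, delta_e76(sample, references[name])


if __name__ == "__main__":
    print(delta_e76((50.0, 0.0, 0.0), (53.0, 4.0, 0.0)))
    print(nearest_reference((52.0, 3.0, 1.0), REFERENCES))
```

$\Delta E_{76}$ is cheap and easy to interpret, but it is not perceptually uniform. This is why the more elaborate CIEDE2000 formula ($\Delta E_{00}$) is usually preferred when small differences matter.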
At historic open-air museums, many of the objects under investigation are buildings and landscapes that can tell multiple, overlapping narratives: they were built and modified over the course of years by different peoples and groups who used them for varying purposes. In this paper, we address the challenge this poses by proposing the use of interactive maps to orient visitors in time, in space, and in both time and space. We conducted a series of collaborative-design workshops to elicit recommendations. From the analysis of the transcripts, we identified four design elements and two functionalities that could serve these purposes. We then conducted a study at an open-air museum to compare the extent to which these design elements and functionalities (and a prototype that integrates them) allow visitors to orient themselves in time and space and to notice change over time.