YDC2-1: Layout analysis research

Yale University, through the Yale Digital Collections Center (YDC2) and together with its international partners, is developing a worldwide community of interoperable repositories that provide consistent access to digital representations of manuscripts, so that scholars can easily view and compare manuscripts from any participating institution. The manuscript images are openly available on the web through its Content Delivery Service, via an IIIF-compliant scalable image server. In this scenario, CRS4 carries out research and development of computing tools and techniques to analyse and index this kind of cultural heritage database. The main goal is to investigate methods for document layout analysis on a huge, heterogeneous corpus of illuminated medieval manuscripts with different writing styles and languages, and with various problematic attributes such as holes, spots, ink bleed-through, ornamentation, background noise, and overlapping text lines. In particular, the aim is to devise a robust per-book text-line segmentation framework, a technique to order pages within a book on a text density basis, and an interactive framework to search words across a single manuscript.
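
To make the idea of ordering pages on a text density basis more concrete, the following is a minimal sketch and not the project's actual method. It assumes that text density can be approximated by the fraction of pixels darker than a fixed threshold in a grayscale page image. The function names, the threshold value, and the page identifiers are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

# A grayscale page image as rows of 0-255 intensities (0 = black ink).
Page = Sequence[Sequence[int]]


def text_density(page: Page, ink_threshold: int = 128) -> float:
    """Fraction of pixels darker than ink_threshold.

    Assumption: dark pixels are a rough proxy for text. A real pipeline
    would first need binarisation that is robust to stains, holes and
    bleed-through, which this sketch ignores.
    """
    total = sum(len(row) for row in page)
    if total == 0:
        return 0.0
    ink = sum(1 for row in page for value in row if value < ink_threshold)
    return ink / total


def order_pages_by_density(
    pages: Dict[str, Page], ink_threshold: int = 128
) -> List[str]:
    """Return page identifiers sorted from densest to sparsest."""
    return sorted(
        pages,
        key=lambda name: text_density(pages[name], ink_threshold),
        reverse=True,
    )


if __name__ == "__main__":
    # Tiny synthetic 2x2 "pages" with hypothetical folio identifiers.
    demo_pages = {
        "f1r": [[0, 0], [255, 255]],
        "f1v": [[255, 255], [255, 255]],
        "f2r": [[0, 0], [0, 255]],
    }
    print(order_pages_by_density(demo_pages))
```

On real manuscript images a global threshold would likely be too fragile given the degradations listed above, so the density measure would probably be computed on the output of the text-line segmentation step instead.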