Data Sets | Galileo's Library

A recurring theme in the development of digital and computational tools for the study of Galileo’s library has been the messiness of the underlying data. By messiness, I mean both the very human inconsistencies in how we describe books and the very mechanical ways in which letters and books are made and digitized text is created.

The biggest challenge for research about Galileo’s library has been determining which books and manuscripts Galileo owned. These are often questions of metadata, the data about the data: which authors, titles, editions, and formats? My book explores this in more detail, and the following pages offer further specifics:

Book and Manuscript Identification: how do we know about a portion of the library contents?
Experiments – The Virtual Library: can we lean into the uncertainties and make them a design feature instead of a bug?
Experiments – The Interactive Shelves: can more metadata reveal patterns in the collection?

After determining which books or manuscripts were in Galileo’s library or in his family’s collections, the next step is to turn their content into clean, full text that can be searched and analyzed with computational tools. This is where the bulk of current work on the library is happening:

Full Text Corpora: what are the challenges of digitizing early modern books?
Existing Digital Texts: what texts are already available?
Creating Digital Texts: what is the process for adding more documents for analysis?