Data Sets

A recurring theme in the development of digital and computational tools for the study of Galileo’s library has been the messiness of the underlying data. By messiness, I mean both the very human inconsistencies in how we describe books and the very mechanical processes by which letters and books are made and digitized text is created.

The biggest challenge for research on Galileo’s library has been determining which books and manuscripts Galileo owned. These are often questions of metadata, the data about the data: which authors, titles, editions, and formats? My book explores these questions at greater length, and the following pages offer further details:

After determining which books or manuscripts were in Galileo’s library or in his family’s collections, the next step is to turn their content into clean, full text that can be searched and analyzed with computational tools. This is where the bulk of current work on the library is happening: