Author Archives: Professor Crystal Hall

Research and Internship Opportunities

Conferences and summer internships are two instructive ways to gain more experience in the field of digital humanities outside of the classroom. Upcoming opportunities include:

The Association for Computing Machinery Special Interest Group on the Design of Communication has a call for proposals for the Student Research Competition. Selected undergraduate and graduate students will present their individual research at the conference to judges and attendees. The topics of interest include, but are not limited to: communication design, user experience, information design, and learning systems/environments. Learn more about the conference and competition here: http://sigdoc.acm.org/conference/2016/student-research-competition/.

The Berkman Center for Internet and Society at Harvard University has positions for full-time summer interns. Interns work on various projects that explore the intersection of technology and communication in a collaborative environment. Interns can join research teams in areas such as academic innovation, law, computer science, and open access projects. The Berkman Center also hosts intern discussion hours and events with the larger Berkman community. Specific research projects available to interns can vary each summer. Learn more about the internship: http://brk.mn/summer.

The Social Computing Lab at Carnegie Mellon University has a Research Experience for Undergraduates (REU) program. This summer program offers research assistant positions in the fields of psychology, computer science, human-computer interfaces, and language technologies. The 10-week program exposes a diverse group of undergraduates to academic research in a modern lab setting. REU participants attend their own seminars in addition to Social Computing Lab seminars and those held by Carnegie Mellon’s Human-Computer Interaction Institute and Language Technologies Institute. Program details and application instructions are available here: https://hciisocialcomputing.wordpress.com/summer-reu-program-description/

Keep an eye on these sites in winter 2016 for summer 2017 opportunities: http://data.betaworks.com/ (they announced a 2016 summer internship with applications due Jan. 18, 2016)

http://librarylab.law.harvard.edu/fellows (they announced a 2016 summer internship with applications due by April 27, 2016).

Graduate Opportunities

In addition to undergraduate research and courses, there are now more graduate opportunities in fields related to digital humanities. The programs below are a sample of what is available after undergraduate study.

Northeastern University’s School of Public Policy and Urban Affairs offers an M.S. in Urban Informatics. The program “couples comprehensive data analytics skills with an understanding of the big questions faced by cities in the 21st Century city.” The curriculum includes four interdisciplinary Urban Informatics core courses focused on data science and analytics, as well as two electives in urban data skills. Find more information here: http://www.northeastern.edu/cssh/policyschool/graduate-programs/urban-informatics/.

Northeastern University also offers an MFA in Information Design and Visualization. This program “trains students in harnessing visual languages to support discovery and communicate information across a range of socially relevant issues.” The program is particularly focused on taking an interdisciplinary approach to communication. Find more information here: http://www.northeastern.edu/camd/artdesign/academic-programs/mfa-in-information-design-and-visualization/.

Arizona State University offers a Master’s in Social Technologies. The program “divides its focus between theoretical and applied work, drawing on social, behavioral, critical, cultural and design perspectives. Courses include Networked Social Technologies; Social Technology; and Community Informatics.” Find more information here: https://asunow.asu.edu/20160105-creativity-picturing-where-social-media-headed.

Digital Reconstructions of Libraries

Libraries are very much on my mind these days as I grapple with the best methodologies for reconstructing and visualizing Galileo’s library. I am also working constantly with digital collections: institutional libraries, archives of organizations, and single studies of authors. Perhaps it is no surprise, then, that when first asked to suggest possible readings for the section of the Gateway to Digital Humanities course that focuses on textual analysis, I immediately recommended Jorge Luis Borges’s “Library of Babel.”

To me this short story represents many of the possibilities and pitfalls of digital and computational library studies. Borges imagines a library that holds one copy of every book that could possibly be written. Some contain gibberish, others perfect copies of known works. Scholars live in the library searching for answers to questions about human experience. Ideological camps form and battles ensue, but all the while, even this hyperbolically complete library remains enigmatic to its users because of its sheer size. In parallel ways, computers have the potential to create a similar digital library. Natural language processing has already shown that computers can generate prose that has the “sound” of known authors like Immanuel Kant. Programming loops (of the kind the Gateway to Digital Humanities students are applying to images) perform the same action repeatedly (changing one pixel at a time, for example) and could conceptually be employed to produce the seemingly infinite variety of texts that populate “The Library of Babel.”

For readers of the Python programming language, I tried to express this impossible program as loops in Jython, the Java-based implementation of Python used by the JES platform. Strings and concatenation would help, but I think this still conveys the message in a light-hearted form:

Screenshot (Crystal Hall, 2013) of JES Jython platform.

The above attempt at code (which has legal Jython syntax but is an error-filled program) is a futile approach to bringing order to chaos. Some Digital Humanities (DH) scholars would argue that digital and computational studies could offer partial solutions for comprehending and organizing this vast quantity of textual information. This is quite optimistic, given estimates of 340 million new 140-character tweets on Twitter daily, not to mention the 3.77 billion (and growing) indexed pages on the World Wide Web.
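The looping idea can also be sketched in modern Python 3 rather than the Jython of the screenshot. This minimal sketch assumes a toy three-symbol alphabet and "books" only three characters long, so the loop actually finishes. The function name library_of_babel is hypothetical. itertools.product stands in for nested loops: the last position changes fastest, one character at a time.

```python
from itertools import product

def library_of_babel(alphabet, book_length):
    """Yield every possible 'book' of a fixed length over the given alphabet.

    product(..., repeat=n) behaves like n nested loops; the last position
    changes fastest, one character at a time.
    """
    for letters in product(alphabet, repeat=book_length):
        yield "".join(letters)

# Toy assumptions: a 3-symbol alphabet (two letters and a space), 3-character books.
books = list(library_of_babel("ab ", 3))
print(len(books))      # 27, i.e. 3 ** 3 books
print(books[:4])       # ['aaa', 'aab', 'aa ', 'aba']
print("ba " in books)  # True: every fragment is somewhere on the shelves
```

Raising book_length shows the problem Borges dramatizes. The number of books is len(alphabet) ** book_length, which grows exponentially, so ordering such a collection by brute force quickly becomes hopeless.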

Even when working with the available (and manageable) digital data, tools make certain assumptions and lose certain information in their application. All of this gives me pause as I reconstruct and try to find analytical pathways through the library of a person about whom ideological fields have been defined and passionate battles fought for centuries. Matt Jockers has led the field of DH with his work on macroanalysis, currently focused on establishing patterns in nineteenth-century fiction, but that work relies only on the books for which a digital copy has been made. Google Books Ngram Viewer allows users to compare the frequencies of words in digital or digitized books across different time periods. This assumes consistent cataloguing and metadata entry across all participating institutions, which is not always the case.

Screenshot (Crystal Hall, 2013) of Google Books Ngram Viewer.
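Roughly speaking, the Ngram Viewer plots a word's share of all the words in the books dated to each year. The minimal sketch below makes that calculation concrete and shows why catalogue dates matter. It assumes a tiny invented corpus of (year, text) pairs, and the function name relative_frequencies is hypothetical. Moving one book to the wrong year visibly changes both points on the curve.

```python
from collections import Counter

def relative_frequencies(corpus, word):
    """Return {year: share of all tokens in that year's books equal to `word`}.

    `corpus` is a list of (year, text) pairs. The year comes from catalogue
    metadata, so a cataloguing error silently moves a book to another year.
    """
    totals = Counter()
    hits = Counter()
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(word.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Hypothetical corpus for illustration only.
corpus = [
    (1610, "the moon the stars"),
    (1610, "a telescope shows the moon"),
    (1620, "the telescope and the telescope maker"),
]
correct = relative_frequencies(corpus, "telescope")
# Same texts, but the second book is catalogued under the wrong year.
misdated = relative_frequencies([corpus[0], (1620, corpus[1][1]), corpus[2]], "telescope")
print({y: round(f, 3) for y, f in correct.items()})   # {1610: 0.111, 1620: 0.333}
print({y: round(f, 3) for y, f in misdated.items()})  # {1610: 0.0, 1620: 0.273}
```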

As I revisit the data for my own project on Galileo, I wonder where I will enter the ideological disputes that surround the relevant fields; I worry about what information will be excluded from the data; and I wonder how my users will navigate the digital library I am about to create.

Excel Data and Gephi Data Laboratory

My goal for this blog entry is to explain how to organize data within an Excel spreadsheet (which will be saved as a Comma-Separated Values, or .csv, file) to import into Gephi for visualization and analysis of nodes (individual elements represented as points) and edges (relationships represented by connecting lines) in a network. My explanation assumes familiarity with the Gephi tutorials based on prepared .gexf files (Gephi’s native XML file format) of Les Miserables or Facebook data. I assume that my reader is now thinking about applying network analysis to her own research.

New users of Gephi may not have any familiarity with .gexf files, XML markup, or other code for organizing data, but they can still make use of Gephi. Excel is typically a more user-friendly application for this kind of organization, and most databases (Microsoft Access, for example) can export to an Excel workbook (.xls) or directly to a .csv file. The explanations assume a basic understanding of storing, copying, and sorting data in Excel. The organizational principles described below apply to whichever application you use to generate the tabular .csv files for Gephi. Other supported formats and their functionality are described on Gephi’s site.

I am using screenshots from my own research data on the books in Galileo Galilei’s library to help demonstrate the kinds of information each column should contain. Below is a screenshot of one spreadsheet in the Excel workbook that I have used to organize all of my notes related to the project:

There are many spreadsheets listed in the tab bar at the bottom of the screen for the different kinds of information I have for the project. Importantly, a .csv file retains only the information in the active worksheet (“By author” in this case, the tab in white) and will not save the other sheets. For this reason, copy the information you want to use from your primary workbook (multiple sheets) into one single-sheet workbook for nodes and another single-sheet workbook for edges. Also, the column headings in my workbook (“My#”, “Fav’s#”, “Author. Favaro’s full citation”, “Year”, etc.) are my own shorthand and cannot be interpreted by Gephi. This is another reason why copying the information you want into new single-sheet workbook files is highly recommended.

1)   You will need to create two .csv files: a node table and an edge table. I use Excel as my tabular application, and Excel saves files in the .xlsx format by default. To get a .csv, choose CSV as the file format when saving.

2)   The node table tells Gephi all of the possible nodes in a network and must have at least the columns Id and Label. There should be one line for every node that will appear in either column (Source or Target) of the edge table.

This seems easy enough, but what kinds of information are best placed in the Id column, and how should they differ from the Label? The example above is taken from a spreadsheet that I use to organize information about Galileo’s library. All of my nodes in this example are the proper nouns found in titles in the library, plus the titles themselves (about 2,650 nodes in total). The example above is, in a word, clunky. It is redundant, and it ultimately makes my network visualization unreadable if I try to add labels over the nodes. Consider the following example, in which full titles would become labels over roughly 650 nodes (obscuring nodes and edges in the process):

Having a unique identifying number (the Id that Gephi expects) allows me to store a lot of information about that node in a spreadsheet or database that I can later choose to access as necessary. Since my organizational system was created long before I knew about Gephi, my Label column corresponds to the Full Title column in my spreadsheet (which ultimately clutters my visualization to the point of illegibility if I add labels). To make this more readable, I need to change the data in the Label column to the data from a “Short Title” column.
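Python's standard csv module can illustrate the Id/Label distinction. This is a minimal sketch, assuming hypothetical records that each hold a unique number, a full title, and a hand-made short title. The field names my_num, full_title, and short_title and the function write_node_table are invented for illustration. The full title stays in the source records, and only the short title goes into the Label column.

```python
import csv
import io

# Hypothetical records standing in for rows of a research spreadsheet.
records = [
    {"my_num": 1,
     "full_title": "A very long full title of the first book, with dedication and printer",
     "short_title": "First book"},
    {"my_num": 2,
     "full_title": "An equally long full title of the second book, printed in two volumes",
     "short_title": "Second book"},
]

def write_node_table(records, out):
    """Write a node table with the required Id and Label columns.

    The unique number becomes Id; the short title (not the full title)
    becomes Label so that labels stay legible in the visualization.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Label"])
    for rec in records:
        writer.writerow([rec["my_num"], rec["short_title"]])

buffer = io.StringIO()
write_node_table(records, buffer)
print(buffer.getvalue())
```

The sketch writes to an in-memory buffer so that the result can be inspected. To produce an actual file, pass open("nodes.csv", "w", newline="") in place of the buffer.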

3)   As you might notice, there are other columns in the first screenshot of the node table. The node table can also include attributes (in parentheses in the example because they are not necessary for a basic visualization of a network). Attributes are a way to categorize data, perhaps by gender, race, age, etc. While not necessary for exploring data with Gephi, they allow for a more nuanced exploration of a network. For example, I will want to add attribute columns for religious affiliation (Jesuit, Benedictine, Protestant, Catholic, etc.) and genre to start visualizing the data in a way that helps me answer my research questions. Attribute columns can also be added in the “Data Laboratory” section of the Gephi interface even after you have loaded the .csv files for the nodes and edges.
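An attribute column is simply an extra column after Id and Label. The minimal sketch below assumes hypothetical nodes carrying a "Religious affiliation" attribute with values drawn from the examples above. The author names and the function write_nodes_with_attributes are invented. The sketch writes the table and then tallies nodes per category, which is the kind of grouping that attributes make possible.

```python
import csv
import io
from collections import Counter

# Hypothetical node rows with one optional attribute column.
nodes = [
    {"Id": 1, "Label": "Author A", "Religious affiliation": "Jesuit"},
    {"Id": 2, "Label": "Author B", "Religious affiliation": "Benedictine"},
    {"Id": 3, "Label": "Author C", "Religious affiliation": "Jesuit"},
]

def write_nodes_with_attributes(nodes, out):
    """Write Id and Label first, followed by any remaining columns as attributes."""
    extra = [key for key in nodes[0] if key not in ("Id", "Label")]
    writer = csv.DictWriter(out, fieldnames=["Id", "Label"] + extra, lineterminator="\n")
    writer.writeheader()
    writer.writerows(nodes)

buffer = io.StringIO()
write_nodes_with_attributes(nodes, buffer)
print(buffer.getvalue())

# Attributes make it possible to group nodes, e.g. counting each category.
counts = Counter(node["Religious affiliation"] for node in nodes)
print(counts)  # Counter({'Jesuit': 2, 'Benedictine': 1})
```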

4)   The time interval is another optional column, which may or may not be applicable or useful for your data. I have copied a partial screenshot from the Gephi.org page here as a reference.

The Gephi wiki also displays the code behind this process.

Thinking about my own dataset, I need a Time Interval for every title, starting with the earliest year that a book could have entered the library. I will end my time intervals at Galileo’s death in 1642. Using the examples from part 3, the time interval information would look like this in the .csv version of the spreadsheet, with the columns Id, Time Start, and Time End:

4,1640,1642
5,1628,1642
6,1637,1642

Once you have imported the .csv, you can merge the Time Start and Time End columns in the Data Laboratory using the merge strategy “Create Time Interval.” This combines and formats the two values so that you can view how the network changes over time.
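The rows above follow a simple rule: start at the earliest possible acquisition year and end at 1642. The minimal sketch below generates them from a mapping of node Id to earliest year, using the three example rows as input. The function name time_interval_rows is hypothetical. The guard against start years after 1642 is an added sanity check, not a Gephi requirement.

```python
import csv
import io

GALILEO_DEATH = 1642  # every interval ends here, as described above

def time_interval_rows(earliest_years, end=GALILEO_DEATH):
    """Turn {Id: earliest year a book could enter the library} into
    (Id, Time Start, Time End) rows that all end at `end`."""
    rows = []
    for node_id, start in sorted(earliest_years.items()):
        if start > end:
            raise ValueError(f"node {node_id} starts after {end}")
        rows.append((node_id, start, end))
    return rows

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Id", "Time Start", "Time End"])
writer.writerows(time_interval_rows({4: 1640, 5: 1628, 6: 1637}))
print(buffer.getvalue())
```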

5)   The edge table (the second .csv file that you need to create) tells Gephi the connections that exist between the nodes. It must have the columns Source and Target.

This is where having a unique identifier for every node is very convenient. The Source in the example above is title 299, a book in which the Cologne Academy is mentioned as a contributor. Book titles can include people or places (Targets), but people or places cannot include titles (Sources). My edges are therefore directed, and the distinction between source nodes and target nodes is critical.
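The edge table is easy to check against the node table. This minimal sketch assumes hypothetical Ids in which 299 is a title and 1001 and 1002 are names found in titles. The actual Id of the Cologne Academy is not given above, and the function write_edge_table is invented. Before writing the Source and Target columns, the sketch confirms that every endpoint has a row in the node table, following the rule in step 2.

```python
import csv
import io

def write_edge_table(edges, node_ids, out):
    """Write an edge table with the required Source and Target columns.

    Each edge runs from a book title (Source) to a person or place named in
    that title (Target). Every endpoint is checked against the node table first.
    """
    missing = sorted({n for edge in edges for n in edge} - set(node_ids))
    if missing:
        raise ValueError(f"edge endpoints missing from node table: {missing}")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)

# Hypothetical Ids for illustration only.
node_ids = [299, 1001, 1002]
edges = [(299, 1001), (299, 1002)]

buffer = io.StringIO()
write_edge_table(edges, node_ids, buffer)
print(buffer.getvalue())

try:
    write_edge_table([(299, 9999)], node_ids, io.StringIO())
except ValueError as err:
    print(err)  # edge endpoints missing from node table: [9999]
```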

6)   As with the node table, there are many optional columns that can add nuance to an analysis of a network. The edge table can also include a Label column to help categorize relationship types, a unique Id for each relationship (generated by Gephi), Attributes (e.g., family, friend, co-worker, classmate for social networks), and a Time Interval.

7)   The edge table can also include information not found in the node table. Type indicates whether the relationship is directed or undirected. This column can be auto-filled on upload and is visible in the Data Laboratory.

8)   Another option for the edge table is to assign a weight to each relationship. Weight is your opportunity to give more importance to certain relationships by giving them a numerical value.
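Steps 7 and 8 can be combined in one pass. The minimal sketch below assumes that weight means how many times the same Source and Target pair occurs in the raw data. That is only one possible weighting, and a project may weight relationships quite differently. Type is fixed to "Directed" to match the title-to-name edges described above. The edge list and the function weighted_edge_rows are hypothetical.

```python
import csv
import io
from collections import Counter

def weighted_edge_rows(edge_list):
    """Collapse repeated (Source, Target) pairs into a single row whose Weight
    is the number of times the pair occurred; Type is always Directed here."""
    counts = Counter(edge_list)
    return [(s, t, "Directed", w) for (s, t), w in sorted(counts.items())]

# Hypothetical edges: the pair (299, 1001) is recorded twice.
edges = [(299, 1001), (300, 1001), (299, 1001)]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Source", "Target", "Type", "Weight"])
writer.writerows(weighted_edge_rows(edges))
print(buffer.getvalue())
```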

Remember to save the files as .csv, then load them in Gephi, nodes first, using the “Import .csv” option in the Data Laboratory toolbar. Be sure to indicate which type of file you are importing (node table or edge table); otherwise you risk error messages.

Data can also be entered directly into Gephi’s Data Laboratory, but I am most familiar with Excel, have organized my research data in spreadsheets, and prefer to adjust, filter, and store my information in one format. Programming languages such as R seem particularly adept at creating the tabular information needed here, especially when pulling data automatically from a large corpus.
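The same kind of automation can be illustrated in Python rather than R. This minimal sketch assumes hypothetical dictionaries of titles and known names, with all Ids and strings invented for illustration. It scans each title for each name and emits (Source, Target) pairs ready for an edge table. The matching is a case-insensitive, whole-word regular-expression search, a deliberately simple stand-in for real named-entity recognition.

```python
import re

def edges_from_titles(titles, names):
    """Return (title Id, name Id) pairs for every known name found in a title.

    `titles` maps title Id -> title text; `names` maps name Id -> name string.
    Matching is a case-insensitive whole-word search.
    """
    edges = []
    for title_id, title in sorted(titles.items()):
        for name_id, name in sorted(names.items()):
            pattern = r"\b" + re.escape(name) + r"\b"
            if re.search(pattern, title, flags=re.IGNORECASE):
                edges.append((title_id, name_id))
    return edges

# Hypothetical inputs for illustration only.
titles = {1: "A treatise dedicated to the Academy of Cologne",
          2: "Letters of Kepler",
          3: "On comets"}
names = {101: "Academy of Cologne", 102: "Kepler"}
result = edges_from_titles(titles, names)
print(result)  # [(1, 101), (2, 102)]
```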

My approach may not work for everyone or every project, but I hope that seeing real data in a raw format provides context for its presentation in the Data Laboratory.

In turn, that should make the analysis of something as complex as the visualization of the connections between names in Galileo’s library less opaque.

Emese Gaal on INTD 2041: Gateway to the Digital Humanities

Jack Gieseking and Crystal Hall recently spoke with Emi Gaal ’15 about her experience in the new “Gateway to the Digital Humanities” course.

Why are you taking the Gateway course?

As someone more involved in the humanities and social sciences, I saw this class as a nice first segue into the more technical realm of computer science while still focusing on broad objectives in both the social and technical sciences. Additionally, I have always wanted to take both a computer science class and an art history class at Bowdoin, as both are very interesting to me, and the interdisciplinary nature of this class has provided a great introduction to both.

What has surprised you in the seminar so far?

I am most surprised by how useful it is to understand the methodology behind programming within programs such as GIS, as it allows the user to be more intentional about the commands he or she aims to carry out. Being more in the “know” about how the whole digital sphere operates is empowering and I believe it will help me better understand the possibilities and limitations of executing projects.

Do you have any early ideas about your final project?

I don’t have an idea just yet, but I think GIS, a program in which I am already proficient, would be a great tool to incorporate. Also, since aside from spatial analysis we have covered only one other topic, image analysis, I feel I should wait until I have a better sense of the other topics and tools I could dabble in before I settle on an idea.

About the DCSI Logo

Robert Feke, Portrait of James Bowdoin II, 1748, oil on canvas, Bequest of Mrs. Sarah Bowdoin Dearborn, 1826.8, Collection of the Bowdoin College Museum of Art

Debates in the Digital Humanities, edited by Matthew K. Gold (2012).

James Miller ’14, designer of the Bowdoin Digital and Computational Studies logo (above right), recently sat down with Professors Crystal Hall and Jack Gieseking to discuss the process of producing this piece:

“In reflection, after making the image, I had embedded a lot more meaning than I had intended… Using the vector filter in Illustrator produced a contour-like map that reminded me of some of the GIS work that I had examined and learned about over the summer. The large pixels I read as a bit of skepticism that I was feeling at the time, motivated by readings. Are these computational methods causing a ‘resolution loss’ of the interpretive and nuanced views valued in the humanities? Ultimately, I am glad to contribute my logo to DCSI because I believe that these debates are a part of the initiative rather than criticisms of it. I would like to think of the logo in terms of visualizing the careful orchestration of disciplines necessary to creating a field that is sensitive to both the digital and the humanities in intelligent ways. Or, if that seems a little wishful, the Feke portrait is really nice to look at and it gives me an excuse to play with paintings in the Bowdoin collection.”

The members of the Bowdoin Digital and Computational Studies Initiative are grateful to James for allowing his image to be used and modified to make the current logo.

James Miller, Class of 2014, received a Gibbons Summer Research Grant to assist Professors Chown and Fletcher, and another Gibbons award recipient, Evan Hoyt, with the preparations for the Gateway to the Digital Humanities course being offered in Fall 2013. James designed the logo for the course wiki, which became the model for the current logo of the Digital and Computational Studies Initiative (DCSI). James admits that he was inspired, at least in part, by the cover of the essay collection Debates in the Digital Humanities, edited by Matthew K. Gold.

Lecture: Galileo, Poetry, and Digital Studies

  • 10/24/2013
  • 4:30 PM – 5:30 PM
  • Location: Visual Arts Center, Beam Classroom

Crystal Hall, Postdoctoral Fellow with the new Digital and Computational Studies Initiative, discusses how computer-aided research can reveal the ways Galileo Galilei’s philosophical ideas and scientific methods were influenced by the best-selling poetry of his age.

The Salem Witch Trial Archives and Strategies in Digital Humanities

  • 10/10/2013 | 4:30 PM – 6:30 PM
  • Location: Visual Arts Center, Beam Classroom
  • Event Type: Lecture
  • – Open to the Bowdoin Community –

Ben Ray, Professor of Religious Studies at the University of Virginia, will give a talk for Bowdoin faculty about his Salem Witch Trials project: how it got started and why he turned to digital methods. Chats with interested faculty on project ideas for teaching and research will follow the talk.

Gateways and Digital and Computational Studies at Bowdoin College

Many thanks to Jennifer S. Edwards for supplying images of the other three memorial gates on campus: Franklin Clement & Ella Maria Robinson Gateway, 1923 (top left), Alpheus Spring Packard Gateway, 1940 (bottom left), and Warren Eastman Robinson Gateway, 1920 (bottom right).

This semester marked the first offering of a “Gateway” course in the Digital and Computational Studies Initiative, INTD 2041: Gateway to the Digital Humanities. As a new faculty member at Bowdoin, I found that the title of the course gave me pause: what is a gateway, and how does it differ from other points of entry? The course is also my own professional gateway to Bowdoin and to exploring a new area of my research in digital humanities. Since Pamela Fletcher (co-director of DCSI), Jack Gieseking (New Media and Data Visualization Specialist), and I have offices in the Visual Arts Center, I immediately thought that the Class of 1875 columns (top right) would be an appropriate image for this first blog post about arriving at Bowdoin. I have since learned, however, that Bowdoin has no fewer than three other sets of gates on campus, and so my image (left) needed to be a collage (something for which the Gateway students will be learning to write code later this semester).

My doctoral training in Italian literature immediately suggested Dante’s famous gates to the Inferno as an archetypal model for this kind of opening:

Per me si va ne la città dolente,
per me si va ne l’etterno dolore,
per me si va tra la perduta gente.

Giustizia mosse il mio alto fattore:
fecemi la divina podestate,
la somma sapienza e ‘l primo amore.

Dinanzi a me non fuor cose create
se non etterne, e io etterno duro.
Lasciate ogne speranza, voi ch’intrate.
Inferno III.1-9, ed. Durling & Martinez, 1996

Through me the way is to the city dolent;
Through me the way is to eternal dole;
Through me the way among the people lost.

Justice incited my sublime Creator;
Created me divine Omnipotence,
The highest Wisdom and the primal Love.

Before me there were no created things,
Only eterne, and I eternal last.
All hope abandon, ye who enter in!
– translation by Henry Wadsworth Longfellow, Class of 1825

My early experiences at Bowdoin certainly have nothing in common with the city of woe, eternal pain, or residing among the lost that Dante’s Gates of Hell announce! (Given the troublesome relationship many of us have with computers, I do feel obliged to acknowledge a final project in my 2009 Dante seminar in which a student revised Hell for a digital world so that a judging figure of Bill Gates evaluated the technical skill of all newly arrived souls and sent the digital sinners to various eternal tech support lines.) In spite of metaphorical limitations, I do think Dante’s structure can serve as a useful means for understanding the multiple gateways present: for me, this seminar is a point of entry to academic life at Bowdoin; for DCSI this is an opportunity to explore curriculum possibilities; for many of the students this is their first exposure to a new discipline.

In Canto III of the Inferno, the gate itself is doing many things. It is not a passive threshold; rather, it acts. It speaks to the pilgrim, it touches the emotional quick of the soul, and it verbally maps the entire structure of Dante’s vision of the afterlife. This is not a window that allows a protected glimpse of content or external possibility. This gateway is part of the architecture of the imagined space of the afterlife and a fundamental structure in the poem: it immediately enunciates and performs the trinity with the repetition of “per me” and its triple terzina length; Justice, the emotional motivator of Dante’s poem, is given the emphatic position of first word in the middle terzina; and the guiding principles of the organization of Hell, Purgatory, and Paradise are the triplet of power, wisdom, and love. The final tercet tells the fascinating autobiography of the Gates of Hell. This Gateway was not necessary until a dramatic shift in creation (here, Original Sin) established the categories of immortal and mortal. The gate tells new arrivals to abandon all hope, but careful readers know that this admonition does not apply to Dante, and should not apply to them if they follow Dante’s lead by observing, questioning, and applying what they have learned to their own lives.

So, how is INTD 2041 a gateway? Hopefully the language of my poetic analysis will have already hinted at the powerful ways that the “Gateway to Digital Humanities” course is prompting action in the classroom. Where Dante’s gate acts out the trinity, Gateway to DH students are asked to act out the practices of humanists. My sense is that the motivation for taking the course comes from a potent blend of curiosity and the sense that the material can be immediately applied to life and work outside the classroom. The discussions and activities are organized by categories of materials that humanities disciplines investigate: images, spaces, texts, and networks. The course itself is necessary because of the unfolding changes that the digital world is bringing to higher education and society at large.

In a way that I feel is indicative of the broader field of digital and computational studies, this seminar on digital humanities obliges participants to be rigorous intellectuals who are also creators. I am delighted to have learned recently that there is already a precedent for the Bowdoin community and Dante’s famous gates: a 2009 exhibit on Auguste Rodin in which preparatory pieces for his sculpture “Gates of Hell” featured prominently. I look forward to seeing what we are able to create this year as the DCSI begins.

Rodin, “Gates of Hell,” Zurich, Kunsthaus.
