Methodology

The digital aspect of this project involves creating a visual network of communication using a combination of Gephi and R scripts.  The goal of this network is to visualize how information moves throughout the community.  I will represent each character in the novel as an individual node and consider how the characters interact with one another.  Revealing this network will answer a number of important questions about the text.  Who has and shares knowledge, and who receives it?  How does that knowledge move between people?  Who is the object of the most gossip, and where does the gossip end?  Beyond the value of the model itself, the process of creating it raises a multitude of questions about the text.  We might consider how Austen structures the text around dialogue and narration and when the narrator interrupts sequences of dialogue.  Further, attempting to divide the text into units raises important questions about chapter divisions: Is there only one conversation per chapter?  Do conversations ever last multiple chapters?  Why does Austen structure the text in this way, and what can we learn from it?  These are just a few of the questions that digitizing Austen’s communities can raise.

My networks will track dialogue between characters.  However, digitizing dialogue proved to be more complicated than it initially appeared.  In creating a network of dialogue, one might expect to portray the characters as individual nodes with edges connecting them to show interactions.  However, how do we determine where a connection starts and where it ends?  Let us start with a basic example.  Character A is having a conversation with Character B.  We might digitize this with edges moving from A to B and back again for each piece of dialogue.  However, we might complicate this image by adding a third person, C, to the conversation.  When A speaks, is he speaking to B or C, or perhaps both?  How do we determine this?  One way may be to assume that the next person to speak is the one to whom the first character is speaking.  Another would be to assume that each character is always speaking to all the characters in the group.  Neither method is fully accurate: the first misses listeners who do not reply, and the second treats every remark as addressed to everyone present.  Further, let us suppose that A, B, and C are gossiping about a fourth person, D.  How do we portray this connection?  Any automatic method of portraying gossip is inherently flawed, but considering these problems also helps us to consider how dialogue works in the text.
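The two addressee assumptions described above can be compared with a minimal sketch.  It is written in Python for illustration and is not the R code used in this project; the speaking order in `turns` and the function names are hypothetical.

```python
# Hypothetical speaking order for one conversation among A, B, and C.
turns = ["A", "B", "A", "C", "B"]


def next_speaker_edges(turns):
    """Assume each speaker addresses whoever speaks next."""
    # The final turn has no following speaker, so it produces no edge.
    return list(zip(turns, turns[1:]))


def broadcast_edges(turns):
    """Assume each speaker addresses every other participant."""
    participants = sorted(set(turns))
    return [(s, p) for s in turns for p in participants if p != s]


print(next_speaker_edges(turns))
# [('A', 'B'), ('B', 'A'), ('A', 'C'), ('C', 'B')]
print(len(broadcast_edges(turns)))
# 10
```

The same five turns yield four edges under the first assumption and ten under the second, which shows how strongly the choice of assumption shapes the resulting graph.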

In my research, I discovered a project similar to my own entitled Austen Said: Patterns of Diction in Jane Austen’s Major Novels.  This project explores forms of discourse in Austen’s novels, most prominently free indirect discourse.  The researchers used XML markup to tag each passage in all of Austen’s major works with its speaker and its form of discourse (direct discourse, indirect discourse, or free indirect discourse).  The researchers have helpfully provided their marked-up data on their website, and I use this data in my own study of Austen’s novels.  Although Austen Said primarily focuses on free indirect discourse, its XML markup does provide information about who “says” each passage in the book, which I used in automatically generating my graphs.

I decided to digitally visualize the full novel as well as two individual chapters in greater detail.  I outlined the following steps for creating my graphs:

  1. Automatically generate a graph of gossip in the entire novel in which each character speaks to the next character who speaks, using the XML markup from Austen Said to identify speakers and listeners.
  2. Automatically generate a graph of each of the individual chapters of focus using the same method as in step 1.
  3. Automatically generate a graph of gossip in the entire novel in which each character speaks about any character they name in their dialogue. For example, if Emma mentions Mrs. Weston in her dialogue, I draw an edge from Emma to Mrs. Weston.
  4. Replicate step 3 for each individual chapter of focus.
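  5. Manually create my own versions of the graphs generated in step 2, based on my reading of the chapters.
  6. Manually create my own versions of the graphs generated in step 4.
  7. Add gender as a node attribute to all of the graphs.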

The first step of the project was an initial analysis of the data for the whole novel.  This phase identified the speaker of each passage using the “who” attribute of the markup.  The speaker named in the following passage’s “who” attribute was then treated as the recipient of that passage.  The resulting graph excludes the narrator and characters speaking as each other, and it omits additional information such as gender or social class.  This version of the graph is inherently flawed, as it rests on a set of assumptions about the text, chief among them that each character is addressing the next character who speaks.  However, it provides a basic starting point from which to proceed.  As this step was almost entirely automated, I performed it first on the novel as a whole and then on the individual chapters.
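As a rough illustration of this step, the following Python sketch reads a “who” attribute from each tagged passage and pairs every speaker with the speaker of the following passage.  It is not the R code used in this project.  The element name `said`, the `chapter` wrapper, and the sample passages are assumptions made for the example; only the “who” attribute comes from the markup described above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment; the element names and placeholder text are assumptions.
SAMPLE_XML = """
<chapter n="1">
  <said who="narrator">placeholder narration</said>
  <said who="Emma">placeholder speech</said>
  <said who="Mr. Woodhouse">placeholder speech</said>
  <said who="Emma">placeholder speech</said>
  <said who="Mr. Knightley">placeholder speech</said>
</chapter>
"""


def speaker_sequence(xml_text, tag="said"):
    """Return the 'who' value of each tagged passage, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [el.get("who") for el in root.iter(tag) if el.get("who")]


def next_speaker_edge_weights(speakers):
    """Count (speaker, recipient) pairs, where the recipient is the next speaker."""
    return Counter(zip(speakers, speakers[1:]))


speakers = speaker_sequence(SAMPLE_XML)
weights = next_speaker_edge_weights(speakers)
for (source, target), count in weights.items():
    print(source, "->", target, count)
```

At this stage the narrator still appears in the edge list; unwanted nodes are removed afterward.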

To accomplish this, my advisor, Professor Crystal Hall, provided me with R code that identifies the tagged speaker of each passage and uses the following speaker as the recipient of that passage.  I ran this code on the XML markup of the text and imported the resulting spreadsheet into Gephi.  Then, I identified the nodes I did not want to include in the graph: the narrator, the narrator speaking as a character (indicative of free indirect discourse), and characters speaking as each other.  I wanted to see gossip as it occurs between characters, so although we might consider the narrator a gossip in this novel, I eliminated her from the graph.  Additionally, I did not want “Emma as Knightley” to appear as a separate character in the graph, so I deleted these unnecessary nodes.  Finally, I rearranged the remaining nodes so that each could be seen individually.
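I removed these nodes by hand in Gephi, but the same filtering could be done before import.  The Python sketch below is one possible way to do so, not the procedure I followed.  It assumes that the narrator’s label begins with “narrator” and that composite speakers contain the word “as,” following the “Emma as Knightley” pattern; the edge counts are invented for the example.  The `Source`, `Target`, and `Weight` headers are the column names Gephi recognizes when importing an edge table.

```python
import csv
import io

# Hypothetical edge weights keyed by (speaker, recipient).
raw_edges = {
    ("narrator", "Emma"): 3,
    ("Emma", "Mr. Knightley"): 5,
    ("Emma as Knightley", "Harriet"): 1,
    ("Harriet", "Emma"): 2,
}


def is_excluded(name):
    """True for the narrator and for any 'X as Y' composite speaker."""
    lowered = name.lower()
    return lowered.startswith("narrator") or " as " in lowered


def filter_edges(edge_weights):
    """Drop every edge that touches an excluded node."""
    return {
        (s, t): w
        for (s, t), w in edge_weights.items()
        if not (is_excluded(s) or is_excluded(t))
    }


def to_gephi_csv(edge_weights):
    """Serialize edges as a CSV edge table with Source, Target, Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (s, t), w in sorted(edge_weights.items()):
        writer.writerow([s, t, w])
    return buffer.getvalue()


kept = filter_edges(raw_edges)
print(to_gephi_csv(kept))
```

Filtering after the edges are built, as here, simply discards any exchange involving the narrator; it does not reconnect the characters who speak on either side of a narrated passage.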

The next step was to create a graph that shows characters talking about one another.  Again, Professor Crystal Hall provided me with an R script, this one searching each passage for a list of character names and creating edges from the character speaking to those they speak about.  This script also accounted for time, treating one chapter as a unit of time.  I developed the list of character names myself, making sure to include variations like “Jane,” “Miss Fairfax,” and “Jane Fairfax.”  In Gephi, I then combined nodes that were different names for the same character.  I performed this analysis on both the whole novel and each individual chapter.  I then repeated each of the steps above manually for the individual chapters, recording my own interpretations of the chapters in an Excel spreadsheet.  Finally, I retroactively added gender as an attribute to all of my graphs so that I could color the nodes accordingly.
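The name-matching idea can be sketched as follows.  This is an illustrative Python version, not the R script used in this project.  It departs from my procedure in one respect: name variants are merged through an alias table in code rather than by combining nodes in Gephi.  The sample passages are invented placeholder sentences, not quotations from the novel, and the sketch assumes one edge per character mentioned per passage, with speakers’ references to themselves skipped.

```python
import re

# Each name variant maps to one canonical node label.
ALIASES = {
    "Jane": "Jane Fairfax",
    "Miss Fairfax": "Jane Fairfax",
    "Jane Fairfax": "Jane Fairfax",
    "Mrs. Weston": "Mrs. Weston",
    "Emma": "Emma",
}

# Try longer names first so "Jane Fairfax" is not matched as bare "Jane".
_names = sorted(ALIASES, key=len, reverse=True)
NAME_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(n) for n in _names) + r")\b"
)


def mention_edges(passages):
    """Return (speaker, mentioned character, chapter) edges."""
    edges = []
    for chapter, speaker, text in passages:
        mentioned = {ALIASES[m.group(0)] for m in NAME_PATTERN.finditer(text)}
        for target in sorted(mentioned - {speaker}):
            edges.append((speaker, target, chapter))
    return edges


# Invented placeholder passages: (chapter, speaker, text).
PASSAGES = [
    (1, "Emma", "I have been talking with Mrs. Weston about Jane Fairfax."),
    (1, "Mrs. Weston", "Miss Fairfax plays beautifully, and so does Jane."),
    (2, "Mrs. Weston", "Emma, have you seen Jane?"),
]

edges = mention_edges(PASSAGES)
for edge in edges:
    print(edge)
```

The chapter number stored with each edge plays the role of the time unit, so the edges can later be filtered or animated chapter by chapter.  A bare first name such as “Jane” can still be ambiguous in a novel where characters share names, which is one reason the manual graphs remain a useful check.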