The “Victorian Ghost Stories” project attempts to answer the following research question: In what ways did female authors use language to build suspense in the Victorian ghost story? Our research topic involved the fields of literature, history, linguistics, and digital research methods. Our approach required textual analysis, XML language tools, and the WordNet lexical database.

For our readings, we selected five short stories from four different authors: “Walnut-Tree House” and “The Old House in Vauxhall Walk” by Charlotte Riddell; “The Old Nurse’s Story” by Elizabeth Gaskell; “The Shadow in the Corner” by Mary Elizabeth Braddon; and “John Charrington’s Wedding” by Edith Nesbit. This range generated enough data to compare contemporary texts with different writing styles without sacrificing our ability to perform in-depth textual analysis on each individual story. Our model could nonetheless be applied to larger samples of ghost stories, should future digital humanists wish to expand upon our research.

Markup and Analysis

After deciding upon our research question and selecting stories, we performed a close reading and analysis of each text to determine our methodology for markup. We decided that the focus of our exploration should be to find “scary words” and rate them according to scariness, part of speech, and synset, so that we could determine not only what scary language was used, but also in what ways its scariness and frequency affected the stories. We modeled the structure of the stories using RELAX NG, a schema language that constrains XML markup to a defined hierarchy of elements.

Tracking "Scariness"

We compiled a list of 286 unique scary words that we then tagged in each text using XML markup. The scariness rating was recorded in a @scale attribute ranging from 0 to 3, with the additional rating of “i” for intensifiers like “very” or “quite.” Scary words with a scale of 0, like “black” or “cold,” had the potential to ignite fear but were not used that way in context. The same words could receive higher ratings where the context made them frightening. Scary words with a scale of 3, like “horror” or “deadly,” were the scariest and therefore always scary. This was the most subjective part of the project, though we constrained ourselves by drawing from an agreed-upon list.

Part of speech was recorded with the attribute @pos, with the available options of verb, noun, adjective, adverb, or modifier. We were curious about the way in which scariness was communicated, and this allowed us to see it at the level of grammar. Is scariness more descriptive, such that the most frequent scary words are adjectives? Or is it more action-based, such that they are verbs? This data influenced our conclusion, adding another level of specificity to our research question.
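To make the tagging scheme concrete, here is a minimal Python sketch of how scary words carrying @scale and @pos attributes could be tallied. Only the attribute names come from the description above. The element name `scary`, the sample sentence, and the ratings assigned in it are hypothetical illustrations, not the project's actual schema or data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the element name "scary" and this sentence are
# invented for illustration; only @scale and @pos come from the project.
SAMPLE = """<story>
  <p>The room was <scary scale="0" pos="adjective">cold</scary>, and a
  <scary scale="i" pos="modifier">very</scary>
  <scary scale="2" pos="adjective">dreadful</scary>
  <scary scale="3" pos="noun">horror</scary> crept over her as something
  <scary scale="1" pos="verb">stirred</scary> in the dark.</p>
</story>"""


def tally(xml_text, tag="scary"):
    """Count tagged scary words by part of speech and by scariness rating."""
    root = ET.fromstring(xml_text)
    by_pos = Counter()
    by_scale = Counter()
    for el in root.iter(tag):
        by_pos[el.get("pos")] += 1
        by_scale[el.get("scale")] += 1
    return by_pos, by_scale


by_pos, by_scale = tally(SAMPLE)
for pos, count in by_pos.most_common():
    print(pos, count)
```

Counts like these, gathered per story, are the kind of figures that would let one ask whether adjectives or verbs carry more of the scariness.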

Interpreting our Data

Using an XML database created with eXistDB, we were able to extract data from all five stories, which we used to create tables and graphs. Our data has some flaws: for example, because different people tagged different stories, each tagger relied on their own best judgement, and ratings are not perfectly consistent across texts. Although we don't love that those variations exist, the structure of a semester-long course necessitates some compromise. We acknowledge these issues throughout our conclusions and analysis, and hope this data functions more as a proof of concept than as a conclusive analysis of the genre.


As one of our team members had a linguistics requirement to fill, we also explored our scary word elements using the WordNet lexical database. WordNet is a hierarchical organization of units of meaning, called synsets, which are represented in texts by words. A synset is best defined as a cluster of synonyms that share a meaning. A detailed explanation of how synsets can be used to explore a text can be found here. Our project in particular focused on task number three, “Exploring the richness of the expression of spookiness.” We concentrated the task on the “haunting” plot section of each ghost story, expecting that this section would contain the greatest amount of scary vocabulary. After tagging each scary word element in the haunting sections with its appropriate synset in context, we were able to examine the language that different writers and texts employ to represent scariness. We performed this linguistic investigation by dividing the number of distinct synsets by the number of scary words with a non-zero scariness rating. The resulting ratio for each text indicates how varied its scary vocabulary is: a higher value indicates that scariness is expressed in a wider variety of morphological ways, whereas a lower value indicates less variety in vocabulary.
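The ratio described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual procedure: the synset labels ("syn-A" and so on) are placeholders rather than real WordNet identifiers, the word list is invented, and we assume that only the numeric ratings 1 to 3 count as "non-zero," so intensifiers rated "i" are left out. We also assume distinct synsets are counted among those same non-zero words.

```python
# Hypothetical tagged words as (word, scale, synset label) triples.
# Synset labels are placeholders, not real WordNet identifiers.
SAMPLE_TAGS = [
    ("horror", "3", "syn-A"),
    ("terror", "2", "syn-A"),   # shares a synset with "horror" in this toy data
    ("dread", "2", "syn-B"),
    ("cold", "0", "syn-C"),     # rated 0, so excluded from the ratio
    ("ghastly", "3", "syn-D"),
]

# Assumption: only ratings 1-3 count as non-zero; "i" (intensifier) is excluded.
NON_ZERO = {"1", "2", "3"}


def synset_ratio(tagged_words):
    """Distinct synsets divided by the number of non-zero-rated scary words."""
    scored = [(word, syn) for word, scale, syn in tagged_words if scale in NON_ZERO]
    if not scored:
        return 0.0  # avoid dividing by zero for a text with no rated words
    distinct_synsets = {syn for _, syn in scored}
    return len(distinct_synsets) / len(scored)


print(synset_ratio(SAMPLE_TAGS))  # 3 distinct synsets / 4 rated words
```

In this toy data, four words have non-zero ratings and they fall into three distinct synsets, giving a ratio of 0.75. A text that reused a single synset for every scary word would score close to zero, while a text in which every scary word had its own synset would score 1.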


Creative Commons License Victorian Ghost Stories by Abigail Drabick, David Wade, Gabi Keane, Kaylen Sanders is licensed under Creative Commons Attribution-NonCommercial 3.0 United States