Text as Data?

The digital humanities has expanded the possibilities of textual and literary studies. For this first stage of the project, I am interested in producing a digital scholarly edition, which involves manually annotating a text according to the guidelines of the Text Encoding Initiative (TEI), a standard for representing texts digitally; doing so essentially turns the text into data. Encoding the text for a computer to read is only the first step: this work will likely spark other projects involving textual analysis, data visualization, mapping, and exhibition. My hope is that after encoding Gogol’s “The Nose,” I can add other works and build toward a more comprehensive analysis of Gogol’s oeuvre. Although some of the tools and methods of the computational humanities are new and may seem complicated, they allow us to process large corpora far more easily and to generate new kinds of knowledge. Turning text into data opens it up to new perspectives, approaches, and interpretations, which is a vital part of humanistic inquiry.

For more information about these possibilities, the introduction to the Digital Humanities Quarterly special issue on “The Literary” offers a brief explanation of how digital tools and methodologies have altered the study of literature and literary practices. My hope is that other scholars can benefit from having the encoded text available, so that they can use digital tools and methodologies to further the study of this important author and of Slavic Studies more broadly.

Complete Works of Nikolai Vasilyevich Gogol (Полное собрание сочинений Николая Васильевича Гоголя), 1880

What is a Digital Scholarly Edition (DSE)?

There are plenty of digitized copies of Gogol’s complete works available online, both in the original and in translation (see previous post for the link to an English translation). A digitized edition, however, is not a digital scholarly edition. An analog text that has been digitized is just a digital surrogate for the original and functions the same way as the material text. Digital editions, by contrast, take advantage of the potential offered by digital modalities. A digital edition of a textual work has been encoded so that a computer can “read” it, allowing further analyses to be performed on the structure, word choice, or any number of other elements within the text.
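To make the idea of a computer “reading” an encoded text a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the TEI fragment is invented (its sentences are my own paraphrase of the story, not a quotation from any edition or translation), the `ref` identifiers and the `summarize` function are hypothetical, and the fragment leaves out the `teiHeader` that a valid TEI document requires. It is not the encoding I will use for the project, only a picture of what markup makes possible.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# TEI elements live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified fragment (no teiHeader); invented for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter" n="1">
        <p>The barber <persName ref="#ivan">Ivan Yakovlevich</persName> woke early in <placeName>St. Petersburg</placeName>.</p>
        <p><persName ref="#ivan">Ivan Yakovlevich</persName> found a nose in his bread.</p>
      </div>
      <div type="chapter" n="2">
        <p><persName ref="#kovalyov">Kovalyov</persName> woke without his nose.</p>
      </div>
    </body>
  </text>
</TEI>"""


def summarize(tei_xml):
    """Count chapters, tagged person references, and tagged places in a TEI string."""
    root = ET.fromstring(tei_xml)
    # Structure: every <div type="chapter"> anywhere in the document.
    chapters = root.findall(".//tei:div[@type='chapter']", TEI_NS)
    # People: tally <persName> elements by the identifier in their ref attribute.
    people = Counter(el.get("ref") for el in root.iterfind(".//tei:persName", TEI_NS))
    # Places: collect the text inside each <placeName>.
    places = ["".join(el.itertext()) for el in root.iterfind(".//tei:placeName", TEI_NS)]
    return {"chapters": len(chapters), "people": dict(people), "places": places}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A plain digitized page could only be searched for strings of characters. Because the names and chapters here are explicitly tagged, the same few lines can answer questions about structure and reference, such as how often a character is mentioned in each chapter, without guessing from spelling alone.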

Digital editions offer functionality based on their purpose: they usually entail some degree of user interactivity, often with expanded scholarly annotations and commentary, comparative views between similar texts or different versions of the same text, and searching capabilities. Digital editions are not text-exclusive, either, and may include high-quality digitized images of artwork and pages from historic manuscripts with transcriptions of the text and any marginalia, as well as any necessary historical context. However, the most important aspect of a DSE is the critical engagement it allows: additional commentary, criticism, or context transforms the text into a scholarly resource, one that also documents the creation process and metadata for the project. Patrick Sahle’s chapter in Digital Scholarly Editing provides a thorough overview of how to define a DSE.

What you’ll notice when you peruse these digital scholarly editions and others is how different they can look and perform. The digital medium of these projects allows for a significant amount of customization depending on their intended purpose and audience. My choices for this Gogol project are not the only choices available. I will do my best to refer to alternative options and strategies when it makes sense to do so!

While by no means emblematic of the entirety of digital scholarly editions, even the small list above raises questions of representation. Because these editions reflect the kinds of texts and resources privileged over centuries of publishing and archiving, it is not a stretch to say that a disproportionate amount of space is dedicated to white, European writers and historical figures. The digital humanities must be aware of this problem or risk replicating the inequities of the past. In upcoming posts, I’ll be discussing this issue, as well as issues of copyright, sustainability, accessibility, funding, and a multitude of technical considerations surrounding the theoretical and practical framework of digital scholarly editions and the digital humanities more generally.
