Text as Data?

The digital humanities has expanded the possibilities of textual and literary studies. For this first stage of the project, I am interested in producing a digital scholarly edition, which involves manually annotating a text following the TEI (Text Encoding Initiative) standard, a set of guidelines for representing texts digitally; doing so will essentially turn the text into data. This pursuit will undoubtedly spark other projects involving textual analysis, data visualization, mapping, and exhibition. Encoding the text for a computer to read is only the first step. My hope is that after encoding the text of Gogol’s “The Nose,” I can add other works and build toward a more comprehensive analysis of Gogol’s oeuvre. Although some of these digital tools and methods from the computational humanities are new and may seem complicated, they allow us to process large corpora easily and generate new kinds of knowledge. Turning text into data opens it up to new perspectives, approaches, and interpretations, which is a vital part of humanistic inquiry.

For more information about these possibilities, the introduction to the Digital Humanities Quarterly special issue on “The Literary” offers a brief explanation of how digital tools and methodologies have altered the study of literature and literary practices. My hope is that other scholars can benefit from having the encoded text available, so that they can further the study of this important author, and of Slavic Studies more broadly, with digital tools and methodologies.

Полное собрание сочинений Николая Васильевича Гоголя, 1880

What is a Digital Scholarly Edition (DSE)?

There are plenty of digitized copies of Gogol’s complete works available online, both in the original and in translation (see previous post for the link to an English translation). A digitized edition, however, is not a digital scholarly edition. An analog text that has been digitized is just a digital surrogate for the original and functions the same way as the material text. Digital editions, by contrast, take advantage of the potential offered by digital modalities: a digital edition of a textual work has been encoded so that a computer can “read” it, allowing further analyses to be performed on the structure, word choice, or any number of other elements within the text.
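To make the idea of a computer “reading” an encoded text a little more concrete, here is a minimal sketch in Python. It is only an illustration under several assumptions: the TEI fragment is hypothetical and heavily simplified (a valid TEI file would also need a teiHeader), the two sentences are placeholder English summaries rather than Gogol’s text, and the identifiers #kovalyov and #ivan and both function names are made up for this example. It is not the encoding this project will actually use.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The TEI namespace; elements in a TEI document live in this namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified fragment: placeholder sentences, invented ref values,
# and no teiHeader (so this is not a complete, valid TEI document).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p n="1"><persName ref="#kovalyov">Kovalyov</persName> wakes up to discover his nose has gone missing.</p>
      <p n="2"><persName ref="#ivan">Ivan Yakovlevich</persName> finds the nose of <persName ref="#kovalyov">Kovalyov</persName> in a loaf of bread.</p>
    </body>
  </text>
</TEI>"""


def paragraph_texts(root):
    """Return the plain text of each <p>, with the markup stripped out."""
    return ["".join(p.itertext()) for p in root.iterfind(".//tei:p", TEI_NS)]


def people_mentioned(root):
    """Count how often each tagged person (by ref attribute) is mentioned."""
    return Counter(
        el.get("ref") for el in root.iterfind(".//tei:persName", TEI_NS)
    )


root = ET.fromstring(SAMPLE)

for text in paragraph_texts(root):
    print(text)

for ref, count in people_mentioned(root).most_common():
    print(ref, count)
```

Because the names are tagged rather than just typed, the script can count mentions of a character regardless of how the name is spelled or inflected in the running text, which matters a great deal for a language like Russian. A plain digitized copy offers no such hook.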

Digital editions offer functionality based on their purpose: they usually entail some degree of user interactivity, often with expanded scholarly annotations and commentary, comparative views between similar texts or different versions of the same text, and searching capabilities. Digital editions are not text-exclusive, either, and may include high-quality digitized images of artwork and pages from historic manuscripts with transcriptions of the text and any marginalia, as well as any necessary historical context. However, the most important aspect of a DSE is the critical engagement it allows: additional commentary, criticism, or context transforms the text into a scholarly resource, one that also documents the creation process and metadata for the project. Patrick Sahle’s chapter in Digital Scholarly Editing provides a thorough overview of how to define a DSE.

What you’ll notice when you peruse these digital scholarly editions and others is how different they can look and perform. The digital medium of these projects allows for a significant amount of customization depending on their intended purpose and audience. My choices for this Gogol project are not the only choices available. I will do my best to refer to alternative options and strategies when it makes sense to do so!

And while by no means emblematic of the entirety of digital scholarly editions, even the small list above raises questions of representation. Because these editions reflect the kinds of texts and resources privileged over centuries for publishing and archiving, it is not a stretch to say that a disproportionate amount of space is dedicated to white, European writers and historical figures, a problem that the digital humanities must be aware of or else risk replicating the inequities of the past. In upcoming posts, I’ll be discussing this issue, as well as issues of copyright, sustainability, accessibility, funding, and a multitude of technical considerations surrounding the theoretical and practical framework of digital scholarly editions and the digital humanities more generally.

The Text: Gogol’s “The Nose”

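Вдруг он стал как вкопанный у дверей одного дома; в глазах его произошло явление неизъяснимое: перед подъездом остановилась карета; дверцы отворились; выпрыгнул, согнувшись, господин в мундире и побежал вверх по лестнице. Каков же был ужас и вместе изумление Ковалева, когда он узнал, что это был собственный его нос!

[In English: Suddenly he stopped dead in his tracks at the door of a house; an inexplicable phenomenon took place before his eyes: a carriage drew up at the entrance; the doors opened; a gentleman in uniform jumped out, stooping, and ran up the stairs. What was Kovalyov’s horror, and at the same time astonishment, when he recognized that this was his own nose!]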

– Nikolai Gogol, “The Nose”

“The Nose” is a fantastic story. The first time I read it, I could not believe what I was reading – it was absolutely absurd. Published in 1836, in a journal owned by Aleksandr Pushkin no less, this satirical story begins when a barber named Ivan Yakovlevich finds a nose in his wife’s freshly baked bread. He recognizes it as the nose of Collegiate Assessor Kovalyov, one of his regular customers. Yakovlevich dumps the nose into the river so as not to be found with it. Meanwhile, Kovalyov, a contemptible bureaucrat, wakes up to discover his nose has gone missing. A man so concerned with appearances and status, Kovalyov is mortified. Although he attempts to hide his new deformity, he soon becomes the center of Petersburg gossip when he runs into his nose gallivanting around St. Petersburg. What is worse for Kovalyov is that his nose has apparently become a higher-ranking bureaucrat than he himself! Kovalyov does eventually get his nose back, and any lessons he may have learned from the strange occurrence are quickly forgotten. The various interpretations of Gogol’s story are tangential to this project (Is it about repressed homosexuality? a criticism of Russian social hierarchy? absurdity for absurdity’s sake?). I chose this story purely out of love, because it’s one of my favorite short stories. So if you haven’t read it, please read it here.

Portrait of Gogol, 1841, by Otto Friedrich Theodor von Möller.

An important note on language:

“The Nose” was originally written in Russian, and this digital edition will utilize Russian texts. I acknowledge that Nikolai Gogol is part of the Russian literary canon, but this is where things get interesting. Gogol is Ukrainian. He was born in the village of Velyki Sorochyntsi, Ukraine, and therefore he also belongs to Ukrainian literature (and I would be remiss if I didn’t point that out). In fact, part of the reason why I chose Gogol for this project is that he uses the Russian AND Ukrainian languages in both his literary works and personal life. For example, as much as Gogol is known for his Petersburg tales (“The Nose” being one of them), his letters and notes are often written in Ukrainian, some of his characters speak Ukrainian, and many of his stories are about Ukraine. In the future, as I expand to other works and personal writings, this will become more apparent.

Of course, in the background of this whole project is the long shadow cast by a history of Russification in Ukraine and issues of language and identity. These questions of language and identity are made all the more pertinent given the contentious relationship between the two countries, the 2014 invasion of Crimea, and the 2022 war in Ukraine. I refer you to this poignant essay, written by Ukrainian-American poet Ilya Kaminsky, to illustrate some of the complexities of language, which have their origins in Gogol’s time, during which the Ukrainian language was banned in schools. In this context, writing in Ukrainian takes on a subversive quality that also resonates with Gogol’s bitingly satirical depictions of Russian bureaucracy and society. I’m not sure how this will play out as this project develops, but I’m curious.

Welcome!

Hello! My name is Haley and I’m a long-time academic scholar turned librarian. I’m interested in the intersections of text and digital technology, and how this thing we’re calling the digital humanities can enrich and expand our understanding of what is possible in terms of humanities research. To that end, I’m going to create a digital edition. I’m going to learn how to mark up a text so that a computer can read and analyze it. I’m going to turn a static text into something a bit more dynamic and interactive. And since I can never really leave Slavic Studies behind, I am going to create a digital edition of Nikolai Gogol’s short stories, starting with “The Nose,” and I’m going to use this project to talk about the work of the digital humanities and digital scholarship.

My introduction to DH came while I was a doctoral student in Slavic Languages and Literatures. As much as I love the subject matter, I found myself quickly disillusioned with the field. If academia can be slow to change, Slavic Studies as a whole is notoriously conservative and unyielding. In some ways, doing the work of Slavic Studies feels like being stuck on a rickety old ship embarking ever so slowly on a path already charted. You don’t know how to get off this boat, and perhaps you even have some vague ideas about hijacking the whole thing, but the ship is so heavy and hard to redirect. You find yourself channeling Mayakovsky when he and his contemporaries first penned the Futurist Manifesto all those years ago. Maybe there is no steering the ship. You need to jump, but the waters are murky. And you don’t know how to swim, because no one taught you how! Meanwhile, you’re watching other graduate students engage with some interesting interdisciplinary research (stuff that’s relevant and meaningful to our contemporary experience) while you are working with outdated tools and the burden of tradition.

This blog and the project for which it was created are my attempt to “swim,” to grapple not only with some interesting tools and approaches, but also with my intellectual and professional future. Inspired by other DH projects such as Digital Dostoevsky, I am going to create a digital edition and document the process here as I go along. Open access, collaboration, and public engagement are core values of the digital humanities, and I will endeavor to make the process of creating this digital edition as transparent and accessible as possible. To that end, with every post I’ll be including helpful resources, tips, and explanatory materials that informed my research and creative process. I hope this might be a place that supports learning and innovative thinking, as well as a forum to connect and share ideas.

We need new perspectives and modes of analysis to breathe life into our approach to Slavic literature and humanistic inquiry in general. These kinds of tools and methods are not meant to displace traditional modes of inquiry and research, but to complement and challenge tired assumptions of what it means to extract meaning from a text. The digital humanities may be able to help change how we conduct and support humanities research, BUT there’s still a lot of work to be done. While I cannot yet speculate on exactly how this particular project will turn out or what research questions will be answered, I do hope to learn a lot about how DH and Slavic Studies intersect. I also hope that this may serve as a useful guide for someone out there (maybe even you) looking to start their own DH project.