VicoGlossia: Annotatable and Commentable Library -- (a proof of concept study: Early Soviet Philological Culture)



How can we study, present and teach complex cultural phenomena such as the Russian philological culture of the 1920s? To address this question, we elaborated a knowledge representation that facilitates scientific collaboration, enables distant reading, improves the navigation of scholarly literature, links classical texts to rich international scholarship, and provides a basis for effective visualization. Digital Humanities offer an ideal framework for the intense human-computer collaboration required to carry out such a project. We focus on the network of relations both within and between three key communities of the early Soviet philological milieu (the Formalists, the Marrists and the Bakhtinists), approaching them through the optics of two major philological romans à clef of the period. To this end, we (1) prepared a collection of primary texts; (2) built a repository of secondary literature; (3) using this research literature, enriched primary texts with both general and ad locum annotations; (4) adapted the nanopublications method as a comprehensive approach for representing this scholarly knowledge in the Semantic Web. We make use of the quantitative methods toolkit of the VicoGlossia system, which was developed as part of an international and inter-institutional collaboration.

This contribution addresses the following research question: how can the complex cultural phenomena of the Russian philological culture of the 1920s be presented and taught? To this end, we built a domain knowledge representation that supports scientific collaboration, enables distant reading, improves the user’s experience in exploring the literature, ensures the interconnection between classical and scholarly literature, and, finally, provides effective visualization. The framework offered by the Digital Humanities, i.e. the collaboration between humanists and computer scientists, made this project possible. The focus is on the relations between communities of the Soviet philological milieu. Accordingly, (1) a corpus of texts was compiled, (2) a repository of secondary literature was created, (3) the classical literature was enriched with annotations, and (4) the nanopublications methodology was adapted to the needs of representing scholarly knowledge.


How should we study, present, and teach complex cultural phenomena such as the Russian philological culture of the 1920s? To a large extent, the early Soviet intellectual scene owes its fascination to the intense and fruitful cross-fertilization of the humanities, literature, and the arts, often with fuzzy borders between them.

Such a task requires an approach that goes beyond the study of individual authors, texts, motifs, connections, or communities, in order to take into account the multiplicity and complexity of the relations between the texts and between the actors, which acted as nodes in a wide network, and to make evident the variety and density of the context. We need to examine systematically not isolated relations but their interference and juxtaposition, to follow their evolution, and to assess their intensity and their mutual impact. To achieve this goal, we have to elaborate a proper knowledge representation that facilitates scientific collaboration, enables distant reading, improves navigation in the scholarly literature, ensures the linking between classical texts and the large international scholarship dealing with them, and provides a basis for effective visualization. Digital Humanities offer an ideal framework for the intense collaboration between humanists and computer scientists required to carry out such a project.

We focus on the network of relations both within and between three key communities of the philological milieu of this crucial period in Russian intellectual history (the Formalists, the Marrists and the Bakhtinists), approaching them through the optics of two major philological romans à clef of the period... For this purpose we (1) prepare a collection of primary texts relevant for a better understanding of these novels, as well as the texts they refer to; (2) build a repository, as complete as possible, of secondary literature focused on those primary texts and interconnected with the concepts and objects in them; (3) using this research literature, enrich the primary texts with both general and ad locum annotations, in both human- and machine-readable form; (4) adapt the nanopublications method (originally developed in bioinformatics) as a comprehensive approach for representing this scholarly knowledge on the Semantic Web.

We make use of the quantitative toolkit of the VicoGlossia system, developed in an international and inter-institutional collaboration (EPFL, UNIL, Leiden University, and, both in Moscow, the Higher School of Economics and the Yandex School of Data Analysis). Based on Semantic Web technologies, VicoGlossia complements traditional methods of literary analysis and intellectual history with modern tools of quantitative research and visualization.

Current state of research in the field

Early Soviet linguistics and humanities as an object of recent research

The theoretical and practical heritage of the period is a long-lasting source of inspiration and a thrilling object of research. An important part of this heritage is philological in the (today partly obsolete) sense that embraced linguistic, literary, historical, philosophical, and psychological studies. The novels we use as testimonies of the Soviet intellectual scene of the 1920s, Veniamin Kaverin’s The Troublemaker, or Evenings on Vasilievsky Island and Konstantin Vaginov’s The Goat Song (both 1928), show in a performative way the importance of taking into account the relations between texts, persons and communities of the period.

An extensive literature has identified members of the circles as prototypes of the characters in Vaginov’s and Kaverin’s novels, although many implicit quotations and hidden references to various discussions and theories scattered through their texts still remain undetected. Other relations (and their interplay) are to be discovered in the course of our study.

The scholarly communities described in fictionalized form in the novels (the Formalists, the Bakhtin Circle, and the linguists E. Polivanov, L. Jakubinskij and, implicitly, N. Marr) were of very different types. The Formalists acquired a self-identity as a community very early (in spite of a great variety of positions and itineraries) and initiated their own historiography. Later, several major studies showed the importance of this movement per se, as well as for the genesis of European and American structuralism, and examined its international roots, its philosophical presuppositions, its appraisal, the particular weight of some key concepts such as ostranenie, and its interconnections with other movements.

Much more complex is the history of what was only later baptized the Bakhtin Circle. The oeuvre of M. Bakhtin is now well edited and accompanied by detailed commentaries by S. Bocharov, S. Gogotishvili, I. Popova, V. Ljapunov, and M. Makhlin, and offers the basis for any research on the author and on his circle. Major contributions to Bakhtin studies have been made for decades. Some important studies were devoted to certain members of this nebulous community (P. Medvedev, L. Pumpianskij, V. Voloshinov, I. Sollertinskij, M. Yudina, M. Kagan) who remained for a while in the shadow of the master. We also build upon several studies that treat the Bakhtin Circle as an intellectual unity and upon research that retraces its interconnection with the whole Russian intellectual context of the period, specifically with linguistic ideas and with the Russian Formalists’ theory. Digitizing all of Bakhtin’s texts and compiling the complete bibliography of Bakhtin studies is the mission of the Analytical Database project of the University of Sheffield’s Bakhtin Centre.

As to the history of linguistics, in contrast to periods with one clearly dominant linguistic paradigm (linguistique cartésienne in France in the second half of the 17th century, or the Junggrammatiker in Germany in the 1870–80s), several currents coexisted in Russia in the 1920s. They interacted with each other, with other human and social sciences, and with the arts, under the banner of the search for a Marxist linguistics, and formed a captivating intellectual polyphony (drastically reduced soon after). Some important contributions to the history of the linguistic ideas of this period have been made during the last decades (including the Cahiers de l’ILSL on R. Jakobson, on N. Marr, and on the period of the 1920–30s and beyond). In particular, light has been shed on the decisive role of E. Polivanov in the genesis of phonology. On the linguistic context of the Bakhtin Circle, see several works by P. Sériot and subsequent studies.

Several recent studies of the period show a clear tendency to overcome an individualistic (or nominalistic) approach to separate authors and/or works, and to bring out the contextual density and multiplicity of relations between them. They pay special attention to relations between individuals and between various creative communities, which are thus presented not as autarkic unities but as nodes in a wide fabric, with blurred and permeable boundaries between disciplines, corporations, and circles (cf. new approaches to intellectual history). Some studies stressed the interaction between different thinkers and between various arts.

Nevertheless, it remains insufficiently explored how the linguistic debates of the period profited from the intense circulation of ideas between linguistics, on the one hand, and philosophy, other human sciences, and literature, on the other. The development of linguistic ideas is still often considered endogenous. The boundaries of linguistics as such were, however, rather blurred. The object of linguistics was not clearly delimited: it was not yet language in itself and for itself. Linguists cooperated widely with geographers, psychologists, and literary critics, e.g. P. Bogatyrev, L. Vygotsky, P. Savitsky, O. Frejdenberg, V. Vernadskij.

In spite of a growing awareness of the importance of structured corpora (or digital text collections), there have been only a few attempts to digitize collections related to the period of our interest. One is the valuable collection Open Commons of Phenomenology, directed by P. Flack and hosted by the publishing house “sdvig” (Geneva-Lausanne), which represents the literature on the Phenomenologists as well as on the Formalists and Structuralists related to them.

New steps in the exploration and understanding of the richness and complexity of the intellectual scene of the period require new ways of research and knowledge presentation, and they can be obtained only through close collaboration with digital humanists and through the use of their tools.

Digital collections, annotation, and intertextual linking

The meaning of most texts is partly constructed by explicit and implicit references to other texts (intertextuality) and to tacit knowledge common to authors and their first readers. In order to fully understand a text, a reader has to be aware of and understand these references, which means that authors and readers must share knowledge of a certain canon of texts and of cultural concepts and practices. This is especially true when considering texts that were produced in a close-knit historical milieu like that of Soviet Russia in the 1920s described above. These texts contain numerous implicit references to other texts or cultural realities that were self-evident for contemporaries but may not even be recognizable for today’s readers. Mass digitization of textual heritage has made access to sources easier than ever before, but their context and intertextuality remain the domain of specialized researchers. Despite their importance for scholarly research, intertextual relations are generally not documented systematically. The available documentation is often spotty and scattered over the scholarly literature, often hidden in footnotes, and generally hard to find. A sustainable preservation of insights gathered by research is generally still not ensured and the time-consuming and challenging work of reconstructing the references has to be done over and over again. In spite of large-scale digitization of texts, the methods for organizing, analyzing, evaluating, annotating, and otherwise processing findings about these sources have not kept up with this development.

Digital scholarly editions (like their paper predecessors) typically contain only a single work (e.g., Chaucer’s Canterbury Tales, Flaubert’s Bouvard et Pécuchet), an author’s œuvre (e.g. Dante Gabriel Rossetti, R. Tagore, L. Tolstoy, F. Nietzsche, V. Woolf, L. Wittgenstein), or some closely delimited corpus. They are also scattered over the Web, constructed according to different principles, and permit no interconnections; so, like paper books, they present œuvres as isolated textual monads and not as nodes in a network of World Literature. Intertextuality in digital editions is usually documented in the same way as in printed editions: by free-text comments. Consequently, scholars’ insights about intertextual relations cannot be processed (i.e. searched, linked, analyzed, transformed, visualized) automatically. In spite of continuing research in information extraction and in automatic or semantic textual annotation, only a few projects have attempted to go further. In the Orlando Project, intertextual relations were marked up in the text and annotated with one of eleven categories, such as Quotation or Interpretation, but all other information was given only in natural language. For the edition of the writings of Wittgenstein, an ontology was developed, one of its goals being the documentation of the intertextual relations within and between his works and the secondary literature. For our research, the most pertinent digital humanities project is probably Sharing Ancient Wisdoms (SAWS), which analyzed traditions of wisdom literature (in particular the reception of classical Greek wisdom in Arabic). SAWS embedded RDF into TEI documents and developed an ontology to formally describe various types of relationships between text fragments (e.g., isLongerVersionOf or isVariantTranslationOf).
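To illustrate why typed relations are easier to process than free-text comments, here is a minimal Python sketch. The relation names isLongerVersionOf and isVariantTranslationOf are taken from the SAWS description above; the fragment identifiers and the helper find_relations are hypothetical and do not reproduce the SAWS data model or any VicoGlossia interface.

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    """One typed link between two text fragments (subject, predicate, object)."""
    source: str    # fragment that carries the relation
    relation: str  # relation type from a controlled vocabulary
    target: str    # fragment the source is related to


# Hypothetical fragment identifiers, for illustration only.
RELATIONS = [
    Relation("text_a#s12", "isVariantTranslationOf", "text_b#s40"),
    Relation("text_c#s03", "isLongerVersionOf", "text_a#s12"),
    Relation("text_d#s07", "isVariantTranslationOf", "text_b#s40"),
]


def find_relations(relations, relation: Optional[str] = None,
                   target: Optional[str] = None):
    """Return the relations matching the given type and/or target."""
    return [
        r for r in relations
        if (relation is None or r.relation == relation)
        and (target is None or r.target == target)
    ]


# All fragments recorded as variant translations of the same passage.
variants = find_relations(RELATIONS, relation="isVariantTranslationOf",
                          target="text_b#s40")
print([r.source for r in variants])
```

Because each relation is a structured record rather than a sentence in a footnote, the same list can be searched, counted, or exported without any text parsing.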

Most work on intertextuality in the digital humanities is primarily concerned with the automatic detection of intertextuality, not with its documentation; examples of tools are TRACER, Tesserae, Janus, and Phœbus. However, such tools can only detect intertextuality that is manifest on the surface level.

The poor accessibility of research results on intertextuality may also be considered a problem of publishing research results in a way that allows for discovery. In the natural sciences there is ongoing research aiming to develop alternatives to the traditional publication model, which is increasingly seen as inadequate by the scientific community. Nanopublications have been proposed as a common framework for describing scientific statements together with their context (e.g., original publication, authors, organisms involved), so that central scientific results can be unambiguously referenced and connected to their authors, and so that discovery and automatic aggregation and analysis are supported. In the humanities, nothing comparable exists so far; however, the potential of nanopublications in the humanities has already been demonstrated by several researchers.
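As a rough illustration of the idea, a nanopublication can be thought of as a small record with three parts: the assertion itself, its provenance, and information about the publication of the record. The Python sketch below models this with plain data classes; all identifiers (the ex: and placeholder: names, the alludesTo predicate) are hypothetical, and the sketch does not follow the RDF serialization used by actual nanopublication tooling.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Nanopublication:
    """A single statement with its context, split into three parts."""
    uri: str
    assertion: List[Triple]                                   # the claim itself
    provenance: Dict[str, str] = field(default_factory=dict)  # where the claim comes from
    pubinfo: Dict[str, str] = field(default_factory=dict)     # who recorded it


def group_by_assertion(nanopubs):
    """Map each asserted triple to the URIs of the records that state it."""
    index = defaultdict(list)
    for nanopub in nanopubs:
        for triple in nanopub.assertion:
            index[triple].append(nanopub.uri)
    return dict(index)


# Hypothetical claim: a passage of a novel alludes to a person.
claim = ("ex:troublemaker_passage_1", "ex:alludesTo", "ex:F_de_Saussure")

nanopubs = [
    Nanopublication("ex:np1", [claim],
                    provenance={"derivedFrom": "placeholder:study_1"},
                    pubinfo={"creator": "placeholder:annotator_1"}),
    Nanopublication("ex:np2", [claim],
                    provenance={"derivedFrom": "placeholder:study_2"},
                    pubinfo={"creator": "placeholder:annotator_2"}),
]

support = group_by_assertion(nanopubs)
print(support[claim])
```

Keeping the claim separate from its provenance is what allows the same statement, made independently in several studies, to be aggregated and traced back to each source.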

Such tools do not abolish but efficiently complement, and digitally reload, the centuries-old tradition of manual annotation: marginalia, footnotes, and particularly commentaries. Reflection on how the philological and hermeneutical tradition can be articulated with the new digital culture is beginning to develop. Distant and close reading are now considered not as antagonists but as necessary complementary aspects of the same research. According to D. Apollon, however, recent digital technologies for encoding texts are still enigmatic black boxes for most researchers working in the field of philology and textual criticism (Introduction).

To summarise, further steps in understanding the phenomenon of early Soviet philological culture can only be made with a digitally assisted knowledge representation that makes it possible to retrace not only multiple relations but also their interference, evolution, and intensity. This requires digital methods that enable scientific collaboration, facilitate navigation in the scientific literature, ensure the linking between classical texts and the large scholarship focused on them, and provide efficient visualization. Such a task is rather new in the humanities and, as far as we know, has never been undertaken for the Russian domain.

Exposition of research

Problem addressed and the goal of the research

As we argued above, the analysis of the Russian philological culture of the 1920s suffers from a drawback that is typical of historical research in the humanities as a whole: it remains mostly focused on individual authors, works, or relations. At the same time, further progress in understanding the Russian intellectual revolution requires an approach that goes beyond the study of individual authors, texts, motifs, connections, or communities, in order to take into account the multiplicity and complexity of the relations between the texts and between the actors, which acted as nodes in a wide network, and to make evident the variety and density of the context. We need to examine systematically not isolated relations but their interference and juxtaposition, to follow their evolution, and to assess their intensity and impact. Such analyses immediately run into the so-called curse of dimensionality: large studies quickly become unfeasible because of the number of connections to consider and the complexity of the resulting networks of citations, concepts, and ideas.
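One simple way to see this growth is to count the potential typed links between actors: with n actors there are n(n-1)/2 unordered pairs, and each pair may be connected by any of k relation types. The sketch below is only a back-of-the-envelope illustration; the function name is hypothetical, and k = 15 is taken as an assumption from the number of relation types enumerated later in this paper.

```python
from math import comb


def potential_links(n_actors: int, n_relation_types: int) -> int:
    """Upper bound on typed pairwise links: C(n, 2) pairs times k types."""
    return comb(n_actors, 2) * n_relation_types


# Growth of the search space for an assumed k = 15 relation types.
for n in (10, 100, 1000):
    print(n, potential_links(n, 15))
```

The count grows quadratically with the number of actors, before the evolution of each relation over time or the interplay between relations is even taken into account.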

Our main goal is to elaborate a proper knowledge representation enabling scientific collaboration, to complement close reading with distant reading, to facilitate navigation in the scholarly literature, to assure the linking between classical texts and the large international scholarship focused on them, and to provide effective visualization. We are exploring the network of relations both within and between three key communities of the philological milieu of the period: (i) the Formalists, (ii) Nikolaï Marr and his school, and (iii) Mikhaïl Bakhtin and his circle, approaching them through the optics of two important philological romans à clef of the period, Veniamin Kaverin’s The Troublemaker and Konstantin Vaginov’s The Goat Song.


We pursue the goal outlined above through the following objectives (partly achieved at the moment):

  1. To prepare a representative corpus of primary texts relevant for a better understanding of these novels; to retrace thematic, typological, institutional, private and other influences, borrowings, impacts and other types of transfer; to convert the texts into digital form.

  2. To build a collection of secondary literature focused on the primary texts; to structure this collection by interconnecting it with concepts and objects in the primary texts; to make the collection as complete as possible.

  3. To retrace the multiplicity and variety of relations between actors, to follow their evolution, and to measure their intensity and the impact of their interplay.

  4. To enrich the primary texts with a network of annotations (both related to the whole text and ad locum), building upon existing research texts and using both automated tools of semantic analysis and manual curation.

  5. To adapt the nanopublications approach (already successfully applied in the natural sciences) to the new domain.

Preparation of the corpus

Our team approaches the theoretical and philological atmosphere of the 1920s from the perspective of fiction, or rather of a particular genre, the roman à clef, which underwent an apparent revival during this period. Both works that we adopt as starting points are also philological novels. Each of them is a specific literary speech act and at the same time a precious testimony of cultural and theoretical events, and living proof of an intense exchange between writers and scholars within a common philological context. They quote (but also elucidate, advertise, mock or parody) numerous theses, ideas and theories of contemporary philosophers, literary theoreticians, critics, linguists, etc. In fact, fiction and nonfiction, practice and theory shed a reciprocal light on each other. To make this evident, a large amount of secondary literature has to be brought to bear on the primary (fiction or nonfiction) text.

It has already been demonstrated that these two novels give fruitful access to the philological context of the period and encompass a great variety of relations connecting their characters (and the persons who inspired them). Personal, professional, cultural, generational, conceptual, and other relations are densely interwoven in Konstantin Vaginov’s roman à clef The Goat Song (1928) and his three further texts, Works and Days of Svistonov, Bambocciada and Harpagoniada (1929-34); the whole is often considered as one metatext. Various problems (the unity of science and of the world, literary vs. everyday language, prose vs. poetry, conscious vs. unconscious, the inner form of the word, the nature of the sign, the methodology of the humanities) are discussed within a very interdisciplinary (and very imaginary) community, which would later be known as the Bakhtin Circle. We consider Vaginov’s text with and through the multiplicity of references to the real and doubtful prototypes of the novel’s characters (L. Pumpianskij, M. Bakhtin, P. Medvedev, P. Luknickij, M. Yudina, M. Kagan) and to various philological and cultural ideas, such as Bakhtin’s famous concepts of carnival, menippea and dialogue, life vs. art, the unconscious, social aspects of language, echoes of contemporary language policy, etc.

The connections of Vaginov’s novel are presented on its page in VicoGlossia:

Vaginov’s novel in VicoGlossia

Connections of Vaginov’s novel in VicoGlossia

Veniamin Kaverin’s novel The Troublemaker (Scandalist), or Evenings on Vasilievsky Island (1928), also both a philological novel and a roman à clef, can be understood only in the context of the author’s professional communication and intertextual battle with a key figure of the intellectual scene of the period, V. Shklovsky, the prototype of the principal character, and consequently with the whole Formalist movement. This important literary testimony of intense linguistic and literary debates raises numerous cultural and linguistic issues: relations between the university and the non-academic world; conflicts between old and new theories of language, between philologi and lingvisty (resp. Anciens and Modernes), and thus the imbrication of generational and philological debates; the social nature of language; questions of language policy and planning (with evident allusions to O. Vinokur and L. Yakubinskij); the heritage of A. Potebnja, etc. The novel is also an homage to E. Polivanov, whom Kaverin admired all his life. Besides, N. Marr’s new doctrine of language is present in the novel, amalgamated with Polivanov’s views. In a tacit way, the beginnings of de Saussure’s fame can also be traced: the Course in General Linguistics was translated (by A. Sukhotin), commented and edited by R. Shor, another shadow figure of the novel, only in 1933, but de Saussure’s influence and the controversies around it started earlier.

See below the collection related to The Troublemaker.

These two novels are intertextually related to a huge number of other texts, ideas, and debates (mostly the Formalists and Marr’s school for V. Kaverin; the Bakhtin Circle and the Acmeists for K. Vaginov). Furthermore, these novels can be regarded as performative responses to certain discourses of the mid-1920s (e.g. The Troublemaker is Kaverin’s credo in the debates on “sujet prose”). Finally, these novels have engendered further texts (criticism, polemics, the authors’ answers to them), which also have to be discovered and included in the corpus. Some of these connections (mostly linguistic ones) have already been analysed in earlier studies; many other relations remain to be established and explored. We adopt an iterative approach here: we study and systematize the research literature, extract new connections, and in this way complete the corpus.

Although this work is not finished yet, the initial collection is shown in the screenshot below.

We assembled a collection of texts that form a community, in the sense that the interconnections between texts within this collection are tighter than those with other texts. Even this minimal collection makes it possible to demonstrate the viability of our approach. In the course of the constitution of the corpus, all fiction and nonfiction works pertinent for understanding or explicating the novels, and vice versa, have been stored in the VicoGlossia database according to the TEI guidelines. Where necessary, the original Russian texts, as well as their translations into English, French, and German, are digitized. We edit and curate the texts according to the appropriate philological standards (with canonical pagination, metadata, variants, etc.). The VicoGlossia platform makes it possible to align translations automatically at the sentence level and to upload facsimiles (when available).

Goat Song in VicoGlossia

Structured repository of secondary literature

We started the collection of secondary literature from an existing bibliography and expanded it in conjunction with the primary texts, in order to present the history of debates and a cartography of interpretations concerning Russian philology of the 1920s. The full-text collection is being formed by downloading available texts, then by scanning and digitizing further publications. Using the automated text analysis tools available in the VicoGlossia platform, we identify the main named entities (persons, places), periods of time, events, etc., and we build a consistent graph representing the relations between these entities and the multiple semantic links between primary and secondary texts. We curate and extend this data manually, producing a high-quality structured bibliography of the field. As a result, the research literature is connected in multiple ways with the primary texts through text elements and the comments on them, named entities, and document metadata: citations are identified and linked to the corresponding texts; personalia and events are linked throughout the texts with biographies and/or historical references. Where appropriate, connections are established across available translations and editions. All annotations remain open to future additions and amendments. Modern secondary literature that is already available in digital form is Zotero-compatible and linked to Academia, Google Scholar, and other resources. The collection is structured according to the Resource Description and Access (RDA) standard, allowing searches on any metadata field or any combination of fields.
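The linking step described above can be sketched in a few lines of Python: once named entities have been identified in each document, an inverted index from entity to primary text is enough to connect every study to the primary texts with which it shares entities. The document identifiers and entity sets below are hypothetical toy data, and link_by_entities is an illustrative helper, not a function of the VicoGlossia platform.

```python
from collections import defaultdict

# Hypothetical result of entity recognition: document id -> entities found.
PRIMARY = {
    "kaverin_troublemaker": {"Shklovsky", "Polivanov", "Marr"},
    "vaginov_goat_song": {"Bakhtin", "Pumpianskij"},
}
SECONDARY = {
    "study_1": {"Polivanov", "Marr"},
    "study_2": {"Bakhtin"},
    "study_3": {"Shklovsky", "Bakhtin"},
}


def link_by_entities(primary, secondary):
    """Link each secondary text to the primary texts sharing an entity with it."""
    # Inverted index: entity -> primary texts in which it occurs.
    index = defaultdict(set)
    for doc, entities in primary.items():
        for entity in entities:
            index[entity].add(doc)
    # (secondary text, primary text) -> entities they have in common.
    links = defaultdict(set)
    for doc, entities in secondary.items():
        for entity in entities:
            for primary_doc in index.get(entity, ()):
                links[(doc, primary_doc)].add(entity)
    return dict(links)


LINKS = link_by_entities(PRIMARY, SECONDARY)
for (study, text), shared in sorted(LINKS.items()):
    print(study, "->", text, sorted(shared))
```

In practice such automatically proposed links would only be candidates; as stated above, they are curated and extended manually.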

Retracing connections, assessment of the interference of relations

Scholarly and literary works of the period were interwoven in various relations of intertextuality, or transtextuality, and both novels offer a rather complete array of them. Some authors were not as remote from each other in their texts as one might claim (Kaverin / Vaginov). Some texts considered pioneering have important predecessors (allegedly revolutionary in linguistics, Marr was deeply rooted in the context of the epoch). Relations could be not only connective but also disruptive: some texts by disciples and followers were loyal to their mentors only on the surface (Marr / Frejdenberg; Marr / Shor); some insights were produced through productive misunderstanding (the erroneous ontological character ascribed to the phoneme by the Eurasians, which led to phonology); some ideas were more successful when transplanted to another field (the theory of hybridisation transferred from linguistics to the analysis of ethnographic facts, and other types of transfer).

But the relations to take into account were by no means only textual. We have already established the following types of relations (connections followed by transfers):

  1. conceptual: ideas or concepts discussed during the period (language vs. literature; prose vs. poetry; holistic paradigm of knowledge; new man resp. new reader; role of subject/sujet; art as ensemble of devices/techniques; (Marr’s) popular linguistics);

  2. institutional-collaborative: a variety of communities and institutions with various degrees of freedom and coerciveness for their members (Bakhtin’s Circle, Serapion Brothers, M. Yudina’s salon, OPOYaZ, Moscow Linguistic Circle, LEF…);

  3. personal: various personal (or even family: Pasternak / Frejdenberg, Kaverin / Tynjanov) relations might be important for the circulation of ideas;

  4. disciplinary: scientific schools and teacher-student continuity (Marr & Piotrovskij, Meshchaninov, Frejdenberg; Polivanov & Kaverin), in spite of an intense phase of negotiation about the borders between scientific disciplines;

  5. generational: many disciples combined debt and criticism towards their mentors (Baudouin de Courtenay, Shklovsky or Marr);

  6. professional-communicative: various forms of speech acts articulating professional communication: manifestos, book reviews, letters, dialogues/quarrels by means of literary texts, play of reciprocal dedications, romans à clef, etc.

Some relations were marked by clearly disruptive or transgressive features:

  1. competitive-agonal: some actors were connected through their opposition (holistic paradigm vs. mechanistic paradigm; Marr / Polivanov).

  2. inter- and transdisciplinary: transgressions of disciplinary borders (the importance of milieu: Bakhtin / Lysenko / Mandelstam; the quest for the language of cinema: Eizenshtejn / Marr / Vygotskij / Lurija; the holistic paradigm: Marr / Vernadskij / Berg).

  3. inter-genre: an intense quest for genre renewal (letters/treatises/novels among the Formalists, the theoretical diary in L. Ginzburg, etc.).

  4. inter-communitarian: relations between communities and institutions (Moscow Linguistic Circle & GAKhN), continuity between them (MLC & Prague linguistic circle); simultaneous membership or conversion from one to another (Shklovskij, Vaginov).

  5. international and interlingual: Husserl vs. Shpet and other phenomenologists; Saussure vs. Shor and other linguists; Humboldt / Potebnia / Shklovskij; Meillet / Marr; Cassirer / Frank-Kameneckij / Frejdenberg; in general, Neo-Kantianism and its Russian reception.

  6. intermedial: fruitful passages literature / cinema in Shklovskij, Tynjanov et al.; literature / theatre in Bulgakov; arts / literature in O. Forsh.

  7. real-fictional: Shklovskij, Polivanov, Pumpianskij et al. as inspirations of fictional representations.

  8. artistic-scientific: exchange between literature and literary science (Khlebnikov / Jakobson, Shklovskij as both artist and scholar), but also in other domains (Vygotskij / Mandelstam).

  9. genetic-receptive: sometimes reception of a work is radically different from the context of its creation (Bakhtin’s rediscovery and his productive mis-evaluations in Europe).

All these relations evolved over time and changed in importance. Moreover, none of them existed in isolation: their interplay was the rule rather than the exception, and they not only coexisted but also affected one another, modifying, augmenting, or diminishing their combined effect. To follow their evolution, to examine their interference, and to assess their intensity and their mutual and joint impact, we use the digital tools offered by the VicoGlossia platform. Methodologically, the work should lead to the discovery of numerous gaps that were left without due attention by research conducted with more traditional methods. For example, when V. Kaverin speaks about A. Meillet in his The Troublemaker, he evidently means another linguist, F. de Saussure: it was Saussure (and not Meillet) who failed to write a book on general linguistics, that is, the famous Course in General Linguistics. This (conscious?) substitution (Meillet named in place of Saussure) makes it possible to analyze a whole cluster of topics and problems connected with the reception of the Course in Russia. Although the early stages of this reception seem to have been well studied (by M. Chudakova, E. Toddes, C. Genty-Depretto, I. Ageeva-Tylkowski, I. Ivanova), no detailed research has yet been done on the very rich comments that R. Shor provided for her first Russian edition of the Course. We are translating these comments into English and French, and commenting on and analyzing them from a historico-epistemological point of view within the whole context of the epoch under study, starting with the names and research problems explicitly mentioned in them. This research is also a further step in the study of the intellectual heritage of R. Shor, a study which has only recently begun in detail and whose potential future directions have already been listed. Further gaps and lacunae will become manifest in the course of our research.

Relations in VicoGlossia

Annotation for semantic and contextual network

We supply primary texts with a variety of general and ad locum annotations referring to secondary literature or to the relevant passages in it. The annotations can concern lexical, phrasal, sub- and supra-phrasal text units, and are of different types: biographical, historical, theoretical, concerning influence and impact, etc. This information is structured according to the following scheme.

Metadata & Explanatory notes for each work


  • Author

  • Title

  • Genre / Domain

  • Further features/tags

  • Language

  • Dates of creation

  • Date of publication

  • Volume (number of characters)

  • Editions: year, place, number of copies

  • Bibliometric indicators
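As an illustration only, the metadata fields listed above could be held in a small record type. The class and field names below are assumptions rather than the schema actually used in VicoGlossia, and the sample record fills in only what the text itself states (author and title) plus a language code chosen for the example.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class Edition:
    """One edition: year, place, number of copies (all optional)."""
    year: Optional[int] = None
    place: Optional[str] = None
    copies: Optional[int] = None


@dataclass
class WorkMetadata:
    """Hypothetical record mirroring the metadata list above."""
    author: str
    title: str
    genre: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    language: Optional[str] = None
    creation_dates: Optional[str] = None
    publication_date: Optional[str] = None
    volume_chars: Optional[int] = None  # volume as a number of characters
    editions: List[Edition] = field(default_factory=list)
    bibliometrics: Dict[str, float] = field(default_factory=dict)


def missing_fields(meta: WorkMetadata) -> List[str]:
    """Return the names of fields that are still empty (None, [] or {})."""
    empty = []
    for f in fields(meta):
        value = getattr(meta, f.name)
        if value is None or value == [] or value == {}:
            empty.append(f.name)
    return empty


# Only author and title come from the text; "ru" is an assumed language code.
meta = WorkMetadata(author="V. Kaverin", title="The Troublemaker", language="ru")
todo = missing_fields(meta)
```

A completeness check of this kind could help contributors see at a glance which metadata a work still lacks, without forcing them to supply everything at upload time.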

Explanatory notes

  1. Textuality

    1. Material evidence: available documentary forms, if any (facsimile of manuscripts, of the first edition)

    2. Principal textual variants

    3. Critical editions of the text

    4. Existing digitized versions

    5. Reasons for the choice of the given variant

    6. Design (format, handwriting, typography, etc.)

  2. Contextuality

    1. Chronology of the creation

    2. Textual genetics (composition, revision, production, censorship, etc.)

    3. Evidence (personal diaries, letters, memoirs of contemporaries)

    4. Social, political and intellectual milieu

  3. Intratextuality

    1. Linguistic analysis (lexicon, style, syntax, etc.)

    2. Literary (genre, narrative or poetological) analysis

    3. Fictional chronology

    4. Structure/composition

    5. Dramatic framework

    6. Principal personages

    7. Other personages

    8. Central problematic

    9. Motifs, topics, themata

    10. Ideological analysis

    11. Cultural analysis

    12. Tradition vs. novelty of the work

    13. Peculiarities of the work

    14. Literary (philosophical, political, scientific, etc.) appraisal of the work

  4. Intertextuality

    1. Relations with the author’s other works

    2. Influence or inspiration (positive or negative) by other authors’ works (past)

    3. Interconnections with contemporaries (present)

    4. Interdisciplinarity – intermediality

    5. Impact on other authors’ works (future)

    6. Translations

    7. Adaptations (literary abridgements, adaptations in theatre, cinema, radio, music, dance, comics, etc.)

  5. Interpretation

    1. Self-evaluation

    2. First reactions (reception by contemporaries)

    3. History of interpretation

    4. Current trends in interpretation of the work

    5. Current controversies

Adapting the nanopublications approach

As outlined above, an understanding of the intricate relations and interactions between persons, texts, places and events, and of the interferences between such relations, is crucial for understanding the phenomenon of Russian philological culture in the 1920s. To make full use of the information extracted about it, more than just free-form text is needed; in particular, it is important to document its provenance. Scholars should be able to contest a result and to propose their own hypothesis without erasing the contested one; the question may be settled later, when new information becomes available. It should also be possible to integrate the results of automatic processing, e.g., named entity extraction, as preliminary annotations that can be validated later by researchers.

Our approach to this issue is to represent annotations uniformly as self-contained entities on the Semantic Web, inspired by nanopublications, which are already being used successfully in bioinformatics. Within our platform, we adapt and extend nanopublications for applications in the humanities, in particular for the representation of intertextual relationships. Our work in this field thus concerns three main areas: (i) the extension of the nanopublications concept to the needs of humanities research. We build on the work by and , but the handling of uncertainty and vagueness, which play a much bigger role in the humanities than in the natural sciences, is an important research topic in its own right; in this particular area, ORCA (Ontology of Reasoning, Certainty and Attribution) could be a good starting point, but needs to be extended; (ii) the development of the ontologies and controlled vocabularies required for formally representing intertextual references and scholars’ findings about them; this research builds on existing ontologies from other fields, such as the FaBiO, CiTO, and SAWS ontologies mentioned above; (iii) the integration into the VicoGlossia platform of the tools necessary for collaboratively creating, browsing, searching, and generally managing annotations represented as nanopublications. Since nanopublications are based on the same Semantic Web technologies as VicoGlossia (in particular, RDF), they can be integrated seamlessly into the system.
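To make the idea more concrete, the sketch below models a nanopublication in its usual three parts (assertion, provenance, publication information) as plain tuples of triples, and keeps competing claims side by side in an append-only store. It is an illustration under stated assumptions, not the platform's implementation: a real system would use RDF named graphs, and every identifier with the `ex:` prefix (including the certainty label and the second, "literal reading" claim) is a hypothetical placeholder rather than a term from ORCA, CiTO or any other published vocabulary.

```python
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class Nanopublication:
    """A claim plus where it comes from and who published it."""
    uri: str
    assertion: Tuple[Triple, ...]
    provenance: Tuple[Triple, ...]
    pubinfo: Tuple[Triple, ...]


class NanopubStore:
    """Append-only store: contested claims are kept, never overwritten."""

    def __init__(self) -> None:
        self._items: List[Nanopublication] = []

    def add(self, pub: Nanopublication) -> None:
        if any(p.uri == pub.uri for p in self._items):
            raise ValueError(f"{pub.uri} already exists; publish a new one")
        self._items.append(pub)

    def about(self, subject: str) -> List[Nanopublication]:
        """All nanopublications whose assertion mentions the subject."""
        return [p for p in self._items
                if any(t[0] == subject for t in p.assertion)]


# Hypothetical identifier for the passage in The Troublemaker naming Meillet.
MENTION = "ex:troublemaker/meillet-mention"

allusion = Nanopublication(
    uri="ex:np1",
    assertion=((MENTION, "ex:alludesTo", "ex:Saussure"),),
    provenance=(("ex:np1#assertion", "ex:attributedTo", "ex:scholar-A"),
                ("ex:np1#assertion", "ex:certainty", "ex:probable")),
    pubinfo=(("ex:np1", "ex:createdBy", "ex:editor-1"),),
)
# An invented competing reading, included only to show coexistence.
literal = Nanopublication(
    uri="ex:np2",
    assertion=((MENTION, "ex:refersTo", "ex:Meillet"),),
    provenance=(("ex:np2#assertion", "ex:attributedTo", "ex:scholar-B"),),
    pubinfo=(("ex:np2", "ex:createdBy", "ex:editor-1"),),
)

store = NanopubStore()
store.add(allusion)
store.add(literal)
claims = store.about(MENTION)
```

Because each nanopublication is immutable and the store only appends, a scholar who disagrees with a reading adds a new entry instead of editing the old one, so both hypotheses and their provenance remain available.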

Implementation of the toolkit

The current version of the VicoGlossia platform is a web application consisting of a library of primary texts, a repository of secondary literature, and tools that allow users to navigate, read and annotate the texts comfortably. The main approach to visualizing intertextual connections in VicoGlossia is to integrate the semantic links directly into the reading flow: additional information and links appear directly in the text, allowing for natural discovery as well as directed search based on the contents. We believe this implementation is best suited both to members of the public who wish to place the books in a broader context and to scholars who need to examine large corpora of literature and generally have a specific interest in doing so.

VicoGlossia Web Interface

Nanopublications are implemented as annotations attached to entities (characters, locations, specific recurring words) or to arbitrary chunks of the text (events, phrases). Each annotation is a virtual post-it note containing the necessary information: an explanation or description, a categorization, information about its author, and the links through which the text is embedded into the intertextual network. A number of natural language processing algorithms are used to propose candidate entities (using named entity recognition and keyphrase extraction) and to align translations and editions of texts automatically for parallel examination (using statistical dictionary-based and machine translation-based algorithms).
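A minimal sketch of such span-anchored annotations is given below. It assumes character offsets as anchors and a two-step workflow in which machine-proposed entities are later validated by a researcher; the class, the field names, and the status values are illustrative assumptions. A simple name-list lookup stands in for real named entity recognition, and the sample sentence is invented for the example.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Annotation:
    """A note attached to the character span [start, end) of a text."""
    start: int
    end: int
    category: str
    note: str
    author: str
    status: str = "proposed"          # hypothetical: "proposed" or "validated"
    validated_by: Optional[str] = None


def propose_entities(text: str, names: List[str]) -> List[Annotation]:
    """Stand-in for NER: mark every literal occurrence of a known name."""
    found = []
    for name in names:
        for match in re.finditer(re.escape(name), text):
            found.append(Annotation(match.start(), match.end(),
                                    "person", name, author="auto"))
    return sorted(found, key=lambda a: a.start)


def validate(annotation: Annotation, reviewer: str) -> Annotation:
    """Return a validated copy; the machine proposal itself is untouched."""
    return replace(annotation, status="validated", validated_by=reviewer)


# Invented sample sentence, used only to exercise the functions.
text = "Shklovskij and Tynjanov wrote about cinema."
proposals = propose_entities(text, ["Tynjanov", "Shklovskij"])
spans = [text[a.start:a.end] for a in proposals]
checked = validate(proposals[0], reviewer="scholar-A")
```

Storing offsets rather than copies of the text keeps an annotation tied to its exact place in a given edition; aligning editions or translations would then amount to mapping offsets between versions.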

Users contribute by uploading the primary and secondary texts, supplying metadata and, most importantly, by creating nanopublications inside the system. VicoGlossia is a crowdsourcing platform where scholars are motivated to contribute by uploading and linking their own research. Their well-interlinked studies become immediately visible in connection to the primary texts and other publications.

Relevance and impact

Scientific relevance

With our structured text collection, scholars and students obtain a valuable and sustainable research tool devoted to the Russian philology of the 1920s, considered as part of the transdisciplinary circulation of ideas and approached through the fictional optics of its contemporaries. The collection presents the state of historical textual and interpretative research on the text corpus and maps international and interlingual debates about it. It combines quantitative and qualitative tools that make it possible not only to retrace relations, but also to study and measure their dynamics, their mutual juxtapositions and interferences. Such a collection adapts the possibilities of distant reading to the needs of scholars in intellectual history. It gives the user access to secondary literature in close connection with the analyzed primary texts. The collection establishes the genealogy of various interpretations of texts by reconstructing the history of their exegesis. The platform can easily be used in teaching. Collaborative work on it introduces students not only to important texts, but also to practices such as digital editing and quantitative analysis. In a nutshell, our work is a response to the invocation: the edition should be an environment for study and research and a receptacle for new knowledge. We need to find ways to create live connections from our scholarship into the data that support it, and our present publishing system is not equipped to do that ( , p. 15; cf. in line with P. Boot).

Broader impact

At the same time, we consider this structured text collection a proof of concept for a new, sustainable type of digital library that can be extended to further domains and periods, and we aim both at successful collaboration between specialists and at intense communication between specialists and interested readers. In an intermediate phase, the use of the platform for university teaching can be assessed. The nature of our research implies that, besides concrete results in the study of early Soviet linguistics and humanities, we are developing best practices for research beyond the language and the period concerned. The VicoGlossia platform we are about to construct can become a missing link, a bridge between academia and society: it makes scholarly results in the humanities accessible to the public and involves the reader in the assessment and production of humanistic knowledge and interpretation. One of the problems of the contemporary humanities is that academic knowledge remains hidden from laypeople. The democratization of knowledge is a real challenge, and new technologies can contribute to this process far more than they do now. The structure of paper storage reinforced the insular character of texts, hiding the multiple ties that connected them to each other and to scholarly literature. Web technologies already offer solutions for overcoming these limits. Many digital projects aim at preserving the textual heritage of humanity (e.g., through multiple storage). But texts that nobody reads or understands are merely a dead heritage. An important responsibility of academic humanists towards laypeople consists in facilitating their access to, and understanding of, the classical literary and non-fictional heritage through communication around it.
The value of such a platform, which allows people outside the educated urban population, those living far from universities or good libraries, migrants and/or people of limited means to share in accumulated academic knowledge, should not be overlooked. Today, we are witnessing huge progress in citizen science, that is, in involving ordinary users in knowledge production. There is no reason for the humanities to remain condemned to archaic elitism. With the help of platforms like VicoGlossia, the social and human sciences are turning towards society in a new and mutually beneficial way.


