Can Machines Read (Literature)?



In this essay, we reflect on distant reading as one of the various takes on reading that currently prevail in literary scholarship as well as the teaching of literature. We focus on three concepts of reading which for various reasons can be considered inter-related: close reading, surface reading and distant reading. We offer a theoretical treatment of distant reading and demonstrate why it is closely related to the concept of machine reading (part of artificial intelligence). Throughout, we focus on the role of the individual reader in all this and argue that Digital Literary Studies have much to gain from paying closer attention to the so-called natural reading process of individual humans.

In questo articolo vengono proposte delle riflessioni sul distant reading in quanto metodo al momento tra i più favoriti sia nella ricerca sia nell'insegnamento della letteratura, riflessioni che si concentrano sul rapporto che il distant reading intrattiene con altri due concetti collegati, insieme alle relative interrelazioni, ossia il close e il surface reading. Il trattamento teorico del distant reading proposto in questa sede mostra i motivi per cui è strettamente legato al machine reading (parte dell'intelligenza artificiale), prestando una particolare attenzione al ruolo del lettore e sostenendo infine come il campo di studi noto come Digital Literary Studies possa trarre vantaggio dando una maggiore attenzione ai processi "naturali" di lettura effettuati dai singoli individui.


The Oxford Handbook of Reading (part of the Oxford Library of Psychology) has no trouble whatsoever when it comes to a definition of reading. For the psychologists who contribute to this volume, reading is all about negotiating 'your' way through text ( , 7). They empirically analyze eye movement, printed word identification, the role of sound or phonology in silent reading, and the processing of syntax. In the chapter Models of Discourse Comprehension, Edward J. O’Brien and Anne E. Cook distinguish between two levels of representation of a text, the text-base, which is typically assumed to be in the form of connected propositions, and the situation model, which contains information explicitly stated in the text as well as information from the reader’s general world knowledge that helps fill in the fully intended meaning of the text ( , 217-218). The development by the reader of a coherent situation model, which is the minimum requirement for successful comprehension of text, can be studied through the workings of memory, and that’s that. There is indeed no need to think any further if your research goal is to advance reading instruction, to which the Handbook devotes two of its five parts.

The psychologists in the Oxford Handbook of Reading do not zoom in on the reading of literature, which has engendered conceptualizations all its own. As a result of the digital revolution, concepts of reading literature have in recent decades clearly gone beyond the activities of human readers as they are empirically described in the Handbook. The presence of Digital Humanities (DH) in literary studies has indeed been growing quickly. Whereas Stephen Ramsay in 2003 still lamented the inability of computing humanists to break into the mainstream of literary critical scholarship ( , 167), Adam Kirsch in 2014 published a widely shared yet controversial blog post warning that the growth industry of DH was already taking over, if not altogether sweeping clean, entire English departments. While Kirsch’s fairly dramatic views have been heavily criticized in the blogosphere, it is undeniable that Digital Humanities have recently gained much momentum across the Geisteswissenschaften in general, and in literary studies in particular.

‘Distant Reading’ has become a very popular term in the field of Digital Literary Studies – at times, the term is even equated with the field itself. Unfortunately, this key term has only rarely received in-depth theoretical treatment in DH, a domain which remains heavily practice-oriented and in which some papers even deliberately postpone theorization and self-reflection – a view countered by Bauer . As a result, an increasingly wide gap has developed between the practice of Digital Literary Studies today and more conventional literary criticism. In this essay, we reflect on distant reading as one of the various takes on reading that currently prevail in literary scholarship as well as the teaching of literature. Below, we discuss three related theoretical concepts of reading: close reading, surface reading and distant reading. The last of these, we argue, can increasingly be viewed as a form of artificial reading (i.e. reading performed by an artificially intelligent agent), which naturally raises the question of how it resembles and diverges from the reading process of individual, human readers – the reading process which one might call natural.

By focusing on the role of the reader, we point out the dangers of oversimplification, in which scholars caricature rather than characterize reading modes, for example through naïve oppositional pairs such as close versus distant reading, which upon further inspection are less obvious than one might expect. We conclude this essay by zooming in on the relative absence in present-day Digital Literary Studies of theoretical considerations involving the reader, which stands in stark contrast with the popularity of various cognitive approaches in the literary criticism of the last fifty years, from the reader response criticism of the 60s and 70s to the contemporary study of reading fiction in neurobiology .

Close reading

The notion of close reading is still central to the teaching of literature all over the globe. Close reading was developed and popularized in the 1940s and 50s by the American New Critics, who pointed to the work of I.A. Richards and William Empson as sources of inspiration. Essentially, close reading is heavily text-oriented: it is all about paying attention to the formal details of a literary text. Without such a respectful and penetrating attitude, correct interpretations are impossible, or so the practitioners of the method would like to hold. While the method has led to some general suggestions (e.g. about the fundamental ambiguity of a literary text, which often derives from the use of metaphor), close reading can in fact only be taught by example because of its relative lack of discovery procedures that might help transfer reading strategies from one text to another. This hasn’t prevented the notion from becoming dominant as a cover term for the patient exploration of a literary text, which seems to remain the most important teachable skill in courses on literature. As Rita Felski puts it in her influential book Uses of Literature, [t]he practice of close reading is tacitly viewed by many literary scholars as the mark of their tribe—as what sets them apart, in the last instance, from their like-minded colleagues in sociology or history ( , 52).

Especially in the United States, close reading is indeed still popular when there is a need for a positive description of interpretive reading. The notion indicates a fundamental interest in the literariness of a text, and as such it still provides a framework in spite of the more ideological or ethically oriented treatment of literature that is currently required in most American literature classrooms, certainly on the undergraduate level. The popularity of close reading extends beyond this relatively carefree environment. A good example is the field of medicine. After the narrative turn , structuralist narratology sought to branch out by transcending its focus on literary narrative. Conversely, some disciplines have turned to narratology for a set of tools to analyze narratives as they function in their field. In order to describe its method for engaging with the stories of patients, narrative medicine has looked even beyond this toolkit and realized that what it is really interested in is a form of close reading. In The Principles and Practice of Narrative Medicine, Rita Charon describes close reading as the signature method of her discipline ( , 157). Its core is attentive and accurate listening in a clinical practice (ibid.). Although the effects of the methodology apparently remain hard to explain (Many mysterious processes occur through close reading ( , 170)), and the set of elements to devote attention to in a narrative (time, space, voice and metaphor) remains quite conventional, the book by Charon and her colleagues harbors a deep belief that close reading perfectly underwrites the principles of narrative medicine: (1) action toward social justice; (2) disciplinary rigor; (3) inclusivity; (4) tolerance of ambiguity; (5) participatory and nonhierarchical methods (172). As such, the book provides a perfect illustration of the ethical drive that comes along with the current uses of close reading, first and foremost in the literature classroom.
What the teacher eventually teaches through careful analysis is perhaps not so much an eye for literary detail, but rather the capacity to find value while respecting both the text and the other members of the community in which that text is being read.

Surface reading

‘Surface reading’ has become the dominant container term for methods of reading literature that no longer look for the hidden meaning of a text. Close reading, as we have just seen, may have turned into an ethical undertaking, but it is definitely still geared to finding a meaning (typically through a focus on form) that is not immediately apparent. Similarly, methods deriving from Marx and post-Marxism are primarily interested in revealing the ideological aspects of any literary text. While close reading springs from an interest in literariness as defined through form, and (post-)Marxist readings derive from a specific world view, they both constitute instances of symptomatic reading ( , 3-9) because they are on the look-out for symptoms, clues or keys to what is ‘really’ in the text – a particular type of metaphor or a coded sign of the master/slave relationship. (Post-)Marxist, (post-)feminist or queer critics may have (legitimately) criticized the elitism of close reading in its original guise, but all these ways of reading are digging for meaning below the surface of the text. Coined by Sharon Marcus in her book Between Women: Friendship, Desire, and Marriage in Victorian England , and further developed by Marcus and Stephen M. Best in their introduction to a special issue of the journal Representations (2009), surface reading is less ambitious and concentrates on what is evident, perceptible, apprehensible in texts . As Jeffrey J. Williams puts it, the critic is no longer like a detective who doesn’t trust the suspect but more the social scientist who describes the manifest statements of a text ( , 7).

Best and Marcus present the Marxist critic Fredric Jameson as the champion of symptomatic reading in that he attacks weak, descriptive, empirical, ideologically complicit readers for not rewrit[ing] narrative in terms of master codes, disclosing its status as ideology, as an imaginary resolution of real contradictions (Jameson 1981, 13; as presented in , 5). They object that the surface of a text can be studied in its own right, without necessarily leading to complicity – perhaps the worst mistake from the point of view of a Marxist intellectual: “A surface is what insists on being looked at rather than what we must train ourselves to see through” ( , 9). ‘Surface’ has many meanings. It can refer to the materiality of the book or the reading process, and as such it can be likened to the New or Material Philology movement, with its focus on tangible traces of reader response . It can also relate to the intricate verbal structure of literary language , which implies a modest form of New Criticism but strikingly forgets that the New Critics haven’t always been so modest in their search for meaning. In a slightly more mystifying way, ‘surface’ can also lead to a refusal of meaning in favor of Susan Sontag’s erotics of reading (1966), which can take the form of attending to the text, or to one’s affective responses to it ( , 10).

Best and Marcus add three more ways of doing surface reading ( , 11). Scholars can pay attention to surface as a practice of critical description instead of applying a theory to it. Depth, in other words, is continuous with surface and is thus an effect of immanence. Scholars can also turn to ‘surface’ as the location of patterns that exist within and across texts. The prime example of this practice is supposed to be narratology, which allegedly looks for patterns without a desire for interpretation. As we have already indicated in our previous section, this basic ambition of narratology has led to a toolkit that undergirds interpretations in literary classrooms all over the world. With narratology, therefore, it seems difficult to say where surface stops and depth begins. Finally, Best and Marcus point to ‘surface’ as literal meaning ( , 12). Presence doesn’t have to mean absence (as it does in ‘symptomatic reading’), nor does affirmation have to mean negation. In Between Women: Friendship, Desire, and Marriage in Victorian England, for instance, Marcus shows that in Victorian fiction, female friendship is not cancelled by courtship plots (as ‘symptomatic reading’ might like to suggest) but remains central to what the text conveys.

All of these types of ‘surface reading’ clearly share an aversion to ‘symptomatic’ reading. The difference between the two may sometimes not be so clear, but more importantly, ‘surface reading’ paradoxically takes the moral high road for which it reproaches its ‘symptomatic’ opponent. When Best and Marcus realize that surface reading, which strives to describe texts accurately, might easily be dismissed as politically quietist, too willing to accept things as they are ( , 16), they fall back on an ethical stance similar to the one we have already encountered in the contemporary versions of close reading: We want to reclaim from this tradition the accent on immersion in texts […], for we understand that attentiveness to the artwork as itself a kind of freedom ( , 16). Through paying close attention to the surface, readers can transcend the imperatives and limitations of their situation and reach a state of mind that is equally valuable, if less glamorous than that which results from the work of demystification in ‘symptomatic reading’ ( , 17). In other words, ‘surface reading’ is close reading without the burning desire to interpret.

Distant reading

Distant reading is a term proposed by Franco Moretti in an influential series of essays, first published in the New Left Review (2000) and later reprinted (and concisely commented on) in the collection Distant Reading . The introduction of the term must be understood against the backdrop of Moretti’s interests as a (Marxist) scholar of comparative literature, and more specifically the ambitious notion of World Literature, which has been gaining much traction in recent years (e.g. ). In Conjectures on World Literature, Moretti offers a typically comparatist plea for a more inclusive study of literature, namely one which would go beyond the obligatory canon of well-known (English-language) authors and which would be extended to what Margaret Cohen has called the great unread ( , 23). However, Moretti claims that reaching or even approximating this objective by simply reading more is unfeasible. He proposes instead a ‘second-hand’ approach to reading, in which scholars must dare to rely more extensively on ‘a patchwork of other people’s research, without a single direct textual reading’ ( , 57; Moretti’s italics). Only through such a cascaded – and perhaps in some ways ‘parasitic’? – reading practice can world literature be studied on the scale it deserves, Moretti claims.

However, such an increase in scope has implications, Moretti stresses: the more we read, the more shallow our reading must become: ‘the ambition is now directly proportional to the distance from the text: the more ambitious the project, the greater must the distance be’ (ibid.). As such, distance becomes a function of scope and increases with it. Moretti goes on to oppose this project to the practice of close reading: the trouble with close reading (in all of its incarnations, from the new criticism to deconstruction) is that it necessarily depends on an extremely small canon ( , 57). Moretti sees close reading as a theological exercise – very solemn treatment of very few texts taken very seriously (ibid.) and, therefore, does not consider it fit for a more inclusive, let alone exhaustive, study of world literature. Following the trend described in our section on close reading, Moretti does not restrict his reproach to the sort of ‘close reading’ historically practiced by the New Critics; rather, he seems to equate the term with all forms of careful and sustained reading at large (cf. ‘a direct textual reading’). One should acknowledge that, probably for rhetorical effect, Moretti here sketches a somewhat unrealistic caricature of traditional literary criticism.

Thus, in the seminal essay ‘Conjectures on World Literature’, distant reading is perhaps primarily defined negatively, implying the absence of ‘a single direct textual reading’ in the analysis of literary works. This point of view is mirrored in the term ‘not-reading’, which has been used by a theorist such as Matthew Kirschenbaum . He borrowed the term from Martin Mueller, who stressed that the endeavour of ‘not-reading’, viz. ‘distant reading’, is in itself hardly novel: there are age-old techniques for doing this, some more respectable than others, and they include skimming or eyeballing the text, reading a bibliography or following what somebody else says or writes about it. Knowing how to “not-read” is just as important as knowing how to read (Mueller, qtd. in ).

The fact that Moretti did not strictly define his concept of Distant Reading, apart from the absence of direct readings, has allowed subsequent scholars to come up with their own interpretations. The concept has quickly surged to popularity in DH. Existing approaches in DH, such as computational stylistics, have been quick to appropriate the term to refer to their own work, and so it seems on its way to gradually replacing alternatives such as algorithmic criticism . It would not be an exaggeration to say that many studies in DH have used the term in a fairly sloganesque, if not shallow, fashion, with little theoretical grounding or reflection on the implications of the term. That is not necessarily a bad thing, because many digital humanists report, anecdotally, that they experience the practice-oriented environment of DH as ‘liberating’, freeing them from the burden or even intimidation of theory ( , 15). In DH, Distant Reading seems to have rapidly become an umbrella term for all forms of computational text analysis, including those applied to corpora such as newspaper archives, which are not normally considered to be particularly literary in nature. Through this broadening of source materials, DH clearly echoes the New Historicism (e.g. ).

Nevertheless, one should stress that the practice of distant reading does not presuppose the use of computers at all – in line with theoreticians such as McCarty, who pointed to other characteristic attributes of DH. This is in fact true for many applications in DH: one could in principle carry them out by hand, although that would in many cases be tedious. Indeed, it is surprising to observe that computers or digital methods are not explicitly mentioned even once in Moretti’s earliest essays. For Moretti, distant reading initially meant reading on the basis of other readings, and not necessarily, let alone exclusively, reading via computer applications. For him, the scope of the reading endeavor mattered more than the method – which also helps explain why he later wrote that he briefly considered using the term serial reading ( , 44). It should therefore be stressed that the strictly computational interpretation of distant reading has only been created post hoc.

Artificial Reading

In the notion of distant reading, Moretti essentially theorized a two-tier reading process, in which we would base our secondary, parasitic reading of texts on a primary, actual reading of those texts, which we do not perform ourselves. The primary reading stage has so far been interpreted in two ways: the ‘secondary’ reading is based on (a) the reported insights of other people, or (b) the results derived from a computational text analysis. In DH, a quick perusal of the abstracts of the annual global DH conference already shows that (b) outranks (a) in popularity. In both cases, however, distant reading is a two-tier process, where the secondary reading depends on a primary model (simulation, approximation etc.; cf. ) of a first-hand reading process, be it computational or not.

The two interpretations of the primary reading stage differ mainly as to whether a human agent is still included in the first tier. Another way to capture this distinction is to differentiate between the sorts of agent to which the first-hand reading is ‘outsourced’ in distant reading: we can distinguish between (a) a natural, human reader, and (b) an artificial, computer reader. (Here we speak of natural reading, in the same way that this adjective is used in the phrase ‘natural language processing’, to avoid ambiguity with, among others, computer languages. Likewise, we use the term ‘artificial’ as it is used in the familiar collocation ‘Artificial Intelligence’.) Definition (b) is of course intriguing from a cognitive perspective, because here a computational agent is used to replace the human agents involved in the primary reading (a). The implicit assumption is thus that some form of algorithmic procedure is able to offer a usable model of the human reading experience, i.e. artificial reading as a form of artificial intelligence.

In the first chapter of their classic textbook on Artificial Intelligence, Russell and Norvig ( , 4-5) propose a fourfold typology of the different definitions of the goals of AI (partially reproduced in the table below). These definitions vary along two axes:




            Human                              Rational
Thinking    Systems that think like humans.    Systems that think rationally.
Acting      Systems that act like humans.      Systems that act rationally.

Definitions of Artificial Intelligence along two axes (reproduced from Russell and Norvig 1995, 4-5).

Where does distant reading fit into this typology? The opposition between thinking and acting in this table seems hard to apply to human reading. Especially since the rise of reader response theory, literary scholars seem to have reached a consensus that reading necessarily involves an active role of the reader in the form of interpretation. Interpretation clearly involves thinking, but whether interpretation also involves a minimal form of deliberate ‘acting’ on the reader’s side is a much fuzzier issue. The opposition between the Human and Rational columns, however, is more interesting. The authors say: “[T]he definitions on the left measure success in terms of human performance, whereas the ones on the right measure against an ideal concept of intelligence, which we will call rationality. A system is rational if it does the right thing” ( , 5).

Let us for a brief moment consider artificial reading as an essential component of general artificial intelligence – theoretically, it makes perfect sense that the ultimate AI implies the machine’s ability to read, interpret and discuss literary texts (cf. ). We can then raise the naive question which modelling task is to be preferred in the light of the previous definitions. Do we want computers to read like humans, and should we model their behaviour after that of existing, individual readers? Or do we evaluate a machine’s ability against a concept of an ideal intelligence, i.e. the rational reading capacities of an omniscient, neutral, and probably a-historical reader, who always reads in the ‘right’ way? The quest to develop such a God Reader seems shockingly naive from the point of view of literary theory: many scholars will nowadays hold that there is no other reading beyond individual reading, and the profound individuality of the reading process renders it questionable whether we would even be able to recognize a perfect rationality in reading if we were to attain it by accident.

This raises the question of where we should in fact situate distant reading in literary theory. The use of the word ‘reading’ – even if only metaphorically – in the phrase would suggest that distant reading is more a matter of hermeneutics than poetics – to use Culler’s terminology , more a matter of meaning than form. In the everyday practice of Distant Reading, however, we know of very few examples where (individual) hermeneutics play a role of significance, a point also raised by Martindale .


Let us recapitulate. Above we have briefly sketched the rise of Distant Reading as an important and novel conception of (literary) reading in recent years. Although originally not intended as such, its current usage entails a form of Artificial Reading or Machine Reading, since readers outsource a significant part of the traditional reading process to a machine. Human interpretation will still be a required part of the process in most applications, but it happens at the level of the output of a computational reading process, i.e. the simplified model of a text corpus that it has yielded.

Current computer technology also constrains the complexity of the analyses: techniques often boil down to an advanced form of word counting that might beat a human in processing scope but rarely in hermeneutic quality – the notorious Google Books paper is a textbook example in this respect. Even as more advanced techniques from distributional semantics – and their admittedly impressive results – are now entering the stage (e.g. ), one should not overestimate the hermeneutic depth that machines can currently reach. In a situation characterized by a fundamental shallowness, distant reading and surface reading appear as children of the same Zeitgeist, although potentially for different reasons: whereas surface readers deliberately choose to stick to a text’s surface, distant readers are currently still practically hindered by the lack of suitable technology to produce deeper readings of texts, even if they wanted to dig beneath a text’s symptomatic surface. At the same time, the value of the textual surface is clearly recognized by digital humanists. Modern stylometry, for instance, often praises counting simple function words as a preferred gateway into a text. John Burrows, one of the pioneers of stylometry, famously said, with a reference to Jane Austen ( , 1): It is a truth not generally acknowledged that, in most discussions of works of English fiction, we proceed as if a third, two-fifths, a half of our material were not really there.
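To make the idea of function-word counting concrete, the following is a minimal sketch in Python. It is not the procedure of Burrows or of any particular stylometric study: the ten-word list `FUNCTION_WORDS`, the simple tokenizer and the helper names `function_word_profile` and `function_word_share` are all assumptions made purely for illustration. Actual studies typically work with the most frequent words of the corpus under investigation.

```python
import re
from collections import Counter

# Hypothetical, hand-picked list for illustration only; real studies usually
# derive the list from the most frequent words of the corpus under study.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it", "is", "was")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_profile(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each listed function word."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {word: (counts[word] / total if total else 0.0) for word in vocabulary}


def function_word_share(text, vocabulary=FUNCTION_WORDS):
    """Return the fraction of all tokens that are listed function words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    listed = set(vocabulary)
    return sum(1 for token in tokens if token in listed) / len(tokens)


if __name__ == "__main__":
    opening = ("It is a truth universally acknowledged, that a single man in "
               "possession of a good fortune, must be in want of a wife.")
    print(function_word_profile(opening))
    print(round(function_word_share(opening), 3))
```

With this illustrative list, eleven of the twenty-three tokens in the opening sentence of Pride and Prejudice count as function words, which gives some sense of the share of textual material that Burrows has in mind. Note that the resulting profile describes the surface of the text only; nothing in it depends on who is reading.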

How does the value of ‘surface reading’ relate to the recent turn to […] machine intelligence across a range of fields and practices, from book history to distant reading ( , 17)? Best and Marcus welcome this development, as long as human readers get the upper hand: We are not envisioning a world in which computers replace literary critics but are curious about one in which we work with them to expand what we do ( , 17). In other words, computers can be made to pay attention to the surface, and the humanist endeavor of criticism essentially remains in place. In contrast to the sciences, which focus on processes beyond our creation and control, the study of culture is defined by our interest in human artifacts, and a practice like distant reading nicely fits that undertaking. For many such applications in Digital Literary Studies, as in surface reading, the text’s surface clearly suffices and scholars feel no need to dig deeper.

Nevertheless, if we focus on more ambitious forms of machine reading, that is, machine reading as artificial intelligence, the question becomes what its ultimate goal should be: is it feasible – or even desirable – to develop a perfectly rational God Reader, or do we ultimately aim to simulate the actual reading process of a (specific, individual) human being? Generally speaking, distant reading is not tied to a specific computational methodology, and various text processing techniques have been accepted as valid operationalizations of it, ranging from the simple word counting in culturomics, through dimension reduction techniques in stylometry, to the distributional methods developed by computational semantics. If we abstract from their individual differences, these techniques understandably remain heavily text-oriented: in the end, they primarily yield models of texts (and to some extent: their producers) rather than models of readers, in the sense that texts are analyzed without taking into account that texts can invite different responses from different (communities of) readers.
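The text-oriented character of such techniques can be illustrated with a toy version of a distributional method. The sketch below is an assumption-laden simplification, not the method of any study cited here: the function names `cooccurrence_vectors` and `cosine`, the window size and the sample sentence are all hypothetical. It represents each word by the counts of the words that occur near it, and compares two words by the cosine of the angle between their count vectors.

```python
import math
import re
from collections import Counter, defaultdict


def cooccurrence_vectors(text, window=2):
    """Map each word to a Counter of the words found within `window` tokens of it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(value * value for value in u.values()))
    norm_v = math.sqrt(sum(value * value for value in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    sample = "the cat drinks milk the dog drinks water"
    vectors = cooccurrence_vectors(sample, window=1)
    # 'cat' and 'dog' occur in identical contexts in this toy sample.
    print(cosine(vectors["cat"], vectors["dog"]))
    print(cosine(vectors["cat"], vectors["water"]))
```

The only inputs are the text and a window size. The same text therefore always yields the same vectors, whoever the reader is supposed to be, which is precisely the sense in which such a procedure models a text rather than a reader.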

Distant reading as currently practiced is to a considerable extent an unsituated, uncontextualized form of reading, in that most of its instantiations stay far away from even attempting to mimic actual human reading, which is by definition situated and context-driven. Distant reading is single-model reading – God Reading – whereas a more profound reader-awareness would depend on a plurality of heterogeneous models. This relative lack of reader awareness feels somewhat uncomfortable in the wake of the twentieth century, in which one of the primary insights yielded by literary theory was the general relativity of reading: reading is always done by a specific individual against a specific historical backdrop.

Interestingly, close reading as practiced by the school of New Critics, the paradigm against which Moretti so heavily revolted, might have suffered from similar weaknesses: it too was heavily text-oriented and did not stimulate the production of new readings; it was literally taught by example, in the sense that authoritative readings were meant to be reproduced rather than produced. It too was an uncontextualized approach to literature that stressed the long life of literature. In the words of Bertens, who characterizes Matthew Arnold as a forerunner of the New Critics (and an excellent representative of Liberal Humanism at large):

The classics and the ideal of culture that they embody are timeless for Arnold. This is a vitally important point: ‘the best that has been thought and said in the world’, whether to be found in the classics or in later writers, is the best for every age and every place ( , 7).

Many New Critics too, it appears, strove for God Reading rather than actual reading. For Eliot, poetry was profoundly impersonal and sterile, stripped of all autobiographism and personal emotion ( , 13-14). For all its revolt against close reading – and notwithstanding a list of other differences – it is clear that distant reading, at least in this respect, closely mimics the New Critical endeavour. At the same time, this striking lack of reader awareness presents interesting opportunities for future research in Distant Reading.


This paper is an English translation of a Dutch-language article that will appear in a themed issue of the Tijdschrift voor Nederlandse Taal- en Letterkunde [2019]. The authors would like to thank Federico Meschini (Università degli Studi della Tuscia) and Greta Franzini (University College London) for their help in translating the abstract to Italian.


