The Shakespeare Algorithm

Here is an interesting article from The New Yorker by Alastair Gee.
The original article is here.

In 1727, a writer and editor named Lewis Theobald was preparing to unveil “Double Falsehood,” a tragicomedy that he said was based on manuscripts of a lost play by Shakespeare. “The good old Master of the English Drama is by a kind of Miracle recall’d from his Grave, and given to us once again,” the London Journal reported, when news of Theobald’s project emerged. Ever since then, however, the work has presented difficulties to the gatekeepers of the canon. For one, the manuscripts have vanished. For another, Theobald has a checkered reputation; he was accused of plagiarizing his play “The Perfidious Brother,” and his starring role in Alexander Pope’s satirical poem “The Dunciad” doesn’t help matters. Then there is the text itself, which isn’t especially good. Certainly “Double Falsehood” contains echoes of Shakespeare (“A gleam of day breaks sudden from her window”), but for the most part the language sags or is ungainly. Would the Bard have called a woman so fair that her face could make “a frozen hermit leap from his cell” to kiss it? (Well, perhaps not, but he did write that “A withered hermit, five-score winters torn, / Might shake off fifty, looking in her eye.”)

In 2010, Theobald was partially unburdened of his ignominy: “Double Falsehood” was released as part of the respected Arden Shakespeare series. “It’s not ‘King Lear’—we have to agree to that,” Brean Hammond, the edition’s editor, told me. “And we have to agree that it’s devoid of a lot of the kinds of metaphorical density—the thickly woven, metaphorical, imagistic passages—for which we now value Shakespeare.” Still, Hammond and a number of other scholars believe that the play, unlike the dozens of others that have been provisionally attributed to Shakespeare, in whole or in part, over the centuries, has the bones of some earlier work in which the playwright was involved. Robert Folkenflik, an emeritus professor of English at the University of California, Irvine, said that he once heard “Double Falsehood” compared to “an old cobblestoned road that’s been asphalted over, and yet you’ve got these cobblestones sticking out.”

Various pieces of evidence have been cited—and tussled over—as proof of the play’s provenance. An entry in a publishing registry from nearly four decades after Shakespeare’s death, for instance, seems to indicate that he and John Fletcher, his sometime collaborator, wrote a precursor to “Double Falsehood.” (The registry also lists Shakespeare as the author of some decidedly questionable works. “The Merry Devil of Edmonton,” anyone?) Then, in April, new evidence emerged from an unlikely corner: the journal Psychological Science. At Folkenflik’s suggestion, a pair of researchers at the University of Texas at Austin—James Pennebaker and one of his graduate students, Ryan Boyd—had performed a linguistic analysis of “Double Falsehood.” In his previous studies, Pennebaker had found a correlation between how students use articles and prepositions in their college-application essays and what grades they go on to get, and between self-referential writing and suicidality in poets. “I felt as though it was important to look at this as a cold scientist: Here are the numbers, I have no dog in this hunt, it doesn’t matter which way it comes out,” Pennebaker said.

The study focussed in part on function words, the heavy-lifting but unglamorous class that includes pronouns, articles, and prepositions—“I,” “you,” “the,” “a,” “an,” “on,” “in,” “under.” As Pennebaker has written, there are only about four hundred and fifty of them in English, but they account for fifty-five per cent of the words that we use, the linguistic glue that holds everything together but goes mostly unnoticed. “We can’t hear them,” Pennebaker told me recently. “You and I have now been talking for ten minutes, and you have no idea if I’ve used articles at a high rate or a low rate. I have no idea.” Everyone has a pattern, though, and this is what he and Boyd sought in an array of works by Shakespeare, Theobald, and Fletcher. They also took other habits into account, such as three-word phrases typical to each author; for Shakespeare, these included “my lord your,” “what says thou,” and “as it were.” (“Quality work there, Shakey,” Boyd said.)
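For readers curious what counting function words and three-word phrases might look like in practice, here is a minimal Python sketch. It is not Boyd and Pennebaker's actual method, which the article does not describe in detail. The eight-word list is only the handful of examples quoted above (the full class is said to number about four hundred and fifty), and the function names and the simple distance measure are assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative subset only: the examples quoted in the article,
# not the full list of roughly 450 English function words.
FUNCTION_WORDS = ("i", "you", "the", "a", "an", "on", "in", "under")


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_profile(text):
    """Return each function word's share of all tokens in the text."""
    tokens = tokenize(text)
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in FUNCTION_WORDS}


def top_trigrams(text, n=3):
    """Return the n most frequent three-word phrases in the text."""
    tokens = tokenize(text)
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return [" ".join(t) for t, _ in trigrams.most_common(n)]


def profile_distance(p, q):
    """Hypothetical similarity measure: sum of absolute rate differences."""
    return sum(abs(p[w] - q[w]) for w in FUNCTION_WORDS)
```

In a sketch like this, one would build a profile for each candidate author from works of known authorship, then see which profile lies closest to that of the disputed text. A real study would need far more text, a complete word list, and proper statistical controls.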

Generally speaking, the results of the “Double Falsehood” analysis indicate that the voices of Shakespeare and Fletcher predominate, and that Theobald’s is minimally present. It might be objected that, if Theobald had set out to imitate Shakespeare, he would surely have aped his language. But function-word usage is very hard to mimic, Boyd and Pennebaker told me. As with other linguistic tics, a writer’s own propensities will more than likely bleed through. “The Cuckoo’s Calling,” a detective novel, could only have been written by the fantasy author J. K. Rowling; Federalist No. 49, although published under a pseudonym, could only have been written by James Madison. Indeed, as Maria Konnikova reported in March, function-word patterns and other metrics may be able to establish not only an author’s voice but also her disposition and mood. Pennebaker has already produced rough tools that scan people’s tweets for signs of depression and anxiety. The “Double Falsehood” study purported to shed similar light on the Bard’s psychology, noting, for example, that his “relatively dynamic writing style and relatively high use of social content words” suggested someone who was “socially focused and interested in climbing higher on the social ladder.”

To data-mine Shakespeare is, as the Earl of Worcester might have said, “to o’er-walk a current roaring loud / On the unsteadfast footing of a spear.” Boyd and Pennebaker’s study, in other words, has not been universally lauded. Although the experts with whom I spoke were generally excited by the linguistic evidence of the play’s authorship, several were dismissive of the attempts to draw up a psychological profile of the playwright. Ron Rosenbaum, the author of “The Shakespeare Wars,” also questioned the study’s over-all mission. “It’s so savagely reductive to attempt to reduce literature to some algorithm,” he told me. “The way to understand Shakespeare is to continually reread him.” As proof of his argument, Rosenbaum gave the example of “A Funeral Elegy,” a poem that was initially attributed to Shakespeare, with the help of a database called Shaxicon, and then, on the advice of an altogether less algorithmic human reader, reattributed to John Ford, the author of the play “’Tis Pity She’s a Whore.” But Gary Taylor, an editor of the complete Oxford Shakespeare, sees something more than academic principle at play. “Many great writers and literary critics chose to concentrate on English because they hated math,” he told me.