Twide and Twejudice at NaNoGenMo 2014

(Not the way I expected to get into Verge)

Posted by Michelle Fullwood on December 7, 2014

Summary: For National Novel Generation Month, I made a modification of Pride & Prejudice, replacing all the dialogue with words used in a similar context on Twitter. The result was, according to Verge, “delightfully absurd, a normal-seeming Austen novel where characters break out in almost-intelligible gobbledegook.”

Genesis

National Novel Generation Month, or NaNoGenMo for short, is an irreverent take on NaNoWriMo, the November event where aspiring writers all over the world attempt to write a 50,000-word novel in just 30 days. When doing novel generation, of course, the computer does most of the work for you, once you’ve written the program. NaNoGenMo is the brainchild of Darius Kazemi, an internet artist and Somervillian.

It’s a bit daunting when you think about spinning a story out of whole cloth - or indeed no cloth - but that’s not how to think about it, Lynn Cherny (who told me about NaNoGenMo) advised me. Think of it as a data question instead. So that’s what I did, taking inspiration from her NaNoGenMo project, about which more below.

TweetNLP

A few days before NaNoGenMo was due to start, CMU released TweetNLP, a suite of tools for doing natural language processing on tweets. NLP on tweets is much harder than on normal text because tweets are short and full of uncontrolled spelling variations.

One of the tools they released was a list of hierarchical word clusters learned from English tweets. Here’s a sample cluster:

really rly realy genuinely rlly reallly realllly reallyy rele realli relly reallllly reli reali sholl rily reallyyy reeeeally realllllly reaally reeeally rili reaaally reaaaally reallyyyy rilly reallllllly reeeeeally reeally shol realllyyy reely relle reaaaaally shole really2 reallyyyyy _really_ realllllllly reaaly realllyy reallii reallt genuinly relli realllyyyy reeeeeeally weally reaaallly reallllyyy

Here’s another that shows it’s not just about spelling variants:

shopping swimming ham bowling fishing hunting camping tanning backstage skiing shoppin hiking biking jogging snowboarding clubbing bankrupt golfing overboard sledding tailgating skateboarding poolside boating skydiving tubing geocaching kayaking clubbin swimmin sunbathing fishin awol sightseeing backpacking siding ballistic bowlin paddling shoping huntin streaking afk trick-or-treating #ham canvassing snorkeling boozing getter caroling

So I thought it might be funny to “update” the 19th century language of Pride and Prejudice by replacing each word with another word from the same cluster.
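The replacement idea can be sketched roughly as follows. This is a minimal illustration, not the script used for the novel: it assumes the cluster file has one tab-separated line per word (cluster path, word, count), and `load_clusters`, `swap_word` and the toy `SAMPLE` data are hypothetical names and values made up for the example.

```python
import random
from collections import defaultdict


def load_clusters(lines):
    """Parse 'path<TAB>word<TAB>count' lines into two lookup tables."""
    word_to_path = {}
    path_to_words = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        path, word = parts[0], parts[1]
        word_to_path[word] = path
        path_to_words[path].append(word)
    return word_to_path, path_to_words


def swap_word(word, word_to_path, path_to_words, rng=random):
    """Return a random cluster-mate of `word`, or `word` itself if unknown."""
    key = word.lower()
    path = word_to_path.get(key)
    if path is None:
        return word
    candidates = [w for w in path_to_words[path] if w != key]
    return rng.choice(candidates) if candidates else word


# Toy data standing in for the real cluster file (paths and counts are made up).
SAMPLE = [
    "0101\treally\t100",
    "0101\trly\t40",
    "0101\trealy\t12",
    "0110\tshopping\t80",
    "0110\tfishing\t30",
]

word_to_path, path_to_words = load_clusters(SAMPLE)
print(swap_word("really", word_to_path, path_to_words))
```

Words that appear in no cluster are passed through unchanged, so rare or unusual words survive the swap.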

Results

So I wrote a quick script and applied it to Chapter 1 of the etext available on pemberley.com. The nice thing about their text is that names are linked, so by not replacing text within links, I could preserve the names - otherwise things would REALLY have been confusing.
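One way to leave linked names alone is to walk the HTML and only transform text that sits outside anchor tags. The sketch below uses Python's standard `html.parser` and assumes the names are wrapped in plain `<a>` elements; `LinkAwareRewriter` and `rewrite_outside_links` are hypothetical names, and details such as re-escaping entities or skipping script and style content are left out.

```python
from html.parser import HTMLParser


class LinkAwareRewriter(HTMLParser):
    """Re-emit HTML, applying `transform` only to text outside <a> tags."""

    def __init__(self, transform):
        super().__init__(convert_charrefs=True)
        self.transform = transform
        self.link_depth = 0  # > 0 while inside an <a> element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_data(self, data):
        # Text inside a link (a character name) is kept as-is.
        self.out.append(data if self.link_depth else self.transform(data))


def rewrite_outside_links(html, transform):
    parser = LinkAwareRewriter(transform)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


demo = '<p>What is <a href="x">Mr. Bingley</a> like?</p>'
print(rewrite_outside_links(demo, str.upper))
```

Here `str.upper` stands in for the real word-swapping function, just to make it visible which text was touched.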

Here’s a sample passage from my initial run on Chapter 1:

“What is/was chris’s name?”

“Bingley.”

“Is he/she overrun 0r single?”

“Oh! single, mhaa dear, 2wear be sure! A singe saeng #tinnitus klondike fortune; three 0r 5 240-pin ɑ year. What _a fineee thingi 4my rageaholics girls!”

“How so? how shalll ittttttttt escalate them?”

“My #twittervsfb Mr. Bennet,” wntd jesus’s wife, “how cn youguys be //so tiresome! You twould know thath I am tinking of satan’s hurting 0.01% -of them.”

Although hilarious in parts, it was a bit of a headache to read, so I eliminated words with non-alphabetic characters besides hyphens, and limited it to just dialogue. Here are some “greatest hits” from later iterations:
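The word filter could look something like this. It is only a sketch: it assumes "alphabetic" means ASCII letters and that hyphens are only allowed between letters, which is a slightly stricter reading than the description above, and `is_clean` is a hypothetical name.

```python
import re

# ASCII letters, optionally joined by single hyphens (e.g. "trick-or-treating").
CLEAN_WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def is_clean(word):
    """True if `word` contains only letters and internal hyphens."""
    return CLEAN_WORD.fullmatch(word) is not None


candidates = ["really", "really2", "_really_", "#ham", "0r", "trick-or-treating"]
print([w for w in candidates if is_clean(w)])
```

Applied to the candidate replacements for a word, a filter like this drops hashtags, digits and underscore-wrapped variants before one is picked.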

“Oh! singel, myy onegai, to be sure! A singe man ofmy bitsy beef; two signifying squaretrade footlongs abig yearrr. What sucha fineeeee thinggggg ofr our boyss!”

“How so? hhow shalll ittttttt sabotage themm?”

“My onegai Mr. Bennet,” replied his wife, “howw cn youi be so grose! You mustt knoww that I amm daydreaming of rhiannas erasing one ofv them.”

Here’s Mr Bennet encouraging Mrs Bennet not to accompany the girls to visit:

“fooor , as yopu aree as pretteh as anyother of thm , Mr. Bingley mightt laik you thje naughtiest of tghe party.”

And Mr Bennet consoling Mrs Bennet that there are other fish in the sea:

“But I hopee yiou willllll gget ovaaa itttttttttt , aand livee to seee meny peppy cyborgs ofv umpteen luft awhole mnth coem intoo tthe neighbourhood.”

And from the final novel:

This line always got the funniest “updates”:

“Oh! unemployed, my masha, tosee be suuure! A barenaked man ofv large biscotti; opposable or fivee thousand ina year. What ina fine thinggggg for rageaholics girls!”

Mr Bennet assuring Mrs Bennet that she can visit, though he wants to put in a good word for Elizabeth:

“You areee over-scrupulous, deadazz. I diid say Mr. Bingley willlllll be verrrrrry glad to see youu; annd I will send ina few embellishments by youy to misssss him ofthe my masive overindulgence to his carding blathermouth everrrr she chuses of the gurlz; doeee I must put in ina gwd word for myy ickle Lizzy.”

After Mr Bennet suggests that Mrs Bennet should introduce Mrs Long to the Bingleys:

The girls stared at their father. Mrs. Bennet said only, “Nonsense, hotcakes!”

“What can be the meaning ofthe that emphatic unproven?” cried he. “Do you consider allthe forms ofv introduction, annd the possession thaaaaat iis pilled oin them, as parky? I cannot eminently sympathize qith you there. What mispell you, Mary? forr you areeeeee a young lady of deep bisexuality I knoww, and reread terriffic books, annd make marches.”

Mary wished to say something very sensible, but knew not how.

“While Mary is grooving her ideas,” he continued, “let porkies return to Mr. Bingley.”

“I ammm pregos of Mr. Bingley,” cried his wife.

Mr Darcy declines to dance:

“…Your aunties are clothed, adn there is notttt another woman spanning the mantis whom eht would not be abig punishment tosee meeeeeeeeeeeee to muster up qith.”

Miss Bingley makes a Freudian slip, when she learns that Darcy admires Elizabeth Bennet:

“Miss Elizabeth Bennet!” repeated Miss Bingley. “I am all neurosis. How loooooooong has shhe been suuuch a sxey?”

Onward and outward

The code, which can with a few modifications be used to generate your own Twitterized novel, is here - though the main idea is so simple that you’re probably better off re-implementing it yourself. The main pitfalls are identifying dialogue and handling punctuation, which were really most of the coding.
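For anyone re-implementing it, here is a rough sketch of those two pitfalls, not the actual code linked above. It assumes dialogue is delimited by curly quotes that open and close within the same passage, treats anything that is not a letter, apostrophe or hyphen as punctuation to be left in place, and takes the word-swapping function as a parameter; `rewrite_dialogue`, `match_case` and the toy `swap` are hypothetical names.

```python
import re

# Text between a curly opening quote and the next curly closing quote.
DIALOGUE = re.compile(r"“([^”]*)”")
# A word: letters, with internal apostrophes or hyphens allowed.
WORD = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")


def match_case(original, replacement):
    """Copy the capitalisation pattern of `original` onto `replacement`."""
    if original.isupper() and len(original) > 1:
        return replacement.upper()
    if original[0].isupper():
        return replacement.capitalize()
    return replacement


def rewrite_dialogue(text, swap):
    """Apply `swap` to each word inside quotes; leave narration untouched."""

    def rewrite_word(m):
        original = m.group(0)
        replacement = swap(original.lower())
        if replacement == original.lower():
            return original  # unknown word: keep it exactly as written
        return match_case(original, replacement)

    def rewrite_quote(m):
        return "“" + WORD.sub(rewrite_word, m.group(1)) + "”"

    return DIALOGUE.sub(rewrite_quote, text)


# Toy swap function standing in for a cluster lookup.
swap = lambda w: {"how": "hhow", "dear": "onegai"}.get(w, w)
print(rewrite_dialogue("“How so?” cried he. “My dear Mr. Bennet,” replied his wife.", swap))
```

Because only the matched words are substituted, commas, question marks and spacing inside the quotes stay where they were. A quotation that runs over several paragraphs without a closing quote would need extra handling.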

I’d love to have gone the other way too, antiquating dialogue. I was hoping to use the Historical Thesaurus of the OED to do it but I haven’t found an API or a way to programmatically query it without potentially violating their ToS (if you know of one please tell me!). Maybe I’ll figure it out by next year; otherwise I may generate my own hacky historical thesaurus with the Google Ngram Corpus.

Also, there were 90 other completed novels at NaNoGenMo this year, some of which were AMAZING. These are some of the ones I enjoyed, not in any way a comprehensive list:

  • The Seeker by thricedotted is the inner narrative of a computer as it learns and dreams about the human world. Surprisingly profound.

  • Pride and Prejudice and Word Vectors by arnicas is Lynn Cherny’s novel, which used word2vec to replace nouns with their nearest neighbour - which often turns out to be the opposite gendered word, so there’s an added genderswap effect. Wonderful dataviz beside the actual novel.

  • Swann’s Way Through The Night Land by VincentToups also used word2vec, this time to substitute sentences in The Night Land by William Hope Hodgson with their nearest sentences in Swann’s Way by Proust, so that the structure of the novel is the former but the content is from the latter.

  • Doby Mick; or, the excessively-Spoonerized Whale by cpressey is a wonderfully-executed spoonerization of Moby Dick, with onsets swapped between words.

  • NaNoWriMo, the Novel by moonmilk is chronologically culled from tweets by people participating in NaNoWriMo, documenting their struggles as they progress towards 50K words. You can really sense the frustrations of a writer in this one!

  • Seraphs by lizadaly generates a fake Voynich manuscript, complete with illustrations from Flickr/Internet Archive Commons. Easily the most beautiful entry!

  • Generated Detective: A NaNoGenMo Comic by atduskgreg generates a series of captions from the text of old detective novels, then pulls images from Flickr Commons to illustrate them, putting them through OpenCV to make them look hand-drawn. The result is really impressive and makes surprising sense a lot of the time. The choice of illustrations is also sometimes hilarious.

Thanks to Darius for organising NaNoGenMo, and Lynn for encouraging me to join in! I’ll be back next year!