Do men and women use words in different ways? A group of Israeli artificial-intelligence experts think so. They crunched a bunch of English texts by men and women, both fiction and nonfiction, and looked for interesting patterns. The results? In this paper, they argue that it’s possible to figure out the gender of an author merely by paying attention to a few everyday words — and their guesses are accurate 80 per cent of the time, or higher.
For example, they discovered that in fiction, men are more likely than women to use the words a, the, and as; meanwhile, women are more likely than men to use the words she, for, with, and not. In nonfiction, men are more likely than women to use that and one. Women, however, are more likely than men to use for, with, not, and, and in.
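To make the idea concrete, here is a rough sketch of the counting step only: how often a handful of everyday words turn up per 1,000 words of a text. It is not the researchers' classifier, whose full feature set and learning method aren't described in this post. The word lists are the fiction examples mentioned above, and the names `word_rates`, `MALE_LEANING`, `FEMALE_LEANING`, and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# Fiction examples from the post; the grouping follows the post's description.
MALE_LEANING = ("a", "the", "as")
FEMALE_LEANING = ("she", "for", "with", "not")


def word_rates(text, words):
    """Return each word's occurrences per 1,000 tokens of text."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {w: 1000 * counts[w] / total for w in words}


# Hypothetical sample sentence, just to have something to count.
sample = "She walked with the dog, not for the exercise but for the company."
rates = word_rates(sample, MALE_LEANING + FEMALE_LEANING)
for word, rate in rates.items():
    print(f"{word:>5}: {rate:6.1f} per 1,000 words")
```

A single sentence tells you nothing, of course. Rates like these only become a usable "fingerprint" when they are computed over long texts and compared across many authors.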
Here’s another weird data point: Men use the pronoun he with roughly the same frequency as women, but women use the total set of all other pronouns (he, she, they, etc.) more than men.
Interestingly, there are also some differences between the way everyone uses language in fiction and nonfiction. All authors — both male and female — used pronouns and negation more in fiction than nonfiction.
Did this technique make any mistakes? Yep. The professors crunched 920 English-language texts, and misclassified 12 texts, which were:
Fiction
Possession, by A. S. Byatt
The Remains of the Day, by Kazuo Ishiguro
Now We Are Thirty-Somethings, by Charles Jennings
Now Then Davos, by Martin Wiley, David Harmer, and Ian McMillan
The Siege of Krishnapur, by J. G. Farrell
A Landing on the Sun, by Michael Frayn
Nonfiction
Thank you for having me, by Maureen Lipman
A Crowd is not Company, by Robert Kee
T.S. Eliot: A Friendship, by Frederick Tomlin
Walking on Water, by Andy Martin
Unpublished Letters and manuscripts, by an Unlisted Female Author
Falling for Love: How Teenaged Mothers Talk, by Sue Sharp
As the scientists note, of the six misclassified nonfiction documents, all are biographical or diary-like. That’s intriguing, insofar as one might expect that people would write most “like” their gender when they’re writing about personal experience. Meanwhile, of the six misclassified fiction documents, all are by men, except for Possession. What’s up with that? Are these men writing “like” women? (Heh, maybe this is a subterranean reason why Jonathan Franzen freaked out so badly when Oprah picked The Corrections for her book club.) On the other hand, decades of gender theory have ably pointed out that gender is an insanely slippery thing: Men can so often act “like” women, and vice versa, that the whole idea of drawing hard lines around what’s male and what’s female is sort of bonkers. It’d be interesting to replicate this study with texts solely by gay men, lesbians, or transgendered people (the folks who often mess directly with society’s concepts of male and female roles) to see if it generates any different results.
The scientists don’t offer any theories as to why these differences exist. But for me, what’s most interesting is that the words they’re focussing on, the ones that create the “fingerprint” identifying the document, are very common, throwaway words like at, she, but, or that. You wouldn’t expect such simple words to be so important in determining meaning.
Actually, almost all artificial-intelligence research into language backs this up. A decade ago, Thomas Landauer pioneered Latent Semantic Analysis — a way of automatically figuring out the “content” of a piece of writing by looking at a fingerprint of its words. Again, you’d expect that the most “important” words in a document, in terms of identifying what it’s about, would be the ones most individually freighted with meaning. For example, if you looked at this blog entry, you might think the words artificial, intelligence, gender, fiction, nonfiction, men and women would be significant. But what Landauer found is that you could strip out those big-meaning words, leaving all the other stuff behind — the buts, ands, ors, whiches, etc. — and you could still figure out what the document was about. Spooky, eh?
It’s also like the epiphany of Donald Foster — the professor who analyzes word occurrence to determine the author of texts that have been left anonymous by history. He’s the one, you may recall, who figured out that Joe Klein wrote the book Primary Colors. As he noted in his book on the subject, the words that are most revealing of one’s identity are not the high-meaning words — because those are the ones we pay attention to, and sculpt like clay. The ones that reveal our identity are the low-meaning ones — the ifs, the ands, the buts — because we use them unconsciously. They aren’t as subject to our will, and thus are a lot harder to obfuscate.
Maybe I should just stop writing blog entries in full sentences. I’ll just use pronouns and conjunctions.
“I in and the but the they or and.”
(Thanks to Rachel for pointing out this study to me!)
I'm Clive Thompson, a writer on science, technology, and culture. This blog collects bits of offbeat research I'm running into, and musings thereon.
Currently, I'm a contributing writer for the New York Times Magazine and a columnist for Wired magazine. I also write for Fast Company and Wired magazine's web site, among other places. Email or AOL IM me (pomeranian99) to say hi or send in something strange!
September 26, 2008 » 01:57 PM
From an interview with ethnobotanist and anthropologist Wade Davis:
One of the cultures you celebrate in Light at the Edge of the World is the Inuit. What do you most admire about them?
Davis: The Inuit didn’t fear the cold; they took advantage of it. During the 1950s the Canadian government forced the Inuit into settlements. A family from Arctic Bay told me this fantastic story of their grandfather who refused to go. The family, fearful for his life, took away all of his tools and all of his implements, thinking that would force him into the settlement. But instead, he just slipped out of an igloo on a cold Arctic night, pulled down his caribou and sealskin trousers, and defecated into his hand. As the feces began to freeze, he shaped it into the form of an implement. And when the blade started to take shape, he put a spray of saliva along the leading edge to sharpen it. That’s when what they call the “shit knife” took form. He used it to butcher a dog. Skinned the dog with it. Improvised a sled with the dog’s rib cage, and then, using the skin, he harnessed up an adjacent living dog. He put the shit knife in his belt and disappeared into the night.
September 25, 2008 » 11:21 AM
“Video from a camp north of Toronto in December 2005 shows a car spinning around in a nearby, snow-covered parking lot. Prosecutors characterized that as special driver training but the defense, and many outsiders, said it was nothing more than “cutting doughnuts,” a favorite winter pastime of young Canadian motorists.” - A key piece of evidence submitted in the trial of a gang of alleged young Canadian terrorists.
September 24, 2008 » 11:21 PM
“Life imitates art imitating life: just thought a gnat crawling across my monitor was part of a Flash-based ad. I clicked it.” - A Tweet from Bill Braine.
September 24, 2008 » 02:37 PM
“Funniest FB friend request ever: “Twitter friend hoping to get to second base (Facebook!) ;-).”” - A recent Tweet by Pistachio
September 24, 2008 » 12:28 PM
Chinese powdered-milk crisis creates a new market: The return of the wet nurse