“He said, she said” — do men and women use different words when they write?

Do men and women use words in different ways? A group of Israeli artificial-intelligence experts think so. They crunched a bunch of English texts by men and women, both fiction and nonfiction, and looked for interesting patterns. The results? In this paper, they argue that it’s possible to figure out the gender of an author merely by paying attention to a few everyday words — and their guesses are accurate 80 per cent of the time, or higher.

For example, they discovered that in fiction, men are more likely than women to use the words a, the, and as; meanwhile, women are more likely than men to use the words she, for, with, and not. In nonfiction, men are more likely than women to use that and one. Women, however, are more likely than men to use for, with, not, and, and in.

Here’s another weird data point: Men use the pronoun he with roughly the same frequency as women, but women use the total set of all other pronouns (she, they, etc.) more often than men.

Interestingly, there are also some differences between the way everyone uses language in fiction and nonfiction. All authors — both male and female — used pronouns and negation more in fiction than nonfiction.
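
To make the idea of counting everyday words concrete, here is a minimal sketch of how one might measure how often they turn up in a text. The word list is just the set of words named in this post, and the function name and the per-1,000-words scale are my own assumptions for illustration; none of this is the researchers' actual feature set or classifier.

```python
import re
from collections import Counter

# Hypothetical word list: the everyday words named in this post,
# not the feature set used in the study.
FUNCTION_WORDS = ["a", "the", "as", "she", "for", "with", "not", "that", "one", "and", "in"]

def rates_per_thousand(text, words=FUNCTION_WORDS):
    """Return how often each listed word occurs per 1,000 tokens of text."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {w: 1000 * counts[w] / total for w in words}

sample = "She sat with the dog, and she did not wait for the rain."
rates = rates_per_thousand(sample)

# Show only the words that actually appear in the sample.
for word, rate in rates.items():
    if rate > 0:
        print(f"{word}: {rate:.1f} per 1,000 words")
```

A real study would compute rates like these over hundreds of long documents and feed them to a trained classifier; a single sentence tells you nothing about its author.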

Did this technique make any mistakes? Yep. The professors crunched 920 English-language texts, and misclassified 12 texts, which were:

Fiction
Possession, by A. S. Byatt
The Remains of the Day, by Kazuo Ishiguro
Now We Are Thirty-Somethings, by Charles Jennings
Now Then Davos, by Martin Wiley, David Harmer, and Ian McMillan
The Siege of Krishnapur, by J. G. Farrell
A Landing on the Sun, by Michael Frayn

Nonfiction
Thank you for having me, by Maureen Lipman
A Crowd is not Company, by Robert Kee
T.S. Eliot: A Friendship, by Frederick Tomlin
Walking on Water, by Andy Martin
Unpublished Letters and manuscripts, by an Unlisted Female Author
Falling for Love: How Teenaged Mothers Talk, by Sue Sharp

As the scientists note, of the six misclassified nonfiction documents, all are biographical or diary-like. That’s intriguing, insofar as one might expect that people would write most “like” their gender when they’re writing about personal experience. Meanwhile, of the six misclassified fiction documents, all are by men, except for Possession. What’s up with that? Are these men writing “like” women? (Heh, maybe this is a subterranean reason why Jonathan Franzen freaked out so badly when Oprah picked The Corrections for her book club.) On the other hand, decades of gender theory have ably pointed out that gender is an insanely slippery thing: Men can so often act “like” women, and vice versa, that the whole idea of drawing hard lines around what’s male and what’s female is sort of bonkers. It’d be interesting to replicate this study with texts solely by gay men, lesbians, or transgender people (the folks who often mess directly with society’s concepts of male and female roles) to see if it generates any different results.

The scientists don’t offer any theories as to why these differences exist. But for me, what’s most interesting is that the words they’re focussing on (the ones that create the “fingerprint” identifying the document) are very common, throwaway words like at, she, but, or that. You wouldn’t expect such simple words to be so important in determining meaning.

Actually, almost all artificial-intelligence research into language backs this up. A decade ago, Thomas Landauer pioneered Latent Semantic Analysis — a way of automatically figuring out the “content” of a piece of writing by looking at a fingerprint of its words. Again, you’d expect that the most “important” words in a document, in terms of identifying what it’s about, would be the ones most individually freighted with meaning. For example, if you looked at this blog entry, you might think the words artificial, intelligence, gender, fiction, nonfiction, men and women would be significant. But what Landauer found is that you could strip out those big-meaning words, leaving all the other stuff behind — the buts, ands, ors, whiches, etc. — and you could still figure out what the document was about. Spooky, eh?

It’s also like the epiphany of Donald Foster — the professor who analyzes word occurrence to determine the author of texts that have been left anonymous by history. He’s the one, you may recall, who figured out that Joe Klein wrote the book Primary Colors. As he noted in his book on the subject, the words that are most revealing of one’s identity are not the high-meaning words — because those are the ones we pay attention to, and sculpt like clay. The ones that reveal our identity are the low-meaning ones — the ifs, the ands, the buts — because we use them unconsciously. They aren’t as subject to our will, and thus are a lot harder to obfuscate.
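
The “fingerprint” idea can be sketched in a few lines. This toy example builds a profile from a handful of low-meaning words and compares two profiles with cosine similarity. The word list, the function names, and the choice of cosine similarity are all assumptions of mine for illustration; this is not Foster's actual method, and the three sentences are made up.

```python
import math
import re
from collections import Counter

# Hypothetical list of low-meaning words, chosen for illustration only.
SMALL_WORDS = ["if", "and", "but", "the", "a", "of", "to", "in", "that", "or"]

def profile(text):
    """Return the share of tokens taken up by each small word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in SMALL_WORDS]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Made-up sentences: the first two share the same small-word habits,
# the third uses a different set of small words entirely.
known = "If the plan works, and the weather holds, we sail to the island."
disputed = "If the wind drops, and the tide turns, we row to the shore."
other = "But a man of that town, or a man of that trade, knew better."

same = cosine(profile(known), profile(disputed))
diff = cosine(profile(known), profile(other))
print(same > diff)
```

The content words in the first two sentences are completely different, yet their small-word profiles line up, which is the point: the score comes entirely from the words nobody thinks about. Real attribution work needs far longer texts and far more care than this.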

Maybe I should just stop writing blog entries in full sentences. I’ll just use pronouns and conjunctions.

“I in and the but the they or and.”

(Thanks to Rachel for pointing out this study to me!)


Bio:

I'm Clive Thompson, a writer on science, technology, and culture. This blog collects bits of offbeat research I'm running into, and musings thereon.

Currently, I'm a contributing writer for the New York Times Magazine and a columnist for Wired magazine. I also write for Fast Company and Wired magazine's web site, among other places. Email or AOL IM me (pomeranian99) to say hi or send in something strange!

Collision Detection: A Blog by Clive Thompson