Sheelagh Caygill

Debunking the study claiming AI generates better poetry than humans



An understanding and appreciation of poetry come with time. And the time invested may eventually produce expertise. Here, poetry fans enjoy a poetry reading.

A new report from the University of Pittsburgh has given birth to the fiction that generative artificial intelligence (gen AI) can write better poetry than humans can.


This study, "AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably”, was published in late October 2024, and conducted by Brian Porter, a post-doctoral researcher, and Edouard Machery, a philosopher and professor in the Department of History and Philosophy of Science at the University of Pittsburgh. Porter works at the University of Pittsburgh, too. 


Debunking the study saying AI poetry is better than human poetry


Unfortunately, the report received a huge amount of global media coverage. More than 2,200 backlinks from more than 500 unique domains point to the study, according to ahrefs.com, which runs the second most active web crawler after Google.


The problem is that the study is deeply flawed, and its headline conclusion doesn't hold up.


Millions of people now believe that generative AI poetry is better liked than human-written poetry. This flawed headline will be quoted for months and perhaps years to come as though it were a fact.

Does debunking the AI poetry study matter? Yes, and here’s why:


  1. Many lesser-known writers are underappreciated and underpaid, and have been for decades; in this context, I believe the study will discredit poets, poetry, and writers.

  2. Elected officials and lobbyists opposed to a liberal arts education or poetry courses will use the report to argue against funding poetry and English courses.

  3. The longer-term consequences could impact poetry publishers, festivals, and community organizations around the world.



A closer look at the study 'AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably'


Study participants were unfamiliar with poetry


Porter and Machery's study participants reported low levels of experience with poetry: 90.4% said they read poetry a few times per year or less, 55.8% described themselves as "not very familiar with poetry", and 66.8% described themselves as "not familiar at all" with their assigned poet.


No fact-checking


In this age of fake news, AI hallucinations, and dis- and misinformation, it's not unreasonable to expect that a bold new claim about AI's capabilities be fact-checked by anyone who intends to publish the study results or share the piece on social media.


But, as far as I can tell, hardly any journalists or columnists looked closely at the report. (I'm sure there must be some, and if you find any, please let me know.) Being curious is, or maybe I should say was, one of the essential qualities of a reporter. Sadly, in newsrooms around the world, resources have been cut to the bone, so a journalist with the time to ask questions is a luxury long gone.


Participants' goal is to earn money


Porter and Machery recruited study participants from Prolific.com, an online research platform connecting researchers with people willing to complete surveys for money. 


Prolific pays participants to complete surveys, and people should be paid for their time. But most people registered with Prolific are motivated to earn money with minimal time and effort; their goal isn't to deliver thoughtful and well-written responses (1). Instead, they:


  • Prioritize high-paying studies

  • Complete shorter studies to fit more into their day

  • Set alerts for new, high-paying surveys (2).


Porter and Machery recognize this and, in fact, suggest that participants used "flawed" methods so that they could evaluate poems in the shortest amount of time. 


Older and more complex poetry


The poems used in the survey were written by Chaucer, Shakespeare, Butler, Byron, Whitman, Dickinson, T.S. Eliot, Plath, Ginsberg, and Dorothea Lasky. All of these poets wrote, or write, complex poems, though in different ways, and most of their work requires more than one reading to understand and appreciate.


Chaucer and Shakespeare wrote in the 14th and 16th centuries, respectively, with Chaucer writing before the Great Vowel Shift (GVS) and Shakespeare writing during the GVS. Since Shakespeare's death in 1616, English spelling and pronunciation have undergone significant changes.


Only one poet in the survey is alive today


Lasky is the only poet alive and writing today whose work was used in the study. We're almost one-quarter of the way through the 21st century, so why does the study include no younger contemporary poets?


White, mostly men, from England and the USA


All the poets in the study are white, and three are women. Four are from England, and the remaining six are from the USA. Where are the African, Asian, Australasian, European, South American, or Canadian poets? Where are the nonbinary and trans poets? The teenage poets?


Survey participants from the US only


The participants were all from the US, which is admittedly a large country population-wise, with more than 335 million people. But a single country can't represent the cultures, diversity, or richness of the rest of the world.


AI poems are simpler


Porter and Machery are quoted as saying that non-expert readers of poetry prefer AI-generated poems because they find them straightforward and accessible. There's nothing wrong with this: some of the best-known and most beautiful poems are simple and easy to read, such as "Nothing Gold Can Stay" by Robert Frost, or "This is Just to Say" by William Carlos Williams.


Equally, human poetry rewards deep study and analysis. In fact, Porter and Machery told The Guardian that poetry by humans delivers rich rewards "in a way that AI-generated poetry may not" (3). They continued: "It's the complexity and opacity of human-written poetry that makes it so appealing."

Ease of reading isn’t typically considered a primary criterion for evaluating poetry, yet most news coverage focused on exactly this when covering the story.


Misplaced trust in AI


One of the report's findings was that study participants "trust that AI will not generate imitations of human experience" (3). This surprised me because it revealed that so many people still don't know how generative AI works or understand the motivations of AI creators. Unfortunately, many lack the curiosity to find out. This should tell educators and governments that it's time to educate users of AI so they understand the technology, its limitations, and how best to use it. Maybe OpenAI (ChatGPT's owner) can fund this education.


Only three people have written about the study's flaws


I've searched extensively for articles that have examined the study closely, but so far have found only three. There must be more, and if you find them, please send me the links.


  1. Futurist and historian Brad Berens explains on the Center for the Digital Future that the study is biased because it framed the participants' experience "as an identification exercise [so that] Porter and Machery biased readers into thinking in terms of typicality rather than uniqueness, and typicality is where Generative AI excels."

    Here's the link to Berens' article, and here's his profile on LinkedIn.


  2. Ben Feuer, a PhD candidate in deep learning, wonders why the researchers would think that hourly workers on Prolific would be capable of, or motivated to provide, meaningful judgments on the quality of poetry. Here's a link to Feuer's piece on LinkedIn.


  3. On the LitHub website, Jen Benka, poet and former president and executive director of the Academy of American Poets, points out that: “Reports like these are important to investigate as they have practical and potentially serious consequences.” See her piece: On the Report of Poetry’s Death, or: What Does That AI Poetry Study Really Tell Us?


This study doesn't reveal anything new


Almost 100 years ago, a lecturer at the University of Cambridge discovered what Porter and Machery's study revealed: readers find it difficult to attribute or evaluate poetry without context. The lecturer was poet and critic I.A. Richards. In his book Practical Criticism, Richards wrote about experiments he conducted with his Cambridge undergraduates in the 1920s; these formed the basis for his insights on blind comparison and reader attribution.


Passion, time, and expertise


Imagine someone who's not an expert in golf watching a golf ball fly from a tee and travel 400 yards through the air, landing less than five centimeters from the hole (cup). Could they say if the ball had been hit by a pro golfer or a robot? Most likely not. So why are people surprised to learn that study participants with limited expertise in poetry can't say whether a poem has been generated by AI or created by a human?


Passion matters. Anyone reading this will know that a passion for something (say, vinyl records, tennis, wine, rugby, classic cars, vintage fountain pens, or Irish poets) demands time. Appreciation and mastery are a matter of study, and that's part of the pleasure of the passion. Understanding and improving are measured in months and years. Sometimes, if progress feels too slow, frustration can arise, and again, anyone passionate about an interest realizes that's all part of the richness of the experience.


Footnotes:
