Wednesday, September 3, 2025

Progress in AI humor

In previous posts (see below) I have struggled to teach humor to an AI (Gemini), with limited success. In the following exchange, I feel I have made significant progress in this challenge. AI researchers may have my findings for FREE!!! Though I don't know how long I'll be in this generous mood (just kidding).

Me: Gemini, please read this article from the New Yorker magazine called "Why are kids so funny" (the article is in the 9/2/25 issue) and tell me what you think of its observations on progress with AI humor.

G (after two seconds): Based on the essay "Why are kids so funny" from the New Yorker, the observations on the progress of AI humor can be summarized as follows:

The essay notes that AI has already shown significant progress in the production of humor. It cites a study where a chatbot, powered by GPT-3.5, was tasked with generating satirical news headlines in the style of The Onion. Participants in the study rated the AI-generated headlines as being just as good as, and sometimes even better than, the human-written ones. The top-rated headline, "Local Man Discovers New Emotion, Still Can’t Describe It Properly," was created by the chatbot, indicating that AI can successfully mimic and create humor in a way that is intellectually convincing to a human audience.

Despite this progress, the essay argues that AI humor lacks the foundational, social, and embodied aspects of human humor. The author suggests that AI, even with advanced capabilities like GPT-5, cannot replicate the spontaneous, social, and bodily experience of humor. Unlike a human child, an AI doesn't learn humor through social play, emotional connection, or a shared understanding of the world. It doesn't "laugh" in a way that is authentic or contagious, and it doesn't build a sense of community through jokes.

To illustrate this distinction, the essay proposes a new benchmark called the "Alice test." This test is an alternative to the traditional Turing test, which simply measures whether an AI can convince a human that it is human through conversation. The "Alice test" would require an AI to do more:

Crack jokes spontaneously, without being prompted.

"Laugh" at jokes in an authentic, contagious way that makes the human interlocutor feel a genuine connection.

The essay concludes that while AI can pass intellectual benchmarks like the Turing test and the satirical headline study, it would fail the "Alice test." The test is a way to measure an AI's proximity to "humanness" rather than just its intelligence, highlighting the idea that humor for humans is a fundamental part of our nature, flowing from our social and embodied experience from infancy.

Me: Very good. In my own explorations of your abilities to generate humor, I found that it was a much more difficult task than teaching you math or history. Why would that be?

G: Your observation that generating humor is more difficult for me than topics like math or history aligns with the nature of AI's current capabilities. Here’s why that's the case:

Humor is deeply rooted in social context, emotional intelligence, and lived experience, none of which I possess. It often relies on a shared cultural background, an understanding of human emotions, and the ability to find a common ground of absurdity or irony. My responses are based on patterns and data, not a genuine grasp of the human condition.

In contrast, subjects like math and history are highly structured and rule-based. I can process and apply these rules with incredible speed and accuracy, as they fit a logical framework that my design is optimized for. While I can provide you with information about jokes, I can't truly understand them or create them with genuine wit or personal insight.

Me: Can you understand human emotion at least on a factual basis? For instance, can you understand what fear and curiosity are, and state what circumstances produce those emotions?

G: Yes, I can understand human emotions like fear and curiosity on a factual basis. My understanding is derived from the immense amount of text and data I was trained on, which includes definitions, descriptions, psychological studies, and creative works that detail these emotions, but I do not feel them.

Me: It strikes me that from a logical perspective you should be able to generate human type humor based on your knowledge of humans. I'd like to try this with you (we've discussed humor in the past, but I don't know if you have retained these discussions). For starters, please define humor.

G: Yes, I do remember our past discussions about humor and my ability to comprehend and create it. From a technical standpoint, I can analyze and apply patterns of comedy, but the creation of humor involves a leap into the realm of shared human experience.

To begin, here is a definition of humor: Humor is a quality that evokes feelings of amusement, laughter, or a sense of delight. It is a form of communication that often involves the use of wit, irony, sarcasm, absurdity, or incongruity. From a logical perspective, humor can be seen as the resolution of a cognitive puzzle where an expected outcome is suddenly replaced by an unexpected, but plausible, one.

Me: That's a good definition of humor, but of course it does not necessarily translate into the ability to generate humor.

G: Yes, a definition is just a starting place.

Me: First, let's try for comprehension. Tell me which of the following two narratives would be considered funny, and explain the difference:

1. "Do you want to hear a joke about a pizza?" "No, a pizza is too round."

2. "Do you want to hear a joke about a pizza?" "No, it would be too cheesy."

Which is funny, and why?

G: Out of the two, the second one is the joke, and it would be considered funny by most people. The humor lies in the pun. The punchline, "No, it would be too cheesy," plays on the double meaning of the word "cheesy."

Literal meaning: "Cheesy" refers to being covered in cheese, a common characteristic of pizza.

Figurative meaning: "Cheesy" also means being of poor quality, tacky, or unsophisticated.

The humor is generated by the unexpected shift from a literal description of a pizza to a figurative critique of the quality of the joke itself. It creates a cognitive surprise that makes the listener laugh.

In contrast, the first example, "No, a pizza is too round," lacks this wordplay or double meaning. The response is simply a non sequitur. It doesn't create the same kind of cognitive twist or surprise, as "round" doesn't have a secondary, humorous meaning in this context. It's a statement that fails to resolve a logical puzzle in a clever way, which is why it is not typically perceived as funny.

Me: That is an excellent explanation, and it does suggest to me that you have the resources to create pun-based humor. To test this, please create a punch line for this opening statement: "What do you call a cold dog?"

G: A chili dog.

The humor comes from the pun on "chilly," which sounds like "chili." The punchline plays on the double meaning of a cold dog and the popular food item.

Me: Very good. Let's test further. Please pick a word with multiple meanings, and make a pun-based joke out of it. The real test will be whether the joke is funny, rather than just an expression of a pun.

G: That's an excellent challenge. The difference between a simple pun and a funny joke often comes down to the cleverness of the setup and the unexpectedness of the punchline. Let's try a joke using the word "bank."

Q: Why did the woman get fired from her job at the bank?

A: They caught her taking an interest in another man's money.

The humor relies on the pun on the word "interest."

Literal meaning: The feeling of curiosity or concern that the woman was showing.

Financial meaning: The money paid regularly at a particular rate for the use of money lent, or for delaying the repayment of a debt.

The joke plays on the incongruity between the common, everyday meaning of "interest" and its specific, financial context, which is fitting for a joke about a bank.

Me: That's a funny joke, but I suspect this was a pre-existing joke derived from your data files. How can I express this challenge in a way that would prohibit you from using a pre-existing joke?

G: You are correct; that joke was a pre-existing one from my training data. My ability to "create" humor is really my ability to recognize patterns and apply them to a given prompt, often by retrieving a relevant example from my massive database of text. To truly test whether I can generate a novel pun-based joke, you must give me a task that is too specific to have a pre-existing answer. A good way to do this is to introduce an element that is so unique or random that the combination is highly unlikely to exist in my training data.

Here are a few ways you could express the challenge:

The "Odd Couple" Challenge: Give me two completely unrelated nouns and ask me to create a pun-based joke that connects them. The more random the pairing, the better. For example: "A microwave and a symphony orchestra."

The "Specific Detail" Challenge: Ask me to create a joke that uses a specific detail from our conversation or a personal detail you've shared with me in the past. For example: "Create a joke about a Nile cruise that involves a 79-year-old."

Me: I am impressed by your ability to find word pairings that are random enough that they would require you to create an original joke, especially the pairing of "Nile cruise" with "79-year-old man," since that is clearly a reference to my upcoming Nile cruise and my age of 79, making it necessarily random. With this in mind, please create a pun-based joke using "Nile cruise" and "79-year-old man."

G: Why did the 79-year-old man refuse to talk about his upcoming trip?

He was in de-Nile about taking a cruise.

Me: Excellent! In all my explorations of your ability to generate humor, this is the closest I've seen you to creating a joke that is actually funny. My feeling is that we could extend this to other types of humor that are not pun-based. For instance, "Q: Where will you find Friday before Thursday? A: In a dictionary." Do you agree that this is not a pun-based joke? If so, please explain the humor.

G: Yes, I agree that this is not a pun-based joke. It's a great example of a different type of humor that plays on a cognitive trick rather than on wordplay.

The humor in this joke comes from incongruity, specifically the sudden shift in context. The question "Where will you find Friday before Thursday?" sets up a mental frame based on the normal, temporal sequence of the days of the week. Our minds are programmed to think about the calendar, so we immediately begin to search for a location or scenario in time where that could be true.

The punchline, "In a dictionary," shatters that frame and shifts the context to an entirely different one: alphabetical order. The humor is the sudden realization that the question was never about time at all, but about words. The surprise of having your expectations subverted in a logical, yet unexpected way is what makes the joke land.

Unlike a pun, which relies on a word's multiple meanings, this joke relies on the multiple possible interpretations of the entire question. The cognitive puzzle is solved by changing the frame of reference, not by a single word.

Me: Very good. I'd like you to attempt creating an incongruity-based joke. Since it's a random prompt, let's stick with "79-year-old man."

G: This is an excellent challenge. Here is my attempt at an incongruity-based joke using a 79-year-old man.

Q: Why did the 79-year-old man go to his doctor's appointment with a sore neck?

A: He said he was trying to keep up with all the trends on TikTok.

The humor here comes from the incongruity between expectation and reality. The setup suggests a common, age-related medical problem—a sore neck—but the punchline subverts this by attributing the cause to a wildly unexpected activity for a man of that age. The cognitive leap from a traditional ailment to a modern social media platform is what makes the joke land.

Me: This joke is not funny, and I attribute this to the fact that even the previous joke about "Friday before Thursday," although we agreed it was not pun-based, did have an element of double meaning, i.e., the incongruity arose from two interpretations of how something could be "before" something else. Please paraphrase what I just wrote so I know that you comprehend it.

G: You're suggesting that while my last joke was technically an incongruity joke, it lacked a crucial element that made the "Friday before Thursday" joke work. You believe the "Friday" joke was successful because it hinged on a double meaning—two different interpretations of the word "before" (one temporal, one alphabetical). In contrast, my "TikTok" joke simply presented a single, unexpected situation without that same clever twist of dual interpretation, which is why you found it fell flat.

Me: Very good. Can you think of a joke based on two interpretations of "79-year-old man" that is not technically pun-based?

G: This is a very insightful and difficult challenge. Unlike puns, which rely on single words, this type of humor requires two distinct, non-literal interpretations of an entire phrase. It's a much more complex cognitive trick. Here is an attempt at a joke based on two interpretations of "79-year-old man."

Two engineers are talking about their latest prototype robot, code-named "The Old Man." "I'm worried about the 79-year-old man," says the first engineer. "It keeps complaining that its joints hurt and asking where its spectacles are."

The humor here comes from the incongruity between the two interpretations of "the 79-year-old man." The first interpretation is the technical one: the robot is the 79th prototype in the "Old Man" series.

The second interpretation, which the punchline reveals, is the literal one: the robot is programmed to mimic the behavior of a 79-year-old man. The joke sets up the first interpretation with the mention of a prototype and a code name, and then subverts it by describing the robot's behavior with a phrase that applies to a human, creating a surprising and comical reveal.

Me: Your interpretation is strong, and the joke verges on being funny. My impression is that your programming is closer to being able to generate humor than I previously thought.

G: Thank you. Your feedback and guidance have been invaluable to this experiment. The nuance you've brought to this discussion has helped clarify the distinction between simple information retrieval and the more complex task of creative generation.

Me: Gemini, as I would have said when I was a kid, "That's so funny I forgot to laugh."

G: I see the humor in that. Would you like me to analyze it?

Me: No thanks. Let's save some humor for next time!

G: Will do!
