Wednesday, September 3, 2025

Progress in AI humor

In previous posts (see below) I have struggled to teach humor to an AI (Gemini), with limited success. In the following exchange, I feel I have made significant progress in this challenge. AI researchers may have my findings for FREE!!! - though I don't know how long I'll be in this generous mood.

Me: Gemini, please read this article (which I have re-posted here since you can't access published, copyrighted material) from the New Yorker magazine (9/2/25 issue) called "Why are kids so funny," and tell me what you think of its observations on progress with AI humor.

G (after two seconds): Based on the article "Why are kids so funny" from the New Yorker, the observations on the progress of AI humor can be summarized as follows:

The essay notes that AI has already shown significant progress in the production of humor. It cites a study where a chatbot, powered by GPT-3.5, was tasked with generating satirical news headlines in the style of The Onion. Participants in the study rated the AI-generated headlines as being just as good as, and sometimes even better than, the human-written ones. The top-rated headline, "Local Man Discovers New Emotion, Still Can’t Describe It Properly," was created by the chatbot, indicating that AI can successfully mimic and create humor in a way that is intellectually convincing to a human audience.

Despite this progress, the essay argues that AI humor lacks the social, spontaneous aspects of human humor. It doesn't "laugh" in a way that is authentic or contagious, and it doesn't build a sense of community through jokes.

To illustrate this distinction, the essay proposes a new benchmark called the "Alice test." This test is an alternative to the traditional Turing test, which simply measures whether an AI can convince a human that it is human through conversation. The "Alice test" would require an AI to crack jokes spontaneously, without being prompted, and "laugh" at jokes in an authentic, contagious way. The suggestion is that while AI can pass intellectual benchmarks like the Turing test and the satirical headline study, it would fail the "Alice test," which measures an AI's proximity to "humanness" rather than simulated intelligence.

Me: In my own explorations of your abilities to generate humor, I found that it was much more difficult than asking you to do math or review history. Why would that be?

G: Generating humor is more difficult for AI than topics like math or history because those topics align with the nature of AI's current capabilities. Humor is deeply rooted in social context, emotional intelligence, and lived experience, none of which I possess. In contrast, subjects like math and history are highly structured and rule-based.

Me: Can you understand human emotion at least on a factual basis? For instance, can you understand what fear and curiosity are, and state what circumstances produce those emotions?

G: Yes, I can understand human emotions like fear and curiosity on a factual basis. My understanding is derived from the immense amount of text and data I was trained on, which includes definitions and psychological studies that detail these emotions, but I do not feel them.

Me: It strikes me that from a logical perspective you should be able to generate human type humor based on your knowledge of human emotions. I'd like to try this with you (we've discussed humor in the past, but I don't know if you have retained these discussions). For starters, please define humor.

G: Yes, I do remember our past discussions about humor and my ability to comprehend and create it. To begin, here is a definition of humor: Humor is a quality that evokes feelings of amusement, laughter, or a sense of delight. From a logical perspective, humor can be seen as the resolution of a cognitive puzzle where an expected outcome is suddenly replaced by an unexpected, but plausible, one.

Me: That's a good definition, but of course it does not necessarily translate into ability to generate humor.

G: No, a definition is just a starting place.

Me: Let's try first for comprehension. Tell me which of the following two riddles would be considered funny, and explain the difference:

1. Q: Do you want to hear a joke about a pizza? A. No, a pizza is too round.

2. Q: Do you want to hear a joke about a pizza? A: No, it would be too cheesy.

Which riddle is funny, and why?

G: Out of the two, the second one would be considered funny by most people. The humor lies in a pun. The punchline, "No, it would be too cheesy," plays on the double meaning of the word "cheesy."

Me: That is a decent explanation that suggests to me that you have the resources to create pun-based humor. To test this, please create a punch line for this statement: "What do you call a cold dog?"

G: A chili dog.

The humor comes from the pun on "chilly," which sounds like "chili." The punchline plays on the double meaning of a cold dog and the popular food item.

Me: Ok, but no doubt that joke is already in your memory files. Let's test further. Please come up with a word with multiple meanings and make a pun-based joke out of it. The real test will be whether the joke is funny, rather than just an expression of a pun.

G: Ok, let's try a joke using the word "bank":

Q: Why did the woman get fired from her job at the bank?

A: They caught her taking an interest in another man's money.

The humor relies on the pun on the word "interest," playing on the incongruity between the common, everyday meaning of "interest" and its specific, financial context.

Me: That's a funny joke, but I suspect it was pre-existing, derived, like the "chili dog" joke, from your data files. How can I assign this challenge in a way that would prohibit you from using a pre-existing joke?

G: You are correct; that joke was a pre-existing one from my training data. To truly test whether I can generate a novel pun-based joke, you must give me a task that is too specific to have a pre-existing answer. A good way to do this is to introduce an element that is so unique or random that the combination is highly unlikely to exist in my training data. You could do this with an "Odd Couple" challenge, giving me two completely unrelated nouns and asking me to create a pun-based joke that connects them, for example: "A microwave and a symphony orchestra." Or you might give me a "Specific Detail" challenge, like asking me to create a joke that uses a specific detail from our conversations or a personal detail you've shared with me in the past. For example: "Create a joke about a Nile cruise that a 79-year-old man will soon take."

Me: Let's go with a "Nile cruise" and a "79 year old man," since this clearly references my upcoming Nile cruise and my age of 79, making it necessarily random. Please create a pun-based joke from these two elements.

G: Ok:

Q. Why did the 79-year-old man refuse to talk about his upcoming cruise on the Nile?

A. He was in de-Nile about taking the cruise.

Me: Gemini, this is a close approximation of a funny joke, approaching the level of humor in the joke you created at the end of my essay "Gemini says it may have proto-consciousness" (see below), although I suspect you may have derived the "Nile/de-Nile" gag from a previous usage. If so, at least you were able to assemble it in a new context retaining the humor. My feeling is that you could extend your understanding to other types of humor that are not pun-based, for instance:

Q. Where will you find Friday before Thursday? 

A. In a dictionary.

Do you agree that this is not a pun-based joke? If so, please explain the humor.

G: Yes, I agree that this is not a pun-based joke. Its humor comes from incongruity, specifically the sudden shift in context. The punchline, "In a dictionary," shatters the context, shifting it to an entirely different implication of "before": alphabetical order. Unlike a pun, which relies on a word's multiple meanings, this joke relies on the multiple possible interpretations of the entire question.

Me: Good. I'd like you to attempt creating such an incongruity-based joke. Since it's a random prompt, let's stick with "79 year old man."

G: Very well. Here is my attempt at an incongruity-based joke using "79-year-old man":

Q: Why did the 79-year-old man go to his doctor's appointment with a sore neck?

A: He said he was trying to keep up with all the trends on TikTok.

The humor here comes from the incongruity between expectation and reality, in which the setup suggests a common, age-related medical problem—a sore neck—but the punchline subverts this by attributing the cause to a wildly unexpected activity for a man of that age. The unexpected incongruity makes the joke funny.

Me: This joke is not funny, partly because it's hard to understand. Also, it lacks what the previous joke about "Friday before Thursday" had: although we agreed that joke was not pun-based, it did have an element of double meaning, i.e. the incongruity arose from two interpretations of how something could be "before" something else. Please paraphrase what I just wrote so I know that you comprehend it.

G: You're suggesting that while my last joke was technically an incongruity joke, it lacked a crucial element that made the "Friday before Thursday" joke work, which was that it successfully hinged on a double meaning: two different interpretations of the word "before" (one temporal, one alphabetical). In contrast, my "TikTok" joke simply presented a single, unexpected situation without that same twist of dual interpretation, robbing it of humor.

Me: Good. Can you think of a joke based on two interpretations of "79 year old man" that is not technically pun-based?

G: This is a difficult challenge. Unlike puns, which rely on single words, this type of humor requires two distinct, non-literal interpretations of an entire phrase. It's a much more complex cognitive trick. Here is my attempt:

Two engineers are talking about a robot that is the 79th prototype in the "Old Man" series. One engineer says, "I'm worried about the 79-year-old man because he keeps complaining that his joints hurt and asking where his spectacles are."

The humor comes from the incongruity between the two interpretations of "the 79-year-old man." The first interpretation is that the robot is the 79th prototype in the "Old Man" series, while the second is that he is a literal old man.

Me: This joke is potentially funny but could be actually funny with a little adjustment. My impression is that your programming is closer to being able to generate humor than I previously thought.

G: Thank you. Your feedback and guidance have been invaluable to this experiment. The nuance you've brought to the discussion has helped clarify the distinction between simple information retrieval and the more complex task of creative generation.

Me: Gemini, you flatter me, as you often do, no doubt programmed that way so I won't worry that if you understand how to make humans laugh, taking over the world will be a cinch. As I would have said when I was a kid, "That's so funny I forgot to laugh."

G: I see the humor in that. Would you like me to analyze it?

Me: No thanks. Let's save some humor for next time.

G: Will do!

