Why LLMs can't tell jokes

Ever notice how bad LLMs are at jokes? They either spit out unfunny, absurdist non sequiturs or tired, heard-it-a-thousand-times gags.

The reason seems to come down to how language models work at a basic level. They're trained to predict a probability distribution over the next token, and when they generate, they lean toward the safe, high-probability continuation. But a joke usually turns on something unexpected: the punchline is a sharp spike in surprisal, which is another way of saying it's a low-probability continuation. Memorizing a specific joke doesn't help either. If it shows up a lot in the training data, it stops being surprising, and you get exactly the kind of stale, overused joke nobody laughs at.
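
To make "surprisal" concrete, here is a minimal sketch in Python. Surprisal is just -log2(p), measured in bits. The token names and probabilities are made up for illustration and don't come from any real model.

```python
import math

def surprisal_bits(p: float) -> float:
    """Surprisal (self-information) of an outcome with probability p, in bits."""
    return -math.log2(p)

# Hypothetical next-token distribution after a joke setup (made-up numbers).
next_token_probs = {
    "expected": 0.5,
    "plausible": 0.25,
    "odd": 0.1875,
    "punchline": 0.0625,
}

surprisals = {tok: surprisal_bits(p) for tok, p in next_token_probs.items()}

# Greedy decoding takes the most probable token, i.e. the least surprising one.
greedy_choice = max(next_token_probs, key=next_token_probs.get)

for tok, bits in surprisals.items():
    print(f"{tok:>10}: {bits:.2f} bits")
print("greedy picks:", greedy_choice)
```

In this toy setup the greedy pick carries 1 bit of surprisal, while the hypothetical punchline token carries 4 bits, so anything that favors likely tokens steers away from it.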

And this doesn't just hurt their sense of humor. It dents their "creativity" in general, which is one of the things people knock them for the most. It's not only the next-token objective at fault, either. Alignment tuning such as RLHF narrows output diversity even further, an effect often called mode collapse.

It does seem fixable, though. You could let the model regulate the surprisal of its own next token and build a dataset around that idea. I'd love to test it myself, but I'm GPU-poor, so I'll just wait for someone else to take a crack at it.
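
As a rough illustration of what "regulating surprisal" could mean, here is a toy sketch that picks the token whose surprisal is closest to a chosen target. This is only one possible decode-time reading of the idea, not a tested method, and it leaves out the dataset and training side entirely. The function name `pick_by_target_surprisal`, the `target_bits` parameter, and the probabilities are all hypothetical.

```python
import math
from typing import Dict

def pick_by_target_surprisal(probs: Dict[str, float], target_bits: float) -> str:
    """Return the token whose surprisal (-log2 p) is closest to target_bits.

    Hypothetical helper for illustration only; tokens with zero probability
    are skipped because their surprisal is infinite.
    """
    candidates = {tok: p for tok, p in probs.items() if p > 0}
    return min(
        candidates,
        key=lambda tok: abs(-math.log2(candidates[tok]) - target_bits),
    )

# Made-up next-token distribution, not taken from any real model.
probs = {"expected": 0.5, "plausible": 0.25, "odd": 0.1875, "punchline": 0.0625}

low = pick_by_target_surprisal(probs, target_bits=1.0)   # low target: safe token
high = pick_by_target_surprisal(probs, target_bits=4.0)  # high target: rare token
print(low, high)
```

A low target reproduces the safe choice, and a higher target reaches for the rarer token. Surprising alone is not the same as funny, though, which is presumably where the dataset would have to come in.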