Can AI ‘feel’ guilt?
Artificial intelligence that ‘feels’ guilt could lead to more cooperation

Programming a sense of guilt into simple software agents helps networks of them cooperate, new research suggests.
Peter Dazeley/Getty Images
Some sci-fi scenarios depict robots as cold-hearted clankers eager to manipulate human stooges. But that’s not the only possible path for artificial intelligence.
Humans have evolved emotions like anger, sadness and gratitude to help us think, interact and build mutual trust. Advanced AI could do the same. In populations of simple software agents (like characters in “The Sims” but much, much simpler), having “guilt” can be a stable strategy that benefits them and increases cooperation, researchers report July 30 in the Journal of the Royal Society Interface.
Emotions are not just subjective feelings but bundles of cognitive biases, physiological responses and behavioral tendencies. When we harm someone, we often feel compelled to pay a penance, perhaps as a signal to others that we won’t offend again. This drive for self-punishment can be called guilt, and that is the form the researchers gave it when programming their agents. The question was whether agents that had it would be outcompeted by those that didn’t, say Theodor Cimpeanu, a computer scientist at the University of Stirling in Scotland, and colleagues.
The agents played a two-player game called the iterated prisoner’s dilemma with their neighbors. The game has roots in game theory, a mathematical framework for analyzing multiple decision makers’ choices based on their preferences and individual strategies. On each turn, each player “cooperates” (plays nice) or “defects” (acts selfishly). In the short term, you win the most points by defecting, but that tends to make your partner start defecting, so everyone is better off cooperating in the long run. The AI agents couldn’t feel guilt as richly as humans do but experienced it as a self-imposed penalty that nudged them to cooperate after selfish behavior.
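In code, a single round of the dilemma can be sketched in a few lines of Python. The payoff values below are the textbook convention for the game, not necessarily the ones used in the study, and the names are illustrative.

# One round of the prisoner's dilemma. The payoffs are the textbook
# convention (temptation 5, reward 3, punishment 1, sucker's payoff 0),
# not necessarily the values used in the study.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),  # reward for mutual cooperation
    ("cooperate", "defect"):    (0, 5),  # sucker's payoff vs. temptation
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),  # punishment for mutual defection
}

def play_round(move_a, move_b):
    # Return the points each player earns for one turn.
    return PAYOFFS[(move_a, move_b)]

print(play_round("defect", "cooperate"))  # (5, 0): defecting wins this round
print(play_round("defect", "defect"))     # (1, 1): but mutual defection earns
                                          # less than mutual cooperation's (3, 3)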
The researchers ran several simulations with different settings and social network structures. In each, the 900 players were each assigned one of six strategies defining their tendency to defect and to feel and respond to guilt. In one strategy, nicknamed DGCS for technical reasons, the agent felt guilt after defecting, meaning that it gave up points until it cooperated again. Critically, the AI agent felt guilt (lost points) only if it received information that its partner was also paying a guilt price after defecting. This prevented the agent from being a patsy, thus enforcing cooperation in others. (In the real world, seeing guilt in others can be tricky, but costly apologies are a good sign.)
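A rough Python sketch of such a guilt-prone agent might look like the following. The class name and the GUILT_COST value are illustrative, and the rule shown (pay a penalty after defecting, but only when the partner is also seen paying one, until cooperating again) paraphrases the article rather than reproducing the paper's exact update rules.

GUILT_COST = 1.0  # hypothetical per-turn penalty while feeling guilty

class GuiltyAgent:
    def __init__(self):
        self.score = 0.0
        self.guilty = False  # becomes True after the agent defects

    def choose_move(self):
        # The self-imposed penalty nudges a guilty agent back toward cooperation.
        return "cooperate" if self.guilty else "defect"

    def settle(self, my_move, points_won, partner_shows_guilt):
        # Update the agent's state after one turn of the game.
        self.score += points_won
        if my_move == "defect":
            self.guilty = True  # defecting triggers guilt
        if self.guilty and partner_shows_guilt:
            # Pay the guilt price only when the partner is also paying one,
            # so the agent isn't exploited by remorseless defectors.
            self.score -= GUILT_COST
        if my_move == "cooperate":
            self.guilty = False  # cooperating again discharges the guilt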
The simulations didn’t model how guiltlike behavior might first emerge — only whether it could survive and spread once introduced. After each turn, agents could copy a neighbor’s strategy, with a probability of imitation based on neighbors’ cumulative score. In many scenarios — particularly when guilt was relatively low-cost and agents interacted with only their neighbors — DGCS became the dominant strategy, and most interactions became cooperative, the researchers found.
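The copying step can be sketched with a rule that is standard in evolutionary games on networks; the Fermi function and the noise value below are common choices in that literature, not necessarily the ones used in the paper.

import math
import random

def maybe_imitate(my_score, neighbor_score, noise=0.1):
    # Adopt the neighbor's strategy with a probability that rises toward 1
    # when the neighbor scores much higher and falls toward 0 when it scores
    # much lower. The Fermi rule and noise level here are assumptions.
    prob = 1.0 / (1.0 + math.exp((my_score - neighbor_score) / noise))
    return random.random() < prob

print(maybe_imitate(my_score=2.0, neighbor_score=8.0))  # almost always True
print(maybe_imitate(my_score=8.0, neighbor_score=2.0))  # almost always False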
We may want to program the capacity for guilt or other emotions into AIs. “Maybe it’s easier to trust when you have a feeling that the agent also thinks in the same way that you think,” Cimpeanu says. We may also witness emotions — at least the functional aspects, even if not the conscious ones — emerge on their own in groups of AIs if they can mutate or self-program, he says. As AIs proliferate, they could come to temper their cold logic with something like human warmth.
But there are caveats, says Sarita Rosenstock, a philosopher at the University of Melbourne in Australia who was not involved in the work but has used game theory to study guilt’s evolution. First, simulations embody many assumptions, so one can’t draw strong conclusions from a single study. Still, this paper contributes “an exploration of the possibility space,” highlighting areas where guilt is and is not sustainable, she says.
Second, it’s hard to map simulations like these to the real world. What counts as a verifiable cost for an AI, besides paying actual money from a coffer? If you talk to a present-day chatbot, she says, “it’s basically free for it to say I’m sorry.” With no transparency into its innards, a misaligned AI might feign remorse, only to trespass again.