Memory as Surprisal Analysis: Dopamine and What Gets Remembered
I came across a statement from Parmita Mishra that stopped me: "Your memory is literally just a surprisal analysis of your cognition. That's all it is. And dopamine is just the way of measuring the stuff that's going to be remembered."
This reframing hit me hard because it connects information theory, neuroscience, and machine learning in a way that makes memory feel less mysterious and more computational. If memory is surprisal analysis, then what we remember isn't random. It's determined by how unexpected, how surprising, an experience was relative to our existing model of the world.
What is Surprisal?
In information theory, surprisal measures how unexpected an event is. The more surprising an event, the more information it carries. Mathematically, surprisal is defined as the negative logarithm of the probability of an event:
$$I(x) = -\log P(x)$$
where $I(x)$ is the surprisal of event $x$ and $P(x)$ is its probability (with a base-2 logarithm, surprisal is measured in bits). If something is highly probable, its surprisal is low. If something is unexpected, its surprisal is high.
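The definition is small enough to compute directly. Here's a minimal sketch in Python, using a base-2 logarithm so surprisal comes out in bits:

```python
import math

def surprisal(p: float) -> float:
    """Surprisal in bits: I(x) = -log2 P(x)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

# A near-certain event carries almost no information...
print(surprisal(0.99))   # ~0.0145 bits
# ...while a rare one carries a lot.
print(surprisal(0.001))  # ~9.97 bits
```

A certain event ($P(x) = 1$) has surprisal exactly zero: seeing your coffee maker carries no information, while the bear in the kitchen carries a great deal.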
Think about it this way. If you walk into your kitchen every morning and see your coffee maker, that's not surprising. You don't remember each individual instance of seeing it. But if you walk into your kitchen and see a bear, that's extremely surprising. You'll remember that moment vividly.
Memory as Prediction Error
The connection to memory becomes clear when we think about the brain as a prediction machine. Your brain constantly generates predictions about what will happen next. When reality deviates from these predictions, you get a prediction error. The larger the prediction error, the more surprising the event, and the stronger the signal to remember it.
This isn't just philosophical. There's a computational mechanism at play. Your brain maintains an internal model of the world. When sensory input arrives, it's compared against predictions generated by this model. The difference between prediction and reality is the prediction error, which directly relates to surprisal.
Events with high surprisal violate your expectations. They contain information that your current model doesn't account for. From an information-theoretic perspective, these are the events worth encoding into memory because they update your model of how the world works.
Dopamine as the Measurement
This is where dopamine enters the picture. Dopamine release is tightly coupled with prediction errors. In the classic reward prediction error findings, dopamine neurons fire more when an outcome is better than expected and pause when it's worse. The magnitude of the dopamine response correlates with the magnitude of the prediction error.
But dopamine isn't just signaling surprise. It's signaling what should be remembered. High dopamine release during an event strengthens the synaptic connections involved in encoding that memory. Low dopamine, or the absence of prediction error, means the event aligns with expectations. There's nothing new to learn, so there's no strong signal to remember it.
This creates a beautiful computational loop. Your brain predicts, compares prediction to reality, computes prediction error (surprisal), releases dopamine proportional to that error, and uses that dopamine signal to determine memory strength. The more surprising an event, the more dopamine, the stronger the memory.
The Computational View
From a machine learning perspective, this makes perfect sense. In reinforcement learning, we use prediction errors to update value functions and policies. Events with high prediction errors are more informative and should be weighted more heavily in learning.
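As a toy illustration of how prediction errors drive learning, here is a tabular TD(0)-style value update (my sketch, not from the post; the learning rate `alpha` is an assumed parameter):

```python
def td_update(v_hat: float, r: float, alpha: float = 0.1) -> float:
    """One step of a simple value update: the prediction error
    delta = r - v_hat drives learning, so surprising outcomes
    (large |delta|) move the estimate the most."""
    delta = r - v_hat
    return v_hat + alpha * delta

# An agent that predicts 0 but keeps receiving reward 1:
# each step's delta shrinks as the prediction catches up.
v = 0.0
for _ in range(5):
    v = td_update(v, r=1.0)
```

Once the prediction matches reality, `delta` is zero and the update does nothing, which is exactly the "nothing new to learn" case described above.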
Consider a simple model where memory encoding strength $M$ is a function of prediction error $\delta$:
$$M = f(\delta) = f(r - \hat{r})$$
where $r$ is the actual outcome and $\hat{r}$ is the predicted outcome. When $\delta$ is large, $M$ is large, and the memory is encoded strongly. When $\delta$ is small or zero, $M$ is small, and the memory is weak or not encoded at all.
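A minimal sketch of this model in Python, assuming a saturating function for $f$ (the choice of `tanh` over the absolute error is my illustration; the post only requires that $M$ grow with $|\delta|$):

```python
import math

def encoding_strength(r: float, r_hat: float, k: float = 1.0) -> float:
    """Memory encoding strength M = f(delta): a saturating function
    of the absolute prediction error |r - r_hat|. The gain k and the
    tanh shape are illustrative assumptions."""
    delta = r - r_hat
    return math.tanh(k * abs(delta))

# Outcome matches the prediction: nothing is encoded.
print(encoding_strength(r=1.0, r_hat=1.0))  # 0.0
# Outcome far from the prediction: strong encoding.
print(encoding_strength(r=5.0, r_hat=0.0))
```

The saturation reflects that memory strength presumably can't grow without bound, however large the surprise.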
Dopamine essentially implements this function. High prediction error leads to high dopamine, which leads to strong memory encoding. Low prediction error leads to low dopamine, which leads to weak or no memory encoding. But as we'll see, this is only part of the story.
Implications for Learning
This framework has profound implications for how we think about learning and memory. If memory is surprisal analysis, then effective learning isn't about repetition alone. It's about creating experiences that violate expectations, that generate prediction errors, that surprise the learner.
This connects back to my earlier work on learned helplessness and reward prediction error. When students repeatedly experience outcomes that match their low expectations, there's no prediction error. No surprisal. No dopamine signal. No strong memory encoding. The learning doesn't stick because, from the brain's perspective, there's nothing new to learn.
But when a student experiences something that violates their expectations, when they succeed despite believing they would fail, that generates a large prediction error. High surprisal. Strong dopamine signal. Strong memory encoding. The experience gets remembered, and the memory updates their model of their own capabilities.
Surprisal Without Interest
But what if the subject matter is surprising and the individual has no interest in it? This raises an important question about the relationship between surprisal and memory encoding. If surprisal alone drove memory, then any surprising event would be remembered strongly, regardless of relevance or interest.
Yet we know this isn't how memory works. You might encounter something surprising but irrelevant to your goals or interests, and it doesn't stick. A surprising fact about a topic you don't care about might generate a prediction error, but it doesn't necessarily lead to strong memory encoding.
This suggests that surprisal is necessary but not sufficient for memory encoding. There's likely a second factor: relevance or value. The brain needs to compute not just "how surprising is this?" but also "how relevant is this to my goals and interests?"
From a computational perspective, this makes sense. Memory is expensive. Encoding everything surprising would be inefficient. The brain needs to prioritize. Surprisal identifies what's new and informative, but relevance determines whether that information is worth the cost of encoding.
This might explain why dopamine release isn't just about prediction error. It's also modulated by value signals. Something can be surprising but low value, leading to a weak dopamine signal and weak memory encoding. Something can be surprising and high value, leading to a strong dopamine signal and strong memory encoding.
Interest might function as a relevance filter. When you're interested in a topic, your brain assigns higher value to information related to that topic. Surprising information in an area of interest generates both high surprisal and high value, leading to strong memory encoding. Surprising information in an area of no interest generates high surprisal but low value, leading to weak memory encoding.
This suggests a more complete model where memory encoding strength $M$ depends on both prediction error $\delta$ and value $V$:
$$M = f(\delta, V) = f((r - \hat{r}), V)$$
High surprisal combined with high value leads to strong memory encoding. High surprisal with low value leads to weak encoding. Low surprisal, regardless of value, leads to weak encoding. The brain needs both: information that's new and information that matters.
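Extending the earlier sketch, the two-factor model might look like this, with a value term in $[0, 1]$ gating the surprisal term. The multiplicative form is my assumption; the post only requires that both factors matter:

```python
import math

def memory_strength(r: float, r_hat: float, value: float) -> float:
    """M = f(delta, V): a surprisal term gated multiplicatively by a
    relevance/value term in [0, 1]. Multiplicative gating is an
    illustrative assumption, not a claim from the post."""
    surprise = math.tanh(abs(r - r_hat))
    return surprise * value

# Surprising but irrelevant vs. surprising and relevant:
weak = memory_strength(r=1.0, r_hat=0.0, value=0.1)
strong = memory_strength(r=1.0, r_hat=0.0, value=0.9)
```

The multiplicative form captures all four cases in the paragraph above: either factor near zero drives encoding toward zero, and only high surprisal combined with high value yields strong encoding.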
Designing for Surprisal
If memory is surprisal analysis modulated by relevance, then we can design learning experiences that maximize both surprisal and relevance at the right moments. Not random surprise, but targeted prediction errors in contexts that matter to the learner.
- Create moments where students discover they know more than they thought
- Present information in ways that violate initial assumptions
- Use counterexamples that challenge existing mental models
- Provide feedback that generates prediction errors, not just confirms expectations
The goal isn't to surprise for surprise's sake. It's to create experiences that generate meaningful prediction errors, that update the learner's model of the world in ways that improve their understanding and capabilities.
Questions for Research
This framework raises several research questions that I find compelling:
- Can we measure both surprisal and relevance in learning environments and predict memory encoding strength?
- How do we design educational interventions that optimize both prediction errors and relevance for learning?
- What's the relationship between baseline expectations and the surprisal needed to generate strong memories?
- Can we model memory formation as a surprisal and value-based process and test interventions computationally?
- How does individual variation in dopamine systems and value assignment affect what different people remember?
- How do we increase the relevance or value that learners assign to material they initially find uninteresting?
These questions bridge neuroscience, information theory, and machine learning. They offer a way to understand memory not as a mysterious biological process, but as a computational mechanism optimized for information gain.
Conclusion
The idea that memory is surprisal analysis modulated by relevance, with dopamine as the measurement mechanism, reframes how I think about learning and cognition. It suggests that memory isn't arbitrary. It's a computational process designed to encode information that updates your model of the world.
When something is surprising, it means your model was wrong or incomplete. That's valuable information. But your brain also evaluates whether that information matters: it registers both through prediction error and value signals, combines them through dopamine, and encodes the result into memory. The more surprising and the more relevant the event, the stronger the signal, and the stronger the memory.
This isn't just a theoretical curiosity. It's a framework that could inform how we design learning systems, how we understand memory disorders, and how we optimize our own learning. If memory is surprisal analysis modulated by relevance, then understanding both surprisal and relevance is understanding memory itself.