- Yingkui Lin

Surprise

Skinner’s Box Setup

A rat presses a lever.

Sometimes it gets food (reward).

If the rat gets food when it did not expect it, the reward exceeds its expectation.

In reinforcement learning, this is a positive reward prediction error (RPE): the reward received is larger than the reward predicted.
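A minimal sketch of the quantity, assuming the simplest one-step form delta = r - V (actual reward minus predicted reward) used in Rescorla-Wagner style models. The numbers 0.2 and 1.0 are hypothetical, picked only to show the sign of the error.

```python
def reward_prediction_error(reward: float, expected: float) -> float:
    """Prediction error: actual reward minus predicted reward (delta = r - V)."""
    return reward - expected


# Hypothetical numbers: the rat predicts little food from a press (V = 0.2)
# but receives a pellet (r = 1.0), so the error is positive.
surprise_food = reward_prediction_error(reward=1.0, expected=0.2)
# If the press yields nothing instead, the same prediction gives a negative error.
no_food = reward_prediction_error(reward=0.0, expected=0.2)
print(f"food:    RPE = {surprise_food:+.1f}")
print(f"no food: RPE = {no_food:+.1f}")
```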

Why It Feels Good (Reward Signal)

The unexpected food provides new evidence that the world is better (more rewarding) than predicted.

Dopamine neurons fire to signal this positive surprise.

In terms of the free energy principle (FEP), the brain updates its generative model to reduce free energy:

It increases the probability that “lever → food” is true.

Future actions exploit this updated model to harvest reward.

So, “reward exceeds expectation” = a spike in prediction error that drives belief updating.
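One way to make "the probability that lever → food is true" concrete is a Beta-Bernoulli belief about the chance that a press yields food. This is an illustrative assumption, not the notes' own model: FEP accounts usually describe approximate (variational) updates, but in this simple conjugate case the exact Bayesian posterior is the belief that minimizes variational free energy, so exact updating is a fair stand-in. The class name `LeverBelief` and the uniform Beta(1, 1) starting belief are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LeverBelief:
    """Hypothetical Beta(a, b) belief over p = P(food | lever press)."""
    a: float = 1.0  # pseudo-count of presses that gave food
    b: float = 1.0  # pseudo-count of presses that gave nothing

    def mean(self) -> float:
        """Current predicted probability that a press gives food."""
        return self.a / (self.a + self.b)

    def update(self, got_food: bool) -> float:
        """Update on one press; return the prediction error (outcome minus prior mean)."""
        error = (1.0 if got_food else 0.0) - self.mean()
        if got_food:
            self.a += 1
        else:
            self.b += 1
        return error


belief = LeverBelief()
print(f"before: P(food) = {belief.mean():.2f}")
err = belief.update(got_food=True)  # food arrives: positive error
print(f"error = {err:+.2f}, after: P(food) = {belief.mean():.2f}")
```

The positive error is exactly what pushes P(food) upward; a press with no food would produce a negative error and pull it back down.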

Normal case (non-addicted learning)

Baseline FEP logic: the rat builds a generative model in which a lever press sometimes produces food (lever → maybe food).

Each lever press updates the model according to whether food appears.

Over time, the rat forms an accurate model of the contingencies, minimizing free energy (outcomes stop being surprising, apart from whatever randomness the contingency itself contains).
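The "no more surprise" endpoint can be sketched with the Rescorla-Wagner rule V ← V + α(r − V), used here as a simple stand-in for the model update (an assumption; the notes do not commit to a specific learning rule). The learning rate α = 0.3 is arbitrary, and food is assumed to arrive on every press so the contingency is fully learnable.

```python
def rescorla_wagner(rewards, alpha=0.3, v0=0.0):
    """Return the prediction error on each trial under V <- V + alpha * (r - V)."""
    v = v0
    errors = []
    for r in rewards:
        delta = r - v          # prediction error on this press
        errors.append(delta)
        v += alpha * delta     # move the prediction part of the way toward the outcome
    return errors


# Food arrives on every press (r = 1.0 each time).
errors = rescorla_wagner([1.0] * 10)
for t, e in enumerate(errors):
    print(f"press {t + 1:2d}: prediction error = {e:.3f}")
```

With a constant reward the error shrinks geometrically, as (1 − α) raised to the number of past presses, so surprise fades as the model converges. If food were delivered only some of the time, V would settle near the food probability but individual presses would still produce nonzero errors.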

In addiction, prediction error is artificially sustained → model keeps updating toward “chasing” reward, leading to compulsive behavior.

Addiction = failure to minimize free energy over time.

The brain’s generative model is hijacked by a strong prior: “lever press → reward, maybe even better than last time.”

This prior outweighs the actual sensory evidence (sometimes no food, or food that is no better than before).
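The idea of a prior outweighing the evidence can be sketched with a Gaussian, precision-weighted belief update of the kind used in predictive-coding formulations: the prediction error is scaled by the precision of the evidence relative to the prior. Treating the expected reward of a press as a single Gaussian belief is an assumption, and the precision values (1 vs. 100) are hypothetical, chosen only to contrast a balanced prior with an over-precise one.

```python
def precision_weighted_update(prior_mean, prior_precision, observation, obs_precision):
    """Gaussian belief update: the prediction error is scaled by relative precision."""
    error = observation - prior_mean
    gain = obs_precision / (prior_precision + obs_precision)  # how much the evidence counts
    posterior_mean = prior_mean + gain * error
    posterior_precision = prior_precision + obs_precision
    return posterior_mean, posterior_precision


# Belief: "a press is worth 1.0". Observation: no food (0.0).
healthy, _ = precision_weighted_update(1.0, prior_precision=1.0, observation=0.0, obs_precision=1.0)
hijacked, _ = precision_weighted_update(1.0, prior_precision=100.0, observation=0.0, obs_precision=1.0)
print(f"balanced precision: belief drops to {healthy:.2f}")
print(f"over-precise prior: belief stays at {hijacked:.2f}")
```

The same disconfirming evidence halves the balanced belief but barely moves the over-precise one, which is one way to read "maladaptive precision weighting".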

The system treats prediction error as if it were always positive, so it never settles.

This tricks the generative model into thinking the reward exceeded expectations every single time.

From the FEP view: the system keeps encountering huge positive prediction errors → the model never converges.

The brain keeps re-learning “drug = super valuable,” even though the evidence (damage to health, tolerance) should eventually correct it.
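"Prediction error that is always positive" has a well-known reinforcement-learning counterpart: a simplified, single-step version of the non-compensable dopamine model proposed by Redish (2004), in which a drug adds a direct boost D to the error and the error is floored at D, i.e. delta = max(r − V + D, D). This is a TD-learning model rather than an FEP one, used here only as an illustration; the function name, D = 0.5, α = 0.1 and the trial counts are hypothetical.

```python
def simulate(n_trials, reward, drug_boost=0.0, alpha=0.1):
    """Single-state value learning; drug_boost > 0 applies a Redish-style floor on the error."""
    v = 0.0
    for _ in range(n_trials):
        delta = reward - v
        if drug_boost > 0:
            # The drug's direct dopamine effect cannot be cancelled by learning:
            # the error is never smaller than drug_boost, so it never reaches zero.
            delta = max(delta + drug_boost, drug_boost)
        v += alpha * delta
    return v


natural = simulate(200, reward=1.0)
drug = simulate(200, reward=1.0, drug_boost=0.5)
print(f"natural reward: V = {natural:.2f}")
print(f"drug:           V = {drug:.2f}")
```

For the natural reward, V converges to the true value and the error vanishes. Under the floor, every trial adds at least α·D to V, so the learned value keeps climbing without bound: the model keeps "re-learning" that the drug is worth more than last time.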

Slot machines use variable-ratio reinforcement schedules, the same kind Skinner studied with his box.

The unpredictability sustains prediction errors: sometimes no win, sometimes a jackpot.

From the FEP view: the brain never settles on a stable expectation because uncertainty is kept high.

Dopamine surges at near misses and unexpected wins → the model keeps updating toward “try again, maybe next time.”
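Why a random payout keeps errors alive can be shown with a little arithmetic, assuming a win/no-win payout with probability p (a simplification: real machines also vary the payout size). Even once the prediction V has converged to p, the mean squared prediction error is p(1 − p) and the expected surprise is the Bernoulli entropy; both are zero only when the outcome is certain. The function name `residual_surprise` is hypothetical.

```python
import math


def residual_surprise(p):
    """Irreducible error for a Bernoulli payout once the prediction V equals p.

    Returns (mean squared prediction error, expected surprise in bits):
      E[(r - p)^2] = p * (1 - p)
      H = -p*log2(p) - (1 - p)*log2(1 - p)
    """
    mse = p * (1 - p)
    if p in (0.0, 1.0):
        return mse, 0.0  # a certain outcome carries no surprise
    entropy = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return mse, entropy


for p in (1.0, 0.9, 0.5, 0.1):
    mse, h = residual_surprise(p)
    print(f"P(win) = {p:.1f}: mean squared error = {mse:.2f}, surprise = {h:.2f} bits")
```

A perfectly learned lever that always pays out leaves nothing to learn; a random payout leaves errors on every pull no matter how well the odds are known.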

Why it feels compelling

In FEP, an action's value = how much it is expected to reduce future surprise (the lower its expected free energy, the better).

Addiction distorts this calculation:

Short-term dopamine surge = brain predicts “this reduces surprise massively.”

Long-term reality = it actually increases uncertainty (health, finances, relationships).

The system keeps acting on the short-term prediction error, failing to update properly.
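The short-term versus long-term mismatch can be sketched as a toy policy comparison: sum each policy's change in expected surprise over a planning horizon and pick the lowest. This is a crude stand-in for expected free energy (an assumption), and every number below is invented for illustration; negative values mean surprise is reduced.

```python
# Hypothetical change in expected surprise per step, relative to baseline
# (negative = surprise reduced). Arbitrary units, invented for illustration.
POLICIES = {
    "use": [-5.0, 1.0, 2.0, 3.0, 4.0, 5.0],     # big relief now, mounting costs later
    "abstain": [1.0, 1.0, 0.5, 0.5, 0.5, 0.5],  # mild discomfort now, stable later
}


def preferred_policy(horizon: int) -> str:
    """Pick the policy with the lowest summed surprise over the first `horizon` steps."""
    return min(POLICIES, key=lambda name: sum(POLICIES[name][:horizon]))


for horizon in (1, 6):
    totals = {name: sum(steps[:horizon]) for name, steps in POLICIES.items()}
    print(f"horizon {horizon}: {totals} -> {preferred_policy(horizon)}")
```

An agent that only looks one step ahead picks "use", while one that sums over the whole horizon picks "abstain": the same numbers give opposite choices depending on how far ahead the model predicts.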

Summary (human addiction in FEP terms)

Addiction = sustained positive prediction error + maladaptive precision weighting.

Drugs/gambling flood or distort dopamine signaling so that the brain never “learns out” of the loop.

The generative model is hijacked: it keeps treating addictive behavior as the best way to minimize expected free energy, even though in reality it deepens surprise long term.

So in FEP, addiction is not “pleasure-seeking” but a model failure: the brain is stuck in an illusory attractor where prediction errors never decay.