The agent learns to take actions that maximize the cumulative reward signal over time.
Imagine an agent, similar to Pac-Man, making decisions based on feedback from its environment to achieve a desired goal.