Reinforcement Learning: Agents
11. RL Cheatsheet
Q-Learning
$$ Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right] $$
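A minimal tabular sketch of the update above, in plain Python. The helper name `q_update`, the dictionary-based table, and the `terminated` flag (which drops the bootstrap term on terminal transitions) are illustrative assumptions, not part of the formula itself.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, terminated=False):
    """Apply one tabular Q-learning update in place and return the TD error.

    Q is a mapping from (state, action) pairs to values (hypothetical layout).
    """
    # Bootstrap from the best next action; assume no future value after a terminal step
    best_next = 0.0 if terminated else max(Q[(s_next, a2)] for a2 in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


# Toy usage: two actions, with hand-set values for the next state "s1"
Q = defaultdict(float)
actions = [0, 1]
Q[("s1", 0)] = 1.0
Q[("s1", 1)] = 2.0
td = q_update(Q, "s0", 1, r=1.0, s_next="s1", actions=actions, alpha=0.5, gamma=0.9)
print(Q[("s0", 1)], td)
```

With these toy numbers the target is 1 + 0.9 · 2 = 2.8, so the entry for ("s0", 1) moves halfway from 0 toward it.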
Gymnasium
```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()             # initial observation and info dict
action = env.action_space.sample()  # random action
obs, reward, terminated, truncated, info = env.step(action)
```
PPO Clip
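```python
import torch

# prob, old_prob: action probabilities under the new and old policies
# adv: advantage estimates; eps: clip range (0.2 is a common choice)
ratio = prob / old_prob
surr1 = ratio * adv
surr2 = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
loss = -torch.min(surr1, surr2).mean()
```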
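A self-contained sketch of the clipped objective above, with toy tensors. It computes the ratio from log-probabilities, `exp(log_prob - old_log_prob)`, which is mathematically the same as `prob / old_prob` and is a common substitution for numerical stability. The function name `ppo_clip_loss` and the input values are hypothetical.

```python
import torch


def ppo_clip_loss(log_prob, old_log_prob, adv, eps=0.2):
    """Clipped surrogate loss; old_log_prob is treated as a constant."""
    ratio = torch.exp(log_prob - old_log_prob.detach())
    surr1 = ratio * adv
    surr2 = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
    # Take the pessimistic (smaller) surrogate, negate it to get a loss to minimize
    return -torch.min(surr1, surr2).mean()


# Toy batch of three samples: ratios are roughly 1.5, 1.0, and 0.5
prob = torch.tensor([0.6, 0.3, 0.2], requires_grad=True)
old_prob = torch.tensor([0.4, 0.3, 0.4])
adv = torch.tensor([1.0, -1.0, 1.0])

loss = ppo_clip_loss(torch.log(prob), torch.log(old_prob), adv)
loss.backward()
print(loss.item(), prob.grad)
```

In this batch the first sample has a positive advantage and a ratio above `1 + eps`, so the clipped term is selected and no gradient flows back to `prob[0]`. That is the mechanism that limits how far a single update can move the policy.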