TensorLearn
Reinforcement Learning: Agents
Module 11 of 11

11. RL Cheatsheet

Q-Learning

$$ Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right] $$
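A minimal tabular sketch of this update, assuming discrete states and actions stored in a dictionary. The helper name `q_update`, the `terminated` flag, and the toy states `"s0"` and `"s1"` are illustrative and not part of the course material.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, terminated=False):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Assumption: a terminal next state has no future value, so skip the bootstrap.
    best_next = 0.0 if terminated else max(Q[(s_next, a2)] for a2 in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


Q = defaultdict(float)   # unseen (state, action) pairs default to 0.0
actions = [0, 1]
Q[("s1", 1)] = 2.0       # pretend s1 already has a learned value

td = q_update(Q, "s0", 0, r=1.0, s_next="s1", actions=actions, alpha=0.5, gamma=0.5)
# td == 1.0 + 0.5 * 2.0 - 0.0 == 2.0, so Q[("s0", 0)] becomes 0.5 * 2.0 == 1.0
```

The max runs over the actions available in `s'`, not over the action actually taken next, which is what makes the update off-policy.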

Gymnasium

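```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()               # initial observation and info dict
action = env.action_space.sample()    # sample a random action
# step() returns two separate end-of-episode flags: terminated and truncated
obs, reward, terminated, truncated, info = env.step(action)
```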
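The single step above is usually wrapped in an episode loop that stops when either `terminated` or `truncated` is set. The sketch below shows that loop against a hypothetical stand-in, `CorridorEnv`, so it runs without any dependencies. Only the return shapes of `reset()` and `step()` mirror the Gymnasium calls; the class itself and `run_episode` are assumptions made for illustration.

```python
class CorridorEnv:
    """Hypothetical toy environment: walk from position 0 to `goal`."""

    def __init__(self, goal=3, max_steps=10):
        self.goal, self.max_steps = goal, max_steps

    def reset(self, seed=None):
        self.pos, self.steps = 0, 0
        return self.pos, {}                      # (obs, info)

    def step(self, action):
        self.steps += 1
        # action 1 moves right, anything else moves left (clamped at 0)
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        terminated = self.pos == self.goal       # reached a terminal state
        truncated = not terminated and self.steps >= self.max_steps  # time limit
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Roll out one episode and report how it ended."""
    obs, info = env.reset(seed=seed)
    total, terminated, truncated = 0.0, False, False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
    return total, env.steps, terminated, truncated


print(run_episode(CorridorEnv(), policy=lambda obs: 1))             # always move right
print(run_episode(CorridorEnv(max_steps=4), policy=lambda obs: 0))  # always move left
```

Keeping the two flags separate matters for learning: a truncated episode hit a time limit, so value targets can still bootstrap from the last state, whereas a terminated one cannot.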

PPO Clip

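```python
import torch

# prob, old_prob: action probabilities under the new and old policies
# (old_prob is treated as a constant); adv: advantages; eps: clip range, e.g. 0.2
ratio = prob / old_prob
surr1 = ratio * adv                                   # unclipped surrogate
surr2 = torch.clamp(ratio, 1 - eps, 1 + eps) * adv    # clipped surrogate
loss = -torch.min(surr1, surr2).mean()                # minimize the negative objective
```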
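To see what the clip does, it can help to evaluate the same expression on a few hand-picked numbers. The sketch below repeats the computation in plain Python, without `torch`. The function name `ppo_clip_loss` and the sample probabilities are illustrative assumptions, not values from the course.

```python
def ppo_clip_loss(probs, old_probs, advs, eps=0.2):
    """Return (loss, per-sample objective terms) for the clipped surrogate."""
    terms = []
    for p, p_old, adv in zip(probs, old_probs, advs):
        ratio = p / p_old
        clipped = min(max(ratio, 1 - eps), 1 + eps)   # same role as torch.clamp
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms), terms


# Ratios are roughly 1.5, 1.5, 0.5, 0.5, paired with advantages +1, -1, +1, -1.
loss, terms = ppo_clip_loss(
    probs=[0.6, 0.6, 0.2, 0.2],
    old_probs=[0.4, 0.4, 0.4, 0.4],
    advs=[1.0, -1.0, 1.0, -1.0],
)
# terms are approximately [1.2, -1.5, 0.5, -0.8]
```

Taking the `min` makes the objective pessimistic: the gain from raising the probability of a good action is capped at `1 + eps` (first term), while the penalty for raising the probability of a bad action is left unclipped (second term).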

