Reinforcement Learning – A Reference
This text draws primarily from course materials for PA230 Reinforcement Learning, taught by Petr Novotný. Any errors or inaccuracies are my own.

[Figure: a variation of the agent-environment figure, made with DALL·E 3.]

Problem: How to compute the optimal value vector and find an optimal policy in an MDP.
Solution: Use linear programming.

Problem: Linear programming is computationally slow.
Solution: Repeatedly apply Bellman updates until convergence (Value Iteration): \(v(s) \leftarrow \max_...
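For the linear-programming route, one standard formulation for a discounted MDP is shown below. It assumes a discount factor \(\gamma \in [0,1)\), expected rewards \(r(s,a)\) and transition probabilities \(p(s' \mid s,a)\); this notation is ours and may differ from the course's. Every feasible \(v\) satisfies \(v \ge v^*\) componentwise, so the unique minimizer is the optimal value vector \(v^*\). An optimal policy then picks, in each state, an action whose constraint is tight at \(v^*\).

```latex
\begin{aligned}
\min_{v \in \mathbb{R}^{S}} \quad & \sum_{s \in S} v(s) \\
\text{s.t.} \quad & v(s) \ge r(s,a) + \gamma \sum_{s' \in S} p(s' \mid s,a)\, v(s')
  \qquad \forall s \in S,\ a \in A(s)
\end{aligned}
```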
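Value iteration itself fits in a few lines. The following is a minimal sketch in Python. It assumes the standard Bellman optimality update for a discounted MDP, \(v(s) \leftarrow \max_a \sum_{s'} p(s' \mid s,a)\,[r(s,a,s') + \gamma\, v(s')]\). The MDP encoding (P[s][a] as a list of (probability, next_state, reward) triples) and the helper names q_value and value_iteration are hypothetical, chosen for illustration. The loop stops once the largest change in a sweep drops below tol, and then reads off a greedy policy.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: P[s][a] is a list of (probability, next_state, reward) triples.
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(P: MDP, v: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected one-step return of action a in state s, bootstrapping from v."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])


def value_iteration(P: MDP, gamma: float = 0.9, tol: float = 1e-8,
                    max_iters: int = 10_000) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Apply synchronous Bellman optimality updates until the values stop changing."""
    v = {s: 0.0 for s in P}
    for _ in range(max_iters):
        # One sweep: every state is updated from the previous vector v.
        new_v = {s: max(q_value(P, v, s, a, gamma) for a in P[s]) for s in P}
        delta = max(abs(new_v[s] - v[s]) for s in P)
        v = new_v
        if delta < tol:
            break
    # Greedy policy: in each state, pick an action maximizing the one-step lookahead.
    policy = {s: max(P[s], key=lambda a: q_value(P, v, s, a, gamma)) for s in P}
    return v, policy


if __name__ == "__main__":
    # Toy MDP: in state 0, action 0 stays put (reward 0) and action 1 moves to state 1 (reward 1);
    # state 1 has a single action that stays there with reward 2.
    P = {0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
         1: {0: [(1.0, 1, 2.0)]}}
    v, policy = value_iteration(P, gamma=0.9)
    print(v)       # approximately {0: 19.0, 1: 20.0}
    print(policy)  # {0: 1, 1: 0}
```

Stopping on a small sweep-to-sweep change is a common practical criterion. For \(\gamma < 1\) the update is a contraction, so the iterates converge to \(v^*\) from any starting vector.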
Read more at jakubhalmes.substack.com