Proximal Policy Optimization
Proximal Policy Optimization (PPO) Algorithm
CS & ML Basic | 2023. 2. 14. 16:02
The objective of the Proximal Policy Optimization (PPO) algorithm is to train a policy that controls an agent's behavior in a given environment so that it maximizes the expected discounted cumulative reward over time. More formally, we can define the objective of PPO as follows: $$J(\theta) = \mathbb{E}_{\pi_{\theta}}\left[\sum_{t=0}^{\infty}\gamma^{t}r_{t}\right]$$ where \(J(\theta)\) is the ..
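
To make the objective concrete, here is a minimal sketch of how \(J(\theta)\) might be estimated from sampled episodes. It is not PPO itself, only a Monte Carlo average of discounted returns. The function names `discounted_return` and `estimate_objective` are hypothetical, and the infinite sum is assumed to be truncated at the end of each finite episode.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Compute sum_t gamma^t * r_t for one finite episode (illustrative helper)."""
    total = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def estimate_objective(episodes: Sequence[Sequence[float]], gamma: float = 0.99) -> float:
    """Monte Carlo estimate of J(theta): mean discounted return over episodes
    assumed to have been collected by following the current policy."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    # Two toy reward sequences standing in for sampled rollouts.
    sampled = [[1.0, 1.0, 1.0], [0.0, 0.0, 4.0]]
    print(estimate_objective(sampled, gamma=0.5))
```

The backward accumulation avoids computing powers of \(\gamma\) explicitly; averaging over more episodes would be expected to reduce the variance of the estimate.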