PPO
-
Proximal Policy Optimization with Generalized Advantage Estimation (PPO2)
CS & ML Basic | 2023. 2. 14. 17:04
PPO2 (Proximal Policy Optimization with Generalized Advantage Estimation) extends PPO by combining it with Generalized Advantage Estimation (GAE), a method for estimating the advantage function. The main difference between PPO and PPO2 is how they estimate the advantage function. In PPO, the advantage function is estimated using a single-step estimate, whi..
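To make the contrast in this excerpt concrete, here is a minimal sketch of a single-step advantage estimate next to a GAE estimate. The function names, the `rewards`/`values` list layout, and the default `gamma` and `lam` values are illustrative assumptions, not taken from the post. It assumes a single trajectory, with `values` holding one extra bootstrap entry for the state after the last step (0.0 if that state is terminal).

```python
def one_step_advantages(rewards, values, gamma=0.99):
    """Single-step (TD) estimate: A_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    return [
        rewards[t] + gamma * values[t + 1] - values[t]
        for t in range(len(rewards))
    ]


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE: A_t = sum_l (gamma * lam)^l * delta_{t+l}, computed backwards.

    Assumes len(values) == len(rewards) + 1 (hypothetical layout).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual at step t
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of this and all later residuals
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` the GAE estimate reduces to the single-step estimate, and with `lam=1` it becomes the discounted return minus the value baseline. Intermediate values of `lam` trade bias against variance between those two extremes.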
-
Proximal Policy Optimization (PPO) Algorithm
CS & ML Basic | 2023. 2. 14. 16:02
The objective of the Proximal Policy Optimization (PPO) algorithm is to train a policy that controls an agent's behavior in a given environment so that it maximizes the expected cumulative reward over time. More formally, we can define the objective of PPO as follows:
where is the ..
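For reference, one standard way to write an expected cumulative reward objective of the kind described here is sketched below. The notation is an assumption and may differ from the post's own: $\pi_\theta$ is the policy with parameters $\theta$, $\tau$ is a trajectory sampled by running $\pi_\theta$ in the environment, $r_t$ is the reward at step $t$, $\gamma \in [0, 1]$ is a discount factor, and $T$ is the horizon.

```latex
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \gamma^{t} r_t \right]
```

Training then amounts to searching for the parameters $\theta$ that maximize $J(\theta)$.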