InstructGPT
-
Advantage Function in RL
CS & ML Basic | 2023. 2. 14. 16:02
The advantage function in reinforcement learning measures how much better a particular action is than the policy's average behavior in a given state. It is a critical component of many reinforcement learning algorithms, including PPO. Mathematically, the advantage function is defined as the difference between the expected return of taking a specific action in a given state and the expected return of followi..
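Under the standard definition, A(s, a) = Q(s, a) - V(s), where V(s) = sum over a of pi(a|s) Q(s, a) is the value of simply following the policy. The sketch below is illustrative only: the function names `advantage_from_q` and `gae_advantages` are hypothetical. The first function computes exact advantages for one state when the action values are known. The second uses Generalized Advantage Estimation (GAE), a common way PPO implementations estimate advantages from a rollout. The full post may use a different estimator.

```python
from typing import List, Sequence


def advantage_from_q(q_values: Sequence[float], probs: Sequence[float]) -> List[float]:
    """Exact advantages for one state: A(s, a) = Q(s, a) - V(s)."""
    # V(s) is the policy-weighted average of the action values.
    v = sum(p * q for p, q in zip(probs, q_values))
    return [q - v for q in q_values]


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Generalized Advantage Estimation over one rollout (hypothetical helper).

    `values` holds V(s_0) ... V(s_T): one more entry than `rewards`,
    the last one being the bootstrap value of the state after the rollout.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Do not bootstrap across an episode boundary.
        nonterminal = 0.0 if dones[t] else 1.0
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted (gamma * lam) sum of future TD errors.
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
    return advantages


print(advantage_from_q([1.0, 3.0], [0.5, 0.5]))  # [-1.0, 1.0]
print(gae_advantages([1.0, 1.0, 1.0], [0.0] * 4, [False] * 3, gamma=1.0, lam=1.0))  # [3.0, 2.0, 1.0]
```

In the exact case, the policy-weighted advantages in a state sum to zero, which is why a positive A(s, a) means "better than average." For GAE, setting `lam=0` reduces to the one-step TD error, and `lam=1` gives the Monte Carlo return minus the value baseline. Intermediate values trade bias against variance.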
-
Proximal Policy Optimization (PPO) Algorithm
CS & ML Basic | 2023. 2. 14. 16:02
The objective of the Proximal Policy Optimization (PPO) algorithm is to train a policy function that can control an agent's behavior in a given environment, such that it maximizes the expected cumulative reward over time. More formally, we can define the objective of PPO as follows:
where is the ..
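The preview cuts off before its formula, so the exact objective in the full post is not recoverable here. As an assumption, the sketch below implements the widely used clipped surrogate objective of PPO-Clip (Schulman et al., 2017). That objective is the quantity PPO optimizes in practice as a stable proxy for maximizing expected cumulative reward: L = mean over t of min(r_t A_t, clip(r_t, 1 - eps, 1 + eps) A_t), where r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t). The function name `ppo_clip_objective` is hypothetical.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,
) -> float:
    """Mean clipped surrogate objective (to be maximized); hypothetical helper."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t), from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Ratio 1.5 with a positive advantage: the gain is capped at 1 + eps = 1.2.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

Taking the minimum means the policy gets no extra credit for moving the ratio beyond the clip range when the advantage is positive. It is still fully penalized when a large ratio makes a negative-advantage action more likely. In a real training loop, this would be computed on tensors with automatic differentiation, and its negative would be minimized. This pure-Python version only evaluates the number.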