Advantage Function in RL

CS & ML Basic 2023. 2. 14. 16:02

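The advantage function in reinforcement learning measures how much better it is to take a particular action in a given state than to act according to the current policy, that is, than the policy's average action in that state. It is a key component of many policy-gradient and actor-critic algorithms, including Proximal Policy Optimization (PPO).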

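Mathematically, the advantage function is the difference between two expected returns (cumulative, usually discounted, reward). The first is the return from taking a specific action in a given state and following the current policy afterwards. The second is the return from simply following the current policy from that same state: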

$A (s, a) = Q (s, a) - V (s)$

where $A (s, a)$ is the advantage of taking action $a$ in state $s$. $Q (s, a)$ is the action-value function: the expected return of taking action $a$ in state $s$ and following the current policy afterwards. $V (s)$ is the state-value function: the expected return of following the current policy from state $s$.
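Because $V (s)$ is itself the policy-weighted average of $Q (s, a)$, the advantage averages to zero under the current policy. The short derivation below assumes a discrete action space. It writes $\pi(a \mid s)$ for the probability the current policy assigns to action $a$ in state $s$; this notation is introduced here and is not used elsewhere in the post.

$$V (s) = \sum_{a} \pi(a \mid s)\, Q (s, a)$$

$$\sum_{a} \pi(a \mid s)\, A (s, a) = \sum_{a} \pi(a \mid s)\, Q (s, a) - V (s) \sum_{a} \pi(a \mid s) = V (s) - V (s) = 0$$

So a positive advantage means the action is better than the policy's average choice in $s$. It does not necessarily mean the action is better than every other action.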

Intuitively, the advantage function tells the agent how much better (or worse) a specific action is than what its current policy would do on average. A positive advantage means the action is better than average, so the policy should make it more likely. A negative advantage means the action is worse than average, so the policy should make it less likely. By learning which actions are better or worse in each state, the agent can improve its policy over time.
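In policy-gradient methods, "make it more likely" and "make it less likely" have a concrete form: the advantage acts as a weight on the gradient of $\log \pi(a \mid s)$. The minimal sketch below shows this for a single state with three actions and a tabular softmax policy. The function names, learning rate and starting logits are hypothetical. The update is plain REINFORCE-style gradient ascent, not PPO's clipped update.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(logits, action, advantage, lr=0.5):
    """One REINFORCE-style ascent step on A(s, a) * log pi(a | s).

    For a softmax policy, d log pi(a | s) / d logit_k = 1[k == a] - pi(k | s).
    """
    probs = softmax(logits)
    return [
        logit + lr * advantage * ((1.0 if k == action else 0.0) - probs[k])
        for k, logit in enumerate(logits)
    ]

logits = [0.0, 0.0, 0.0]  # hypothetical uniform policy over three actions
before = softmax(logits)
# Same sampled action (index 1), once with a positive and once with a negative advantage
after_pos = softmax(policy_gradient_step(logits, action=1, advantage=+2.0))
after_neg = softmax(policy_gradient_step(logits, action=1, advantage=-2.0))
print(round(before[1], 3), round(after_pos[1], 3), round(after_neg[1], 3))
```

The probability of the sampled action rises after the positive-advantage step and falls after the negative-advantage step. The size of the change scales with the magnitude of the advantage.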

In PPO, the true advantage is unknown. It is estimated from sampled rewards together with a learned state-value function $V (s)$, typically a critic neural network. The estimated advantage then weights the probability ratio between the new and old policies in the surrogate objective, and the gradient of that objective is used to update the policy parameters.
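The sketch below illustrates both steps under a few assumptions:

- It estimates advantages from one rollout with Generalized Advantage Estimation (GAE), which builds on the one-step TD error $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$.
- It feeds those advantages into PPO's clipped surrogate.
- The rollout is stored as plain Python lists, and the critic outputs $V(s_t)$ are passed in as numbers.
- The function names and defaults ($\gamma = 0.99$, $\lambda = 0.95$, clip $\epsilon = 0.2$) are common choices assumed for illustration, not taken from this post.

A real implementation would compute log-probabilities and values with neural networks and often normalizes the advantages.

```python
import math

def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    rewards[t], values[t] = V(s_t) from the critic, dones[t] = episode ended after step t.
    last_value = V(s_T), used to bootstrap after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        # Exponentially weighted (gamma * lam) sum of future TD errors
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages

def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """Mean of PPO's clipped objective min(r * A, clip(r, 1 - eps, 1 + eps) * A), to be maximized."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

adv = gae_advantages([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], last_value=0.5,
                     dones=[False, False, True], gamma=1.0, lam=1.0)
print(adv)  # [1.5, 0.5, 0.5]
```

The parameter `lam` trades bias against variance:

- With `lam=0.0`, each advantage reduces to the one-step TD error. This has low variance but relies heavily on the critic.
- With `lam=1.0` (and `gamma=1.0` as above), each advantage is the observed return minus $V(s_t)$. This has higher variance but less reliance on the critic.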

For example, suppose you're controlling a robot that collects coins in a 2D environment. In a given state, there are three possible actions: move left, move right, or jump. The advantage function tells you how each action compares with what your current policy would do on average in that state. If moving right has a higher expected return than the other actions, it also has the highest advantage, so moving right is the best choice for maximizing your reward.
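The robot example can be worked through with numbers. The sketch below assumes made-up $Q (s, a)$ estimates and current policy probabilities for a single state. These values are purely hypothetical and not from any real environment. It computes $V (s)$ as the policy-weighted average of $Q (s, a)$ and then each advantage $A (s, a) = Q (s, a) - V (s)$.

```python
# Hypothetical numbers for a single state of the coin-collecting robot.
actions = ["left", "right", "jump"]
q_values = {"left": 1.0, "right": 3.0, "jump": 2.0}  # assumed Q(s, a) estimates
policy = {"left": 0.5, "right": 0.25, "jump": 0.25}  # assumed current pi(a | s)

# V(s) is the policy-weighted average of Q(s, a)
v = sum(policy[a] * q_values[a] for a in actions)

# A(s, a) = Q(s, a) - V(s)
advantages = {a: q_values[a] - v for a in actions}

# The action with the highest advantage is also the one with the highest Q
best = max(actions, key=lambda a: advantages[a])
print(v, advantages, best)
```

Here $V (s) = 1.75$, and moving right has the largest advantage ($1.25$). Jumping also has a positive advantage ($0.25$) even though it is not the best action: it is simply better than what the current policy does on average. In expectation, a policy-gradient update would therefore raise the probability of jumping too, just by less than moving right.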

Other posts in the 'CS & ML Basic' category

Proximal Policy Optimization with Generalized Advantage Estimation (PPO2), 2023.02.14
Comparison of TRPO and PPO in Reinforcement Learning, 2023.02.14
Proximal Policy Optimization (PPO) Algorithm, 2023.02.14
