Posts tagged 'PPO'

Proximal Policy Optimization with Generalized Advantage Estimation (PPO2)

CS & ML Basic 2023. 2. 14. 17:04

PPO2 (Proximal Policy Optimization with Generalized Advantage Estimation) is an extension of PPO that combines the PPO algorithm with Generalized Advantage Estimation (GAE), which is a method for estimating the advantage function. The main difference between PPO and PPO2 is the way they estimate the advantage function. In PPO, the advantage function is estimated using a single-step estimate, whi..
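As a rough illustration of the advantage estimation the excerpt above describes, here is a minimal sketch of GAE in plain Python. The function name `compute_gae`, the default values of `gamma` and `lam`, and the convention that `values` carries one extra bootstrap entry are assumptions made for this example, not details taken from the post. Setting `lam=0` reduces the estimate to the single-step TD error, which is the single-step case the excerpt contrasts GAE with.

```python
def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    rewards: list of T rewards r_0 .. r_{T-1}
    values:  list of T + 1 value estimates V(s_0) .. V(s_T)
             (the last entry is the bootstrap value; an assumed convention)
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk backwards so each step can reuse the accumulated estimate.
    for t in reversed(range(len(rewards))):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With `lam=1` and `gamma=1` the result is the undiscounted return-to-go minus the value baseline; intermediate values of `lam` trade bias against variance.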

Comparison of TRPO and PPO in Reinforcement Learning

CS & ML Basic 2023. 2. 14. 16:33

TRPO: In TRPO, the policy update is performed by solving the following constrained optimization problem: maximize L(θ, θ_old) subject to KL(π_θ_old || π_θ)
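To make the two quantities in this constrained problem concrete, the sketch below computes a sample-based surrogate and a KL divergence for discrete action distributions. The excerpt does not define L(θ, θ_old) or show the bound on the KL term, so the importance-weighted form of the surrogate, the helper names, and the threshold `delta` are all assumptions made for illustration only.

```python
import math

def surrogate(new_probs, old_probs, advantages):
    """Mean of (pi_new(a|s) / pi_old(a|s)) * advantage over sampled steps.

    This importance-weighted form is assumed here; the excerpt only
    names the objective L(theta, theta_old).
    """
    terms = [n / o * a for n, o, a in zip(new_probs, old_probs, advantages)]
    return sum(terms) / len(terms)

def kl_categorical(p_old, p_new):
    """KL(p_old || p_new) for two discrete distributions over actions."""
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0)

def within_trust_region(p_old, p_new, delta):
    """Check the KL constraint against a hypothetical threshold delta."""
    return kl_categorical(p_old, p_new) <= delta
```

A candidate update would be accepted in this sketch only if it improves `surrogate` while `within_trust_region` still holds for the new policy.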

Proximal Policy Optimization (PPO) Algorithm

CS & ML Basic 2023. 2. 14. 16:02

The objective of the Proximal Policy Optimization (PPO) algorithm is to train a policy function that can control an agent's behavior in a given environment, such that it maximizes the expected cumulative reward over time. More formally, we can define the objective of PPO as follows: $J(θ) = \mathbb{E}_{π_θ}\left[\sum_{t=0}^{\infty} γ^{t} r_{t}\right]$ where $J(θ)$ is the ..
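The objective above is an expectation of discounted reward sums, which in practice is usually approximated from finite sampled trajectories. The following is a minimal sketch of that approximation; the function names and the idea of averaging over a list of sampled reward sequences are assumptions for this example rather than anything stated in the post.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one (finite) trajectory."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        total = r + gamma * total
    return total

def estimate_objective(trajectories, gamma):
    """Monte Carlo estimate of J(theta): average discounted return
    over reward sequences sampled by running the current policy."""
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    return sum(returns) / len(returns)
```

For example, `discounted_return([1.0, 1.0, 1.0], 0.5)` evaluates to 1 + 0.5 + 0.25 = 1.75.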
