-
TRPO In TRPO, the policy update is performed by solving the following constrained optimization problem: maximize L(θ, θ_old) subject to KL(π_θ_old || π_θ)