1. Mukherjee S, Parashar R, Joshi A. A Comparative Study of Proximal Policy Optimization (PPO) and Direct Policy Optimization (DPO) on a Toy Environment. SIGAIR [Internet]. 2025 Jul. 14 [cited 2025 Jul. 27];1(1). Available from: https://sigair.org/index.php/journal/article/view/15