Mukherjee, Saptarshi, Rohit Parashar, and Aniket Joshi. “A Comparative Study of Proximal Policy Optimization (PPO) and Direct Policy Optimization (DPO) on a Toy Environment”. Special Interest Group on Artificial Intelligence Research 1, no. 1 (July 14, 2025). Accessed July 27, 2025. https://sigair.org/index.php/journal/article/view/15.