(1) Mukherjee, S.; Parashar, R.; Joshi, A. A Comparative Study of Proximal Policy Optimization (PPO) and Direct Policy Optimization (DPO) on a Toy Environment. SIGAIR 2025, 1.