MSc Thesis Defense: Barış Almaç, DEEP REINFORCEMENT LEARNING BASED HYBRID GOAL ASSIGNED MULTI AGENT PATH PLANNING IN DYNAMIC ENVIRONMENTS

DEEP REINFORCEMENT LEARNING BASED HYBRID GOAL ASSIGNED MULTI AGENT PATH PLANNING IN DYNAMIC ENVIRONMENTS

Barış Almaç
Mechatronics Engineering, MSc Thesis, 2025

Thesis Jury

Prof. Mustafa Ünel (Thesis Advisor)

Assoc. Prof. Kemalettin Erbatur

Assoc. Prof. Ali Fuat Ergenç

Date & Time: December 18th, 2025 – 1:00 PM

Place: FASS G056

Keywords: Deep Reinforcement Learning, Predictive Shield, Hybrid Goal Assignment, Proximal Policy Optimization, Path Finding

Abstract

Coordinating large fleets of robots in warehouses, factories, and traffic systems requires moving many agents through the same space quickly and safely, yet traditional planners struggle to keep up as environments become dynamic and crowded. This challenge is formalized as the Multi-Agent Path Finding (MAPF) problem, in which multiple agents must navigate a shared environment, reach their assigned goals, and avoid collisions while minimizing mission time and travel cost under partial observability. Classical optimal MAPF solvers offer strong guarantees when full global information is available, but their runtime grows rapidly with agent count and with frequent replanning. Purely learned reactive policies scale better, but they provide little safety assurance and often fail in dense, time-varying scenes. This thesis introduces a decentralized reinforcement learning framework that aims to bridge this gap by combining a shared Proximal Policy Optimization (PPO) policy with three structured components: a two-stage goal assignment prior (a Euclidean auction followed by A*-cost refinement), a one-step predictive shield that estimates per-action collision risk, and a safety-gated teacher–student loss that imitates A* only when the predicted risk is low. In dynamic grid environments with moving obstacles, the proposed method achieves near-perfect success and goal coverage, roughly halves collisions and makespan relative to a Conflict-Based Search (CBS) planner and strong deep reinforcement learning (DRL) baselines, and yields substantially higher average rewards. Overall, the work shows that structured, shielded reinforcement learning offers a promising route toward safe, scalable, and reactive decentralized MAPF in realistic scenarios.
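To make the shield and the safety gate described in the abstract more concrete, the following is a minimal Python sketch and not the thesis implementation. It assumes a 4-connected grid, a constant-velocity model for moving obstacles, hand-picked risk values (1.0, 0.5, 0.0), and a gate threshold of 0.25. The names `ACTIONS`, `predict_risk`, and `gated_imitation_loss` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical action set for a 4-connected grid: name -> (dx, dy).
ACTIONS = {
    "stay": (0, 0),
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}


def predict_risk(pos, static_obstacles, moving_obstacles, grid_size):
    """One-step lookahead returning an assumed risk in [0, 1] per action.

    moving_obstacles is a list of ((x, y), (vx, vy)) pairs. Each obstacle is
    assumed to keep its velocity for one step (constant-velocity assumption).
    """
    width, height = grid_size
    current = {p for p, _ in moving_obstacles}
    predicted = {(x + vx, y + vy) for (x, y), (vx, vy) in moving_obstacles}

    risk = {}
    for name, (dx, dy) in ACTIONS.items():
        nxt = (pos[0] + dx, pos[1] + dy)
        off_grid = not (0 <= nxt[0] < width and 0 <= nxt[1] < height)
        if off_grid or nxt in static_obstacles or nxt in predicted:
            risk[name] = 1.0  # certain collision under the assumed model
        elif nxt in current:
            risk[name] = 0.5  # cell is being vacated; illustrative value only
        else:
            risk[name] = 0.0
    return risk


def gated_imitation_loss(policy_probs, teacher_action, risk, threshold=0.25):
    """Cross-entropy toward the teacher's action, gated by predicted risk.

    The imitation term is dropped (0.0) when the teacher's suggested action
    is predicted to be risky, so the policy is not pushed toward it.
    """
    if risk[teacher_action] > threshold:
        return 0.0
    return -math.log(policy_probs[teacher_action])


# Agent at (1, 1) on a 3x3 grid, a wall above it, and an obstacle at (2, 2)
# that is about to move up into (2, 1).
risk = predict_risk((1, 1), {(1, 0)}, [((2, 2), (0, -1))], (3, 3))
uniform = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
print(risk)
print(gated_imitation_loss(uniform, "right", risk))  # teacher action is risky
print(gated_imitation_loss(uniform, "down", risk))   # teacher action is safe
```

In a full training loop, a term like this would presumably be added to the PPO objective with some weight, with the teacher action taken from an A* path toward the agent's assigned goal. How the thesis weights the term or computes the risk estimate is not stated in the abstract, so both are left as assumptions here.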