COMPARISON OF RL ALGORITHM ADAPTIVITY FOR UAV COLLISION AVOIDANCE
Keywords:
DQN, PPO, SAC, TD3, DDPG, A2C, drone navigation, collision avoidance, reinforcement learning

Abstract
This paper investigates the adaptivity of reinforcement learning (RL) algorithms, specifically DQN, PPO, SAC, TD3, DDPG, and A2C, in a simulated three-dimensional environment that mirrors unmanned-aerial-vehicle (UAV) collision-avoidance scenarios. The aim of the study is to compare each algorithm's ability to generalize acquired knowledge and apply it in new, unpredictable flight conditions.

The experimental section consists of two stages. In the first stage, drones are trained in conditions where their trajectories are directed toward one another, with fixed initial speed and position parameters. In the second stage, each algorithm is validated in scenarios that lie outside the training domain, featuring random changes in flight direction and velocity characteristics. This approach makes it possible to assess the stability of model behavior in newly encountered, unforeseen conditions. The study compares the rates of successful collision avoidance, reaction speed, and maneuver quality for each algorithm. The data show that some algorithms, such as SAC and TD3, exhibit higher stability under strong trajectory fluctuations, whereas DQN and PPO may be less robust during unpredictable test phases. A2C and DDPG proved intermediate in both success rate and decision-making time.

The proposed research provides valuable information for the practical implementation of autonomous UAV control systems in dynamic and uncertain environments. The findings can be used to develop adaptive controllers capable of rapidly correcting drone behavior when unforeseen situations arise. Future studies should incorporate sensor systems that closely replicate real flight conditions and assess how they affect algorithm adaptivity. This will further enhance the reliability and safety of unmanned systems in real flight missions.
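To make the two-stage protocol concrete, the following is a minimal Python sketch: an agent is trained on fixed head-on encounters (stage 1), then its collision-avoidance success rate is measured on randomized, out-of-distribution encounters (stage 2). Everything here is a hypothetical stand-in rather than the authors' code: UAVEncounterEnv is a toy Gymnasium environment with simplified kinematics and an illustrative reward, success_rate is an assumed helper, and SAC from Stable-Baselines3 is used as one common implementation of the algorithms compared.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC


class UAVEncounterEnv(gym.Env):
    """Toy two-drone encounter in 3-D (a stand-in for the paper's simulator):
    the agent commands its own velocity while an intruder flies toward it."""

    def __init__(self, randomize=False):
        super().__init__()
        self.randomize = randomize  # False: stage 1 (fixed), True: stage 2 (OOD)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(6,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = np.array([0.0, 0.0, 10.0])        # agent position
        self.intruder = np.array([40.0, 0.0, 10.0])  # intruder position
        speed, heading = 2.0, np.array([-1.0, 0.0, 0.0])  # fixed head-on course
        if self.randomize:  # stage 2: random flight direction and speed
            speed = self.np_random.uniform(1.0, 4.0)
            heading = self.np_random.normal(size=3)
            heading /= np.linalg.norm(heading)
        self.intruder_vel = speed * heading
        self.steps = 0
        return self._obs(), {}

    def _obs(self):
        # Relative intruder position and its velocity, as seen by the agent.
        return np.concatenate([self.intruder - self.pos, self.intruder_vel]).astype(np.float32)

    def step(self, action):
        self.pos = self.pos + action                 # velocity command, one step
        self.intruder = self.intruder + self.intruder_vel
        self.steps += 1
        collision = bool(np.linalg.norm(self.intruder - self.pos) < 2.0)
        reward = -10.0 if collision else 0.1         # illustrative reward shaping
        return self._obs(), reward, collision, self.steps >= 60, {"collision": collision}


def success_rate(model, env, episodes=100):
    """Fraction of episodes that end without a collision."""
    ok = 0
    for _ in range(episodes):
        obs, info = env.reset()
        done = False
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, _, terminated, truncated, info = env.step(action)
            done = terminated or truncated
        ok += int(not info["collision"])
    return ok / episodes


# Stage 1: train on head-on trajectories with fixed speed and position.
model = SAC("MlpPolicy", UAVEncounterEnv(randomize=False), verbose=0)
model.learn(total_timesteps=50_000)

# Stage 2: validate outside the training domain.
print("OOD success rate:", success_rate(model, UAVEncounterEnv(randomize=True)))

The other continuous-control algorithms compared in the paper (TD3, DDPG, PPO, A2C) can be swapped in for SAC unchanged, since Stable-Baselines3 exposes them through the same interface; DQN would additionally require discretizing the action space, as it only supports discrete actions.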