Benchmarking Hindsight Experience Replay Reinforcement Learning Methods on Vehicle Parking Environment
Both passenger and freight transportation are of great importance today. Parking a vehicle once it reaches its destination is challenging for human drivers and automatic parking systems alike. Artificial intelligence-based methods are used for this task where traditional control methods are insufficient. A common strategy is to plan a trajectory with heuristic search algorithms and then follow that trajectory with traditional control methods. Reinforcement learning is a developing family of algorithms that can also be applied to this kind of problem. Hindsight Experience Replay (HER) is a wrapper method that, when used with reinforcement learning algorithms, makes better use of unsuccessful attempts by relabeling them with goals that were actually achieved. In this thesis, the Twin Delayed Deep Deterministic Policy Gradient (TD3), Deep Deterministic Policy Gradient (DDPG), and Soft Actor-Critic (SAC) reinforcement learning algorithms are studied. These algorithms have been compared in their plain form on other problems in the literature; comparing them in combination with HER on the autonomous parking problem is the contribution of this thesis. In the designed environment, an artificial intelligence control system was built with HER-supported reinforcement learning methods for a vehicle model whose throttle and steering commands are controlled in a continuous space. The control system drives the vehicle and parks it at the target point. The experiments show that the studied reinforcement learning methods can solve the autonomous parking problem, and the performance of the algorithms is compared. When used with HER on the autonomous parking problem, the TD3 algorithm, which was introduced as an improved version of DDPG, did not perform better than DDPG. The most successful of the algorithms used in this study was SAC.
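The relabeling idea behind HER can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis: the two-dimensional position state, the sparse reward with tolerance `tol`, the "final" relabeling strategy, and the names `Transition`, `sparse_reward`, and `her_relabel_final` are all assumptions introduced here for illustration.

```python
import math
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Transition(NamedTuple):
    """One step of an episode (hypothetical, simplified to a 2D position)."""
    state: Point
    action: Point
    next_state: Point  # position actually reached after the action
    goal: Point        # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved: Point, goal: Point, tol: float = 0.1) -> float:
    """Assumed sparse reward: 0 when within `tol` of the goal, otherwise -1."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, replacing the goal with the last position reached.

    The rewards are recomputed against the new goal, so an episode that
    failed for the original goal ends in a success for the relabeled one.
    """
    new_goal = episode[-1].next_state
    return [
        t._replace(goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


# An episode that never reaches the requested goal (5, 5).
goal = (5.0, 5.0)
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
episode = [
    Transition(s, (1.0, 0.0), s_next, goal, sparse_reward(s_next, goal))
    for s, s_next in zip(positions[:-1], positions[1:])
]

# Both the original and the relabeled transitions would be stored in the
# replay buffer of an off-policy learner such as DDPG, TD3, or SAC.
replay_buffer = episode + her_relabel_final(episode)
```

In this sketch every original transition carries a reward of -1, while the last relabeled transition carries a reward of 0, which gives the off-policy learner a success signal even though the requested goal was never reached.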