Parallel RRT for Autonomous Robots

Accelerated path planning using OpenMP and CUDA

As part of ECE 759: High Performance Computing (Spring 2025) at UW–Madison, I contributed to a project that accelerated sampling-based path planning with parallel computing on both the CPU (OpenMP) and the GPU (CUDA). We focused on Standard RRT and Bidirectional RRT, and the CUDA implementation reached up to a 151× speedup on NVIDIA GPUs.

GitHub Repository: https://github.com/xuann6/ece759_final_proj.git


Contributions

  • Parallelized nearest-node search, collision detection, and tree expansion using OpenMP (C++) and CUDA.
  • Developed CUDA kernels tuned to improve GPU occupancy, reduce memory latency, and minimize kernel launch overhead.
  • Implemented warp-level sampling, a struct-of-arrays layout, and branchless geometry predicates for efficient path validation.
  • Achieved a peak 151× speedup by fusing kernels, increasing occupancy, and reducing atomic contention in Bidirectional RRT.
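
To make the first contribution concrete, here is a minimal CPU-side sketch of a parallel nearest-node search over a struct-of-arrays tree. It is an illustration under stated assumptions, not the code from the repository: the `NodeArrays` type and the `nearest_node` function are hypothetical names, and the tree is assumed to be 2D. Each thread scans its share of the nodes for a local minimum, and the local results are then merged in a short critical section.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical struct-of-arrays tree storage: x and y live in separate
// contiguous arrays, so a linear scan reads memory with unit stride.
struct NodeArrays {
    std::vector<float> x;
    std::vector<float> y;
};

// Returns the index of the node closest to (qx, qy), or -1 for an empty tree.
// Build with -fopenmp to enable the parallel region; without it the pragmas
// are ignored and the loop runs serially with the same result.
long nearest_node(const NodeArrays& nodes, float qx, float qy) {
    assert(nodes.x.size() == nodes.y.size());
    const long n = static_cast<long>(nodes.x.size());
    long best_idx = -1;
    float best_d2 = std::numeric_limits<float>::max();

    #pragma omp parallel
    {
        // Per-thread minimum, so the hot loop needs no synchronization.
        long local_idx = -1;
        float local_d2 = std::numeric_limits<float>::max();

        #pragma omp for nowait
        for (long i = 0; i < n; ++i) {
            const float dx = nodes.x[i] - qx;
            const float dy = nodes.y[i] - qy;
            const float d2 = dx * dx + dy * dy;  // squared distance, no sqrt needed
            if (d2 < local_d2) {
                local_d2 = d2;
                local_idx = i;
            }
        }

        // Merge the per-thread results; ties go to the lower index so the
        // answer does not depend on thread scheduling.
        #pragma omp critical
        {
            if (local_idx >= 0 &&
                (local_d2 < best_d2 ||
                 (local_d2 == best_d2 && local_idx < best_idx))) {
                best_d2 = local_d2;
                best_idx = local_idx;
            }
        }
    }
    return best_idx;
}
```

Comparing squared distances avoids a square root per node, and keeping a per-thread minimum means the only synchronized step is the final merge. A GPU version would typically follow the same pattern, with a block-level reduction in place of the critical section.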

RRT expansion visualized in a 2D environment using the CUDA-accelerated implementation.

Key Technologies

  • CUDA, OpenMP, C++, Python, cuRAND, Thrust
  • Nsight profiling, persistent-thread GPU kernels
  • Collision checking using shared memory and branchless logic
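
As a rough illustration of the branchless collision logic listed above, the sketch below tests sampled points along an edge against axis-aligned rectangular obstacles. The `Rect` type, the function names, and the fixed-step sampling scheme are assumptions made for this example and are not taken from the project. The predicate combines comparison results with bitwise `&` instead of short-circuiting `&&`, and hits are accumulated with `|=` rather than an early exit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical axis-aligned rectangular obstacle.
struct Rect {
    float min_x, min_y, max_x, max_y;
};

// Returns 1 if (px, py) lies inside or on the boundary of r, otherwise 0.
// Bitwise & evaluates all four comparisons with no short-circuit, so the
// predicate itself contains no data-dependent control flow.
inline unsigned point_in_rect(float px, float py, const Rect& r) {
    return static_cast<unsigned>(px >= r.min_x) &
           static_cast<unsigned>(px <= r.max_x) &
           static_cast<unsigned>(py >= r.min_y) &
           static_cast<unsigned>(py <= r.max_y);
}

// Samples the edge (ax, ay) -> (bx, by) at steps + 1 evenly spaced points
// and ORs together the hits against every obstacle. Sampling is an
// approximation: an obstacle thinner than the sample spacing can be missed.
bool edge_collides(float ax, float ay, float bx, float by,
                   const std::vector<Rect>& obstacles, int steps = 32) {
    assert(steps > 0);
    unsigned hit = 0;
    for (int s = 0; s <= steps; ++s) {
        const float t = static_cast<float>(s) / static_cast<float>(steps);
        const float px = ax + t * (bx - ax);
        const float py = ay + t * (by - ay);
        for (const Rect& r : obstacles) {
            hit |= point_in_rect(px, py, r);  // accumulate instead of early exit
        }
    }
    return hit != 0;
}
```

On a GPU, uniform control flow of this kind helps keep the threads of a warp from diverging, at the cost of always doing the full amount of work. Whether a compiler emits truly branch-free machine code for the predicate depends on the target and optimization flags.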

This project strengthened my skills in parallel programming, GPU kernel design, and motion planning for autonomous systems, all of which apply directly to robotics and embedded AI.