Traffic-Rule-Compliant AVs Using End-to-End RL

TL;DR: Traffic-rule compliance is a non-Markovian specification. For example, in a stop-and-go scenario, the AV may only cross the stop line after it has come to a full stop. Specifying the correct behavior therefore requires remembering the AV's history (hence the non-Markovianity), which renders RL algorithms that rely on Markovian rewards obsolete. In this work, we show that traffic-rule compliance can be encoded into an end-to-end RL controller for self-driving cars. By carefully designing reward machines that encode these rules, we obtain controllers that obey the traffic laws in 100% of the tested scenarios and achieve a 0% collision rate.
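
To make this concrete, here is a minimal Python sketch of what a reward machine for the stop-then-go rule could look like. The state names, propositions (`at_stop_line`, `crossed_line`), and reward values are illustrative assumptions, not the machine used in our experiments; the point is only that the machine's internal state carries the history (has the AV already stopped?) that a Markovian reward cannot.

```python
class StopAndGoRewardMachine:
    """Finite-state machine whose current state remembers whether the AV has
    already stopped, which is exactly the history a Markovian reward cannot encode."""

    # Hypothetical states and labels, for illustration only.
    APPROACHING = 0   # before the stop line, has not stopped yet
    STOPPED = 1       # came to a full stop before the line
    CLEARED = 2       # crossed the line after stopping (compliant)
    VIOLATED = 3      # crossed the line without stopping (non-compliant)

    def __init__(self):
        self.state = self.APPROACHING

    def reset(self):
        self.state = self.APPROACHING

    def step(self, at_stop_line: bool, speed: float, crossed_line: bool) -> float:
        """Advance the machine with propositions evaluated from the simulator
        state and return the reward attached to the transition taken."""
        if self.state == self.APPROACHING:
            if at_stop_line and speed < 0.1:
                self.state = self.STOPPED
                return 1.0       # reward coming to a full stop
            if crossed_line:
                self.state = self.VIOLATED
                return -10.0     # penalize running the stop line
        elif self.state == self.STOPPED and crossed_line:
            self.state = self.CLEARED
            return 5.0           # reward proceeding only after the stop
        return 0.0               # no rule-relevant event on this step
```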

The reward machine for the stop-and-go task is shown below. It combines stop-and-go compliance with maintaining a safe following distance.

Sample Reward Machine for Stop-and-Go Task

If you are eager to know what the labels in the figure above stand for, please wait until our manuscript is out!

Here is a sample of the resulting behavior in the highway-env simulator. The training curves below compare our method to several baselines.

Training Curves for Stop-and-Go Task

Figure: Training curves showing the performance of our method compared to baselines.
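
For the curious, here is roughly how such a machine can be plugged into highway-env so that a standard RL algorithm sees a Markovian problem again: the reward machine state is appended to the observation and its reward replaces the environment reward. This is a hedged sketch, not our training code; the environment id `highway-v0`, the one-hot encoding, and the `info`-based proposition extraction are assumptions made for illustration.

```python
import gymnasium as gym
import numpy as np
import highway_env  # noqa: F401  (importing it registers the highway-env environments)


class RewardMachineWrapper(gym.Wrapper):
    """Product of an environment and a reward machine: the machine state is
    appended to the (flattened) observation and its reward is used for training."""

    def __init__(self, env, rm, n_rm_states=4):
        super().__init__(env)
        self.rm = rm
        self.n_rm_states = n_rm_states
        obs_dim = int(np.prod(env.observation_space.shape))
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(obs_dim + n_rm_states,), dtype=np.float32
        )

    def _augment(self, obs):
        one_hot = np.zeros(self.n_rm_states, dtype=np.float32)
        one_hot[self.rm.state] = 1.0
        return np.concatenate([np.asarray(obs, dtype=np.float32).flatten(), one_hot])

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.rm.reset()
        return self._augment(obs), info

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Placeholder propositions: a real setup computes these from the scene.
        reward = self.rm.step(
            at_stop_line=info.get("at_stop_line", False),
            speed=info.get("speed", 0.0),
            crossed_line=info.get("crossed_line", False),
        )
        return self._augment(obs), reward, terminated, truncated, info


env = RewardMachineWrapper(gym.make("highway-v0"), StopAndGoRewardMachine())
```

With the machine state in the observation, off-the-shelf algorithms can be trained on the wrapped environment as usual, since the product of environment and reward machine is Markovian again.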

We are also working on an unsignalized intersection scenario generated with Scenic. By encoding first-come-first-served logic in Scenic, we can generate training scenarios in which the background vehicles also comply with the priority rules at the intersection. A small sketch of this arbitration logic is shown next, and below it is a video of our current setup.
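
The sketch below is hypothetical Python rather than our Scenic code: the `Vehicle` fields are assumptions, but it captures the first-come-first-served arbitration that the background vehicles follow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vehicle:
    vehicle_id: int
    arrival_time: float      # time (s) at which the vehicle reached its stop line
    in_intersection: bool    # whether the vehicle is currently inside the box


def next_to_go(vehicles: List[Vehicle]) -> Optional[Vehicle]:
    """First-come-first-served arbitration: the vehicle that reached its stop
    line earliest may enter, but only once the intersection box is clear."""
    if any(v.in_intersection for v in vehicles):
        return None  # someone is still crossing; everyone else waits
    return min(vehicles, key=lambda v: v.arrival_time, default=None)


# Example: vehicle 2 arrived first, so it is granted priority.
queue = [Vehicle(1, arrival_time=4.2, in_intersection=False),
         Vehicle(2, arrival_time=3.7, in_intersection=False)]
assert next_to_go(queue).vehicle_id == 2
```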