
Traffic Rule Violations Simulation

5 min read

Generation of Naturalistic Traffic Rule Violations Using Imitation Learning

#imitation learning #autonomous driving #pytorch #reinforcement learning #signal temporal logic

Overview

Testing autonomous vehicles requires billions of kilometers of real-world driving to meet basic safety criteria. Virtual scenario-based testing offers a more affordable alternative, but real-world datasets lack sufficient diversity — particularly for safety-critical situations like traffic rule violations. Motion planning algorithms assume other drivers follow traffic rules; when this assumption breaks, predictions fail and autonomous vehicles are put at risk. This thesis addresses the problem of generating realistic test scenarios where traffic participants violate traffic rules in a naturalistic, human-like manner.

Problem

Existing real-world traffic datasets contain relatively few traffic rule violations. An analysis of 2,500 trajectories from the HighD highway dataset shows that while only ~37% of vehicles obeyed all evaluated regulations, no individual rule was violated by more than 38% of vehicles, even easily violated rules like safe following distance or speed limits. Previous work on virtual scenario generation focused on safety-critical situations (e.g., near-collisions) rather than the traffic rule violations that cause them, partly due to a lack of machine-interpretable formalizations of traffic rules.

Approach

The approach has four phases: expert data generation, dataset creation, model training, and evaluation.

Traffic Rule Formalization with Signal Temporal Logic

Traffic rules are formalized using Signal Temporal Logic (STL) with quantitative semantics, which replaces binary satisfaction (violated/not violated) with a continuous robustness degree in [-1, 1]. This quantifies how far a trajectory is from violating or satisfying a rule. Three highway traffic rules are considered:

  • R_G1 — Maintaining a safe distance to the preceding vehicle
  • R_G2 — Avoiding unnecessary braking
  • R_G3 — Adhering to the legal speed limit
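As a minimal sketch of the quantitative semantics, a speed-limit rule like R_G3 can be monitored as a "globally" formula over the velocity signal. The function name and the normalization constant `v_max` below are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def speed_limit_robustness(velocities, v_limit, v_max):
    """Quantitative STL robustness for 'globally (v <= v_limit)'.

    Returns a value in [-1, 1]: positive if the rule holds over the whole
    trajectory, negative if it is violated, with magnitude giving the margin.
    """
    # Pointwise margin, normalized by an assumed maximum deviation v_max.
    margins = (v_limit - np.asarray(velocities, dtype=float)) / v_max
    # The 'globally' operator takes the minimum margin over the horizon.
    return float(np.clip(margins.min(), -1.0, 1.0))
```

A trajectory that stays under the limit yields a positive robustness; a single exceedance at any timestep makes it negative.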

Expert Data Generation

Vehicle trajectories from the HighD dataset (naturalistic highway traffic captured by UAVs at 25 fps across 60 recordings, 110,500+ vehicles) are converted to the CommonRoad scenario format using a Point-Mass vehicle model. A pre-filtering pipeline removes free-driving scenarios using distance headway thresholds, yielding 47,733 car trajectories. After passing these through the traffic rule monitoring tool and removing edge cases, 33,079 scenarios remain with time horizons of 7–24 seconds.
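The headway-based pre-filter can be sketched roughly as follows; the 100 m threshold and the data layout are placeholder assumptions, not the values used in the thesis:

```python
def is_free_driving(distance_headways_m, threshold_m=100.0):
    """Flag a trajectory as 'free driving' if the vehicle never gets
    within threshold_m of its predecessor (placeholder threshold)."""
    return all(d > threshold_m for d in distance_headways_m)

# Keep only interactive trajectories for the expert dataset.
trajectories = {"veh_1": [120.0, 150.0, 140.0], "veh_2": [80.0, 60.0, 90.0]}
kept = {vid: t for vid, t in trajectories.items() if not is_free_driving(t)}
```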

Imitation Learning Models

Three models are trained and compared:

  • Behavior Cloning (BC) — Supervised learning baseline that directly maps observations to actions from expert demonstrations. Simple but suffers from cascading errors on longer time horizons.
  • GAIL — Generative Adversarial Imitation Learning uses a discriminator to distinguish between expert and generated trajectories, with a PPO-based generator policy that learns to fool the discriminator. Does not require an explicit reward function.
  • RAGAIL — Reward-Augmented GAIL, a modified version that incorporates prior knowledge (collision and off-road penalties) as a surrogate reward signal alongside the discriminator loss.
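A rough sketch of the RAGAIL reward shaping described above: the `-log(1 - D)` term is the standard GAIL adversarial reward, while the penalty weights and the exact combination are illustrative assumptions:

```python
import math

def ragail_reward(disc_logit, collided, off_road,
                  w_collision=1.0, w_offroad=1.0):
    """Surrogate reward: adversarial GAIL term plus prior-knowledge
    penalties. Weights and combination are illustrative assumptions."""
    d = 1.0 / (1.0 + math.exp(-disc_logit))      # discriminator output D(s, a)
    adversarial = -math.log(max(1.0 - d, 1e-8))  # standard GAIL reward term
    penalty = w_collision * collided + w_offroad * off_road
    return adversarial - penalty
```

Collisions and off-road events directly lower the reward, which is the mechanism behind RAGAIL's slightly lower collision rates reported below.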

All models use a 4-layer MLP (256 units per layer) as an actor-critic policy, trained in the CommonRoad-RL environment with configurable observation and action spaces.
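Under the PyTorch stack listed below, that policy network might look like this minimal sketch; the shared trunk, Tanh activations, and Gaussian-mean policy head are assumptions, only the 4×256 MLP shape comes from the text:

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """4 hidden layers of 256 units, as described above; the shared trunk
    and head structure are assumptions for illustration."""
    def __init__(self, obs_dim, act_dim, hidden=256, n_layers=4):
        super().__init__()
        dims = [obs_dim] + [hidden] * n_layers
        self.trunk = nn.Sequential(*[
            layer for i in range(n_layers)
            for layer in (nn.Linear(dims[i], dims[i + 1]), nn.Tanh())
        ])
        self.policy_head = nn.Linear(hidden, act_dim)  # action means
        self.value_head = nn.Linear(hidden, 1)         # state-value estimate

    def forward(self, obs):
        z = self.trunk(obs)
        return self.policy_head(z), self.value_head(z)
```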

Dataset Configurations

Six datasets are created from the training split to evaluate different aspects:

  • Full Dataset — All scenarios, testing overall behavior reproduction
  • Rule Violating Datasets (G1, G2, G3) — Only scenarios violating the specific rule, testing whether individual violations can be replicated
  • Rule Following Datasets (G1, G2, G3 Following) — Only scenarios complying with the specific rule, testing whether the model avoids generating violations when not demonstrated
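Curating these splits reduces to partitioning scenarios by the sign of each rule's robustness degree; a minimal sketch with a hypothetical data layout:

```python
def split_by_rule(robustness_by_scenario, rule):
    """Partition scenario IDs into violating / following sets using the
    sign of the STL robustness degree (data layout is hypothetical)."""
    violating = [s for s, rho in robustness_by_scenario.items() if rho[rule] < 0]
    following = [s for s, rho in robustness_by_scenario.items() if rho[rule] >= 0]
    return violating, following

scores = {"s1": {"R_G1": -0.2, "R_G3": 0.4},
          "s2": {"R_G1": 0.1, "R_G3": 0.3}}
g1_violating, g1_following = split_by_rule(scores, "R_G1")
```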

Results

Emergent Behavior

GAIL and RAGAIL achieve significantly higher goal-reach rates than BC across all datasets. BC frequently drives off-road, especially in longer scenarios (>16s), due to cascading errors. All models show similar collision rates, with RAGAIL achieving slightly fewer collisions than GAIL thanks to reward augmentation.

Naturalistic Behavior

Generated trajectories closely match the expert distributions of velocity, acceleration, and jerk (measured via Jensen-Shannon distance and RMSE). Longitudinal dynamics are well reproduced; lateral dynamics show slightly more variation due to the narrow range of lateral motion in highway driving.
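The Jensen-Shannon distance used for this comparison can be computed from binned value histograms; a small self-contained sketch (with base-2 logarithms, the distance is bounded in [0, 1]):

```python
import numpy as np

def jensen_shannon_distance(p, q, eps=1e-12):
    """JS distance between two discrete distributions (histograms),
    e.g. expert vs. generated velocity profiles."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)                         # mixture distribution
    kl = lambda a, b: np.sum(a * np.log2((a + eps) / (b + eps)))
    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)     # JS divergence
    return float(np.sqrt(max(jsd, 0.0)))      # distance = sqrt(divergence)
```

Identical histograms give a distance near 0; disjoint ones give a distance near 1.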

Traffic Rule Violation Generation

  • Full Dataset: GAIL generalizes rule-violating behavior to scenarios where the expert does not violate rules — it produces R_G1 violations in 44.5% of scenarios (vs. 14.7% in expert data) while maintaining naturalistic driving distributions. BC follows expert distributions more closely but cannot generalize.
  • Rule Violating Datasets: GAIL/RAGAIL increase violation rates for targeted rules while preserving realistic behavior. For R_G3 (speeding), RAGAIL nearly doubles the violation rate compared to the full dataset. BC shows minimal change from its full-dataset baseline.
  • Rule Following Datasets: GAIL/RAGAIL show reduced violations for the filtered rule while still violating other rules — confirming that violation behavior can be selectively controlled. BC performs worse overall, with increased off-road and collision rates.

Key Findings

  1. GAIL/RAGAIL generalize underrepresented behaviors (traffic rule violations) to new scenarios while maintaining naturalistic driving — solving the core problem of insufficient violation diversity in real-world datasets.
  2. BC works adequately for short horizons (~8–12s) when the state space is sufficiently covered, but fails to generalize beyond observed states.
  3. Rule violation behavior can be selectively amplified or suppressed by curating training datasets, enabling focused scenario generation.
  4. Reward augmentation (RAGAIL) reduces collisions at the cost of slightly increased timeouts, offering a trade-off for safety-constrained generation.

Limitations & Future Work

  • Traffic rule coverage: Only three highway rules are formalized in STL; extending to urban scenarios requires formalizing many more rules.
  • Single-agent: The approach does not model how other traffic participants react to rule violations. Multi-agent extensions like PS-GAIL could enable more realistic interactions.
  • Controlled generation: Extensions like InfoGAIL could detect latent factors of variation in demonstrations, enabling controlled combination of different violation behaviors.

Tools & Technologies

  • PyTorch — Deep learning framework for policy networks
  • Stable-Baselines3 + Imitation library — RL and imitation learning implementations (PPO, BC, GAIL)
  • CommonRoad + CommonRoad-RL — Scenario format, simulation environment, and dataset conversion
  • Signal Temporal Logic (STL) — Formal traffic rule specification with quantitative semantics
  • HighD Dataset — Naturalistic highway traffic recordings (UAV-captured, 25 fps, 110K+ vehicles)