Clustering Similar Traffic Scenarios

#unsupervised learning #autonomous driving #python #clustering #commonroad

Overview

Validating automated driving functions through kilometer-based testing requires billions of kilometers — an infeasible approach. The scenario-based approach reduces this by testing representative scenarios from each category, but discovering those categories manually is impractical. This project implements an end-to-end pipeline that automatically clusters traffic scenarios into meaningful categories, reducing the testing and validation effort while ensuring broad coverage.

Architecture

The pipeline consists of three main stages:

  1. Data Generation — Converting trajectory datasets into a common scenario format
  2. Feature Extraction — Building a quantitative model of each scenario
  3. Clustering — Unsupervised grouping via a modified Random Forest and proximity matrix

Data Generation

The highD dataset serves as the data source — naturalistic vehicle trajectories recorded by aerial drones over German highways near Cologne. It contains 60 recordings (~17 min each), covering more than 110,500 vehicles and 147 hours of driving at 25 fps with high positional accuracy.

Each vehicle’s most critical maneuver is identified using three safety metrics:

  • DHW (Distance Headway) — Distance between the ego and the preceding vehicle
  • THW (Time Headway) — Time for the ego to reach the preceding vehicle’s position
  • TTC (Time to Collision) — Time until collision if both vehicles keep their current speeds
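
As a minimal sketch, the three metrics for an ego/preceding-vehicle pair can be computed from longitudinal positions and velocities. The function and parameter names (`safety_metrics`, `lead_length`, etc.) are illustrative assumptions, not the project's code:

```python
# Sketch: the three safety metrics for an ego vehicle and its preceding
# vehicle on a straight road segment. Positions are rear-bumper x-positions;
# all names are illustrative assumptions.

def safety_metrics(ego_x, ego_v, lead_x, lead_v, lead_length):
    """Return (DHW, THW, TTC); metrics are inf when undefined."""
    dhw = lead_x - ego_x - lead_length          # bumper-to-bumper gap [m]
    thw = dhw / ego_v if ego_v > 0 else float("inf")
    closing_speed = ego_v - lead_v              # > 0 means ego is closing in
    ttc = dhw / closing_speed if closing_speed > 0 else float("inf")
    return dhw, thw, ttc
```

Note that TTC is only finite while the ego is actually closing in on its leader, which is why free-driving vehicles rarely produce a critical TTC moment.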

Recordings are converted into CommonRoad scenario format as 81-timestep segments (3.24 seconds), centered on each vehicle’s most critical moment. A filtering process discards free-driving scenarios that pose no challenge to motion planners.

Six datasets were generated by combining 3 metrics × 2 vehicle selections (lane-changers only vs. all non-free-driving vehicles), yielding “Small” datasets (~6,500–7,000 samples) and “Big” datasets (~44,000–45,000 samples).
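
The segmentation step can be sketched as a simple windowing function centered on the critical timestep; the boundary handling (shifting the window inward near the ends of a recording) is an assumption for illustration:

```python
# Sketch: cutting an 81-timestep segment (3.24 s at 25 fps) centered on a
# vehicle's most critical timestep. Names and edge handling are illustrative.

WINDOW = 81          # timesteps per scenario segment
HALF = WINDOW // 2   # 40 steps before and after the critical moment

def critical_segment(trajectory, critical_idx):
    """Return the slice of `trajectory` centered on `critical_idx`,
    shifted inward when the window would leave the recording."""
    start = max(0, min(critical_idx - HALF, len(trajectory) - WINDOW))
    return trajectory[start:start + WINDOW]
```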

Feature Extraction

The area around the ego vehicle is divided into six zones (preceding/rear × same lane/left/right) within a 100m sensor range. A total of 57 features are computed at five key timesteps (start, min THW, min DHW, min TTC, end):

  • Ego state features — Acceleration, velocity, lane change timing, cut-in events, braking duration
  • Relative features — Distances to surrounding vehicles, traffic density
  • Maneuver indicators — Lane change occurrence, brake events
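
A hedged sketch of the zone assignment: the lane-index convention (left lane = smaller index) and the helper name `zone_of` are assumptions chosen for illustration:

```python
# Sketch: assigning a surrounding vehicle to one of the six zones around
# the ego (preceding/rear x left/same/right lane) within the 100 m sensor
# range. Lane numbering convention is an assumption.

SENSOR_RANGE = 100.0  # metres

def zone_of(dx, ego_lane, other_lane):
    """dx > 0 means the other vehicle is ahead of the ego; returns None
    when the vehicle lies outside the modelled area."""
    if abs(dx) > SENSOR_RANGE or abs(other_lane - ego_lane) > 1:
        return None
    longitudinal = "preceding" if dx > 0 else "rear"
    lateral = {-1: "left", 0: "same", 1: "right"}[other_lane - ego_lane]
    return f"{longitudinal}-{lateral}"
```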

Feature importance analysis via Random Forest split frequency revealed that ego vehicle state features (lane change timing, braking, acceleration) are significantly more discriminative than surrounding vehicle features.
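
Split-frequency importance can be approximated with scikit-learn's tree internals. The toy dataset below, where only feature 0 carries signal, is purely illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Sketch: feature importance measured as split frequency, i.e. how often
# each feature is used as a split node across the forest. Toy data only.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)        # only feature 0 is informative

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
counts = np.zeros(X.shape[1], dtype=int)
for tree in forest.estimators_:
    used = tree.tree_.feature        # -2 marks leaf nodes
    for f in used[used >= 0]:
        counts[f] += 1
```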

Modified Unsupervised Random Forest

Since the data is unlabeled, a modified Unsupervised Random Forest (URF) transforms the clustering problem into a supervised one by generating synthetic data at each tree node.

Decision tree construction:

  • At each node, synthetic data points are generated alongside the real data
  • The split is chosen by the feature/value combination with the best Gini gain
  • √M features are randomly sampled per split (standard Random Forest feature subsampling)
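
The node construction above can be reduced to two ingredients: labelling real points against per-node synthetic points, and scoring candidate splits by Gini gain. This is a hedged sketch of that idea (one synthetic distribution shown; all names are illustrative):

```python
import numpy as np

# Sketch of one URF node: synthetic points are generated alongside the real
# data, and a candidate split is scored by Gini gain on real-vs-synthetic
# labels. An illustrative reduction, not the project's implementation.

rng = np.random.default_rng(0)

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def gini_gain(feature_col, labels, threshold):
    """Improvement of splitting at `threshold` relative to the parent."""
    left = labels[feature_col <= threshold]
    right = labels[feature_col > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted

def node_data(real):
    """Label real points 1 and synthetic points 0 (uniform shown here;
    the pipeline picks Uniform/Normal/mixture per node)."""
    synthetic = rng.uniform(real.min(axis=0), real.max(axis=0), real.shape)
    X = np.vstack([real, synthetic])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synthetic))])
    return X, y
```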

Path-based proximity matrix:

  • For each tree, every datapoint’s path from root to leaf is traced
  • Similarity between two points is computed as the Jaccard index of their paths (shared nodes divided by the union of both paths’ nodes)
  • The final proximity score is averaged across all trees in the forest
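
The proximity computation is compact; paths are represented below as sequences of node ids, which is an assumption about the bookkeeping:

```python
# Sketch: path-based proximity. Within one tree, similarity is the Jaccard
# index of two datapoints' root-to-leaf paths; the forest-level proximity
# averages this over all trees.

def path_proximity(path_a, path_b):
    a, b = set(path_a), set(path_b)
    return len(a & b) / len(a | b)

def forest_proximity(paths_a, paths_b):
    """Average per-tree Jaccard proximity of two datapoints."""
    per_tree = [path_proximity(pa, pb) for pa, pb in zip(paths_a, paths_b)]
    return sum(per_tree) / len(per_tree)
```

Points that part ways near the root share few nodes and thus get a low proximity, which is exactly what the clustering stage needs as a distance signal.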

Key improvements over the initial implementation addressed critical shortcomings (low split count, ever-increasing Gini index, uniformly high distances):

  • Gini gain instead of Gini index — measures a split’s relative improvement over its parent node rather than absolute impurity, increasing the average number of splits per tree from 5 to 15
  • Multiple synthetic distributions — Uniform, Normal, and Gaussian mixture distributions are randomly chosen per node, eliminating single-distribution bias
  • Flexible synthetic point counts — The number of synthetic points on each side of a candidate split is scaled by the CDF of the split position, discouraging splits that cut through valid clusters
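
The second improvement can be sketched as a per-node, per-feature generator that randomly picks one of three distributions; the specific parameters (quartile centres, scale choices) are assumptions for illustration:

```python
import numpy as np

# Sketch of the "multiple synthetic distributions" improvement: each node
# randomly draws its synthetic class from a uniform, a normal, or a
# two-component Gaussian mixture. Parameters are illustrative assumptions.

rng = np.random.default_rng(0)

def synthetic_column(col, n):
    choice = rng.integers(3)
    if choice == 0:                                   # uniform over the range
        return rng.uniform(col.min(), col.max(), n)
    if choice == 1:                                   # normal around the data
        return rng.normal(col.mean(), col.std() + 1e-9, n)
    # Gaussian mixture: components at the lower and upper quartiles
    centres = np.quantile(col, [0.25, 0.75])
    comp = rng.integers(2, size=n)
    return rng.normal(centres[comp], col.std() / 2 + 1e-9, n)
```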

Clustering & Evaluation

Three clustering methods were applied to the resulting distance matrix:

  • Hierarchical clustering (primary) — With complete, single, and average linkage; produces interpretable cluster heatmaps
  • DBSCAN — Density-based approach; struggled with the high distance values in the matrix
  • Spectral clustering — Similar performance to hierarchical clustering

Evaluation used the silhouette coefficient as the metric. The best result (SC = 0.071) was achieved on the DHW dataset with all features, 2 clusters, and average linkage. The resulting clusters showed meaningful differences — for example, one cluster contained scenarios where the ego vehicle brakes before the minimum TTC moment, while the other did not brake at all.
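
The evaluation step also works on the precomputed matrix; a minimal sketch with toy distances and labels (not the project's data):

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Sketch: scoring a clustering of a precomputed distance matrix with the
# silhouette coefficient, the metric used to compare dataset/linkage/
# cluster-count combinations. Toy values only.

D = np.array([[0.0, 0.15, 0.95, 0.85],
              [0.15, 0.0, 0.9, 0.95],
              [0.95, 0.9, 0.0, 0.25],
              [0.85, 0.95, 0.25, 0.0]])
labels = np.array([0, 0, 1, 1])

sc = silhouette_score(D, labels, metric="precomputed")
```

On well-separated toy data the coefficient is close to 1; the modest 0.071 on the real DHW dataset reflects how much cluster overlap remains in naturalistic traffic.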

The pipeline was tested with 200 trees, max depth 10, and various Gini gain thresholds (0.01–0.05) across all small datasets. The most discriminative features were consistently related to ego acceleration, braking time, and velocity at critical timesteps.

Tools & Technologies

  • Python — Core implementation language
  • CommonRoad — Traffic scenario framework for standardized scenario representation
  • Scikit-learn — Hierarchical clustering, DBSCAN, and spectral clustering
  • Modified URF — Custom unsupervised Random Forest with path-based Jaccard proximity
  • highD Dataset — Naturalistic highway trajectory data from aerial drone recordings