
Search-Based Testing (SBT) and the Curse of Dimensionality

Series Context: Part 1 explained why mileage-based validation fails at Level 3/4, requiring billions of test kilometers. This article introduces Search-Based Testing (SBT)—a method that hunts failures through intelligent scenario exploration instead of brute-force exposure accumulation.

Validating autonomous vehicles is no longer a mileage game. The relevant question isn't "How many kilometers did we drive?" but rather "How systematically did we explore the scenarios where the AV is most likely to fail?"

The Curse of Dimensionality: 4-Car Intersection Example

Let's look at a standard unsignalized intersection with four vehicles (ego + three traffic participants). The ego vehicle's acceleration is left to its planner rather than treated as a scenario parameter, but we vary the ego's initial state. This gives us:

  • Ego Vehicle: Initial velocity (v0) and Initial position (p0) [2 parameters]
  • 3 Traffic Participants: Initial velocity (v0), Initial position (p0), and Acceleration (a0) each [3 × 3 = 9 parameters]

That gives us an 11-dimensional space to cover. With a coarse discretization of 10 steps per parameter, brute-force simulation would require:

10¹¹ = 100,000,000,000 concrete scenarios

And this toy setup is aggressively simplified: we have not modeled occlusion, steering, vehicle dynamics, or a host of other factors.

Yet even this minimal scenario produces a continuous parameter space so large that a naive brute-force sweep is computationally infeasible.

Assuming an optimistic simulation engine running 100 simulations per second, the runtime would be:

100,000,000,000 scenarios / 100 sims/sec = 1,000,000,000 sec
≈ 31.7 years!
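The arithmetic is easy to sanity-check in a few lines of Python (nothing assumed beyond the numbers above):

```python
# Back-of-the-envelope: brute-force cost of the 11-parameter intersection.
steps_per_param = 10
num_params = 11
sims_per_second = 100

scenarios = steps_per_param ** num_params        # 100,000,000,000
runtime_s = scenarios / sims_per_second          # 1,000,000,000 s
runtime_years = runtime_s / (3600 * 24 * 365)    # ~31.7 years

print(f"{scenarios:,} scenarios -> {runtime_years:.1f} years")
```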

What SBT Does Differently

This is where most teams get stuck: they understand the problem but don't know how to escape brute-force thinking.

Running every single variation is a massive waste of resources on uninteresting scenarios where the AV performs well. The critical events—collisions, near-misses, and edge cases—occupy only small subregions of the parameter space. Search-Based Testing flips the script: instead of brute-force enumeration, treat scenario exploration as an optimization problem.

Instead of evaluating everything, evaluate only what is likely to matter.

To make that work, SBT needs two ingredients:

  1. A Key Performance Indicator (KPI): what we define as "interesting"
  2. A Search Strategy: how we navigate the space

Example KPI for our intersection: minimum bounding-box distance between vehicles during the crossing. The optimization then tries to minimize that distance—surfacing near-misses and collisions.
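As a sketch of what such a KPI could look like in code. The trajectory format and the axis-aligned boxes are simplifying assumptions of this sketch; a production check would use oriented bounding boxes:

```python
import numpy as np

def min_separation_kpi(ego_traj, other_trajs, length=4.5, width=2.0):
    """Minimum bounding-box gap between the ego and any traffic participant
    over the whole crossing. Trajectories are (T, 2) arrays of box centers;
    vehicles are approximated as identical axis-aligned boxes."""
    min_gap = np.inf
    for other in other_trajs:
        # Per-axis gap between the two boxes at every timestep.
        dx = np.abs(ego_traj[:, 0] - other[:, 0]) - length
        dy = np.abs(ego_traj[:, 1] - other[:, 1]) - width
        # Boxes overlap only when both per-axis gaps are negative,
        # so max(dx, dy) <= 0 indicates a collision at that timestep.
        gap = np.maximum(dx, dy)
        min_gap = min(min_gap, gap.min())
    return min_gap  # minimizing this surfaces near-misses and collisions
```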

How Search-Based Testing Works

SBT uses genetic algorithms, Bayesian optimization, or surrogate models to intelligently explore the scenario space. The process works like this:

  1. Coarse Sampling: Sample initial points across the logical scenario space.
  2. KPI Evaluation: Run simulations and compute the KPI for each scenario.
  3. Surrogate Model Training: Build a fast approximation (e.g., Gaussian Process, neural network) of the KPI function based on evaluated samples. This model predicts KPI values without running expensive simulations.
  4. Region Refinement: Use the surrogate model to identify promising regions and focus subsequent samples where the KPI indicates potential critical events.
  5. Continuous Improvement: As new samples are evaluated, the surrogate model is continuously retrained and refined, improving prediction accuracy in critical regions while maintaining computational efficiency.
  6. Repeat: Iterate between surrogate updates and targeted sampling until convergence or computational budget is exhausted.

Surrogate models are the secret weapon here. Instead of running expensive simulations for every candidate scenario, the search algorithm builds a fast approximation—think of it as a "cheap preview." Query the surrogate to eliminate obviously boring regions, then only run full simulations on the most promising candidates. This can cut evaluations by orders of magnitude.
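To make this concrete, here is a minimal sketch of that loop in Python, using a Gaussian Process surrogate from scikit-learn. The `run_simulation` callback standing in for the expensive simulator, the candidate count, and the mean-minus-std selection rule are all illustrative assumptions, not a fixed recipe:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def sbt_loop(run_simulation, bounds, n_initial=20, n_iterations=50, rng=None):
    """Search for KPI minima: coarse sampling, surrogate fit, refinement.
    `run_simulation` maps a parameter vector to a scalar KPI (e.g. the
    minimum separation distance); `bounds` is a list of (low, high)
    pairs, one per scenario parameter."""
    if rng is None:
        rng = np.random.default_rng(0)
    lo, hi = np.asarray(bounds).T

    # Steps 1-2: coarse sampling plus KPI evaluation in the simulator.
    X = rng.uniform(lo, hi, size=(n_initial, len(bounds)))
    y = np.array([run_simulation(x) for x in X])

    gp = GaussianProcessRegressor(normalize_y=True)
    for _ in range(n_iterations):
        # Step 3: (re)train the fast surrogate on all evaluated samples.
        gp.fit(X, y)

        # Step 4: query the surrogate on many cheap candidates and pick
        # the one with the lowest "optimistic" predicted KPI (mean - std),
        # balancing low predicted regions against high uncertainty.
        candidates = rng.uniform(lo, hi, size=(1000, len(bounds)))
        mean, std = gp.predict(candidates, return_std=True)
        best = candidates[np.argmin(mean - std)]

        # Steps 5-6: spend a real simulation only on the chosen point.
        X = np.vstack([X, best])
        y = np.append(y, run_simulation(best))

    return X[np.argmin(y)], y.min()  # most critical scenario found
```

The selection rule is where Bayesian optimization variants differ; mean minus standard deviation is just one simple acquisition choice.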

Visual comparison (figure): a brute-force grid needs 240 evaluations to map the critical event region, while SBT finds it with roughly 15 targeted evaluations.

Try It Yourself: Interactive Demonstration

In the following simplified intersection with two vehicles, you can experiment with the initial velocities at grid resolutions from 20 to 80 steps. The simulation uses realistic bounding-box collision detection. Compare two strategies side by side:

  • Full grid sweep (exhaustive brute-force sampling)
  • SBT refinement (adaptive sampling guided by the KPI)

[Interactive Lab: Brute Force vs SBT. The live version replays simulations and tracks the number of simulations run, the critical scenarios found, and their ratio, classifying each run as safe, near miss, or crash.]

Implementation Note: This demo uses basic adaptive binary refinement in a 2D space. Production implementations leverage Bayesian optimization, surrogate models, and genetic algorithms to achieve orders of magnitude better performance in high-dimensional spaces. The key insight: efficiency gains grow exponentially with dimensionality.
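For reference, a minimal sketch of such adaptive binary refinement, with illustrative grid sizes and threshold:

```python
import numpy as np

def adaptive_refinement(kpi, x_range, y_range, max_depth=4, threshold=1.0):
    """2D adaptive binary refinement in the spirit of the demo above.
    `kpi` maps (x, y) to a scalar where low values are critical;
    `threshold` and the grid sizes are illustrative choices."""
    evaluations = []

    def refine(x0, x1, y0, y1, depth):
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        value = kpi(cx, cy)
        evaluations.append((cx, cy, value))
        # Subdivide only cells that look critical or near-critical,
        # concentrating the simulation budget around failure regions.
        if depth < max_depth and value < threshold:
            for qx0, qx1 in ((x0, cx), (cx, x1)):
                for qy0, qy1 in ((y0, cy), (cy, y1)):
                    refine(qx0, qx1, qy0, qy1, depth + 1)

    # Seed with a coarse 4x4 grid of cells, then refine where needed.
    xs = np.linspace(x_range[0], x_range[1], 5)
    ys = np.linspace(y_range[0], y_range[1], 5)
    for i in range(4):
        for j in range(4):
            refine(xs[i], xs[i + 1], ys[j], ys[j + 1], 0)
    return evaluations
```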

But even the smartest search machinery only pays off if it is pointed at the right objective. That brings us to the most underestimated ingredient: the KPI.

The KPI: The Compass That Guides the Search

SBT is only as good as the KPI driving it. Choose a bad KPI, and your search will happily optimize toward irrelevant scenarios while missing actual failures.

For example, a binary crash flag provides no gradient for the search algorithm to follow, leaving it to guess blindly near safety boundaries. Better KPIs are continuous (like minimum distance over the trajectory), resistant to exploitation (so the search cannot game the metric with irrelevant scenarios), and capture temporal dynamics rather than static snapshots. A common pitfall: we've seen teams spend weeks tuning search parameters, only to discover their KPI was rewarding scenarios where the vehicles never enter the intersection at all.
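To make the contrast concrete, here is a minimal sketch, where `simulate`, `crashed`, and `min_distance_m` are hypothetical names rather than a real API:

```python
# `simulate` is a hypothetical stand-in for a full simulation run.

def kpi_binary(scenario):
    # 0 or 1: a near-miss at 0.1 m and a pass at 10 m look identical,
    # so the search gets no signal near the safety boundary.
    return 1.0 if simulate(scenario).crashed else 0.0

def kpi_continuous(scenario):
    # Minimum separation over the run: 0.1 m ranks as far more
    # critical than 10 m, giving the search a slope to descend.
    return simulate(scenario).min_distance_m
```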

KPI design is critical not just for SBT, but for the entire V&V pipeline—from pass/fail criteria to acceptance testing to regulatory evidence. Getting them right is one of the hardest parts of autonomous vehicle validation. We'll explore KPI design principles and failure modes in a dedicated article.

Limitations and Trade-offs

SBT isn't a silver bullet. It's a powerful tool, but one with real limitations you need to understand before betting your safety case on it.

The trade-offs:

  • May converge to local optima, so multiple runs with different seeds are needed
  • Search coverage depends entirely on the chosen KPI
  • Surrogate model accuracy limits how small a critical region the search can resolve
  • False negatives are possible: unexplored regions can still hide failures
  • No universal stopping criterion exists

The hardest part? Knowing when you've sampled enough.

"It's part science, part judgment, part organizational risk tolerance."

This is why SBT must sit inside a larger validation loop rather than acting as a standalone technique.

Implementation note: SBT isn't a replacement for existing simulation platforms—it's an orchestration layer that sits on top of your existing validation infrastructure, intelligently selecting which scenarios to test.

Position in the Validation Pipeline

Before we get too excited about SBT's efficiency gains, let's be clear about what it does and doesn't solve.

SBT solves only one piece of the puzzle: efficient sampling within a single logical scenario.

It does not solve:

  • Scenario generation
  • Scenario prioritization across the ODD
  • ODD coverage reasoning
  • Real-drive data integration
  • Regulatory safety argumentation

Those are the topics of the next articles in this series.


Kaveh Rahnema

V&V Expert for ADAS & Autonomous Driving with 7+ years at Robert Bosch GmbH.
