
Search-Based Testing (SBT) and the Curse of Dimensionality

Series Context: Part 1 explained why mileage-based validation fails at Level 3/4, requiring billions of test kilometers. This article introduces Search-Based Testing (SBT)—a method that hunts failures through intelligent scenario exploration instead of brute-force exposure accumulation.

Validating autonomous vehicles is no longer a mileage game. The relevant question isn't "How many kilometers did we drive?" but rather "How systematically did we explore the scenarios where the AV is most likely to fail?"

The Curse of Dimensionality: 4-Car Intersection Example

Let's look at a standard unsignalized intersection with four vehicles (ego + three traffic participants). The ego vehicle's acceleration is left to its planner rather than treated as a scenario parameter, but we vary the ego's initial state. This gives us:

  • Ego Vehicle: Initial velocity (v0) and Initial position (p0) [2 parameters]
  • 3 Traffic Participants: Initial velocity (v0), Initial position (p0), and Acceleration (a0) each [3 × 3 = 9 parameters]

That gives us an 11-dimensional space to cover. With a coarse discretization of 10 steps per parameter, brute-force simulation would require:

10¹¹ = 100,000,000,000 concrete scenarios

And this toy setup is aggressively simplified: we have not modeled occlusion, steering, vehicle dynamics, or a host of other factors.

Yet even this minimal scenario produces a continuous parameter space so large that a naive brute-force sweep is computationally infeasible.

Assuming an optimistic simulation engine running 100 simulations per second, the runtime would be:

100,000,000,000 scenarios / 100 sims/sec = 1,000,000,000 sec
≈ 31.7 years!
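The arithmetic is easy to sanity-check in a few lines of Python (nothing assumed beyond the numbers above):

```python
# Back-of-the-envelope: brute-force cost of the 11-parameter intersection.
steps_per_param = 10
num_params = 11
sims_per_second = 100

scenarios = steps_per_param ** num_params        # 100,000,000,000
runtime_s = scenarios / sims_per_second          # 1,000,000,000 s
runtime_years = runtime_s / (3600 * 24 * 365)    # ~31.7 years

print(f"{scenarios:,} scenarios -> {runtime_years:.1f} years")
```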

What SBT Does Differently

This is where most teams get stuck: they understand the problem but don't know how to escape brute-force thinking.

Running every single variation is a massive waste of resources on uninteresting scenarios where the AV performs well. The critical events—collisions, near-misses, and edge cases—occupy only small subregions of the parameter space. Search-Based Testing flips the script: instead of brute-force enumeration, treat scenario exploration as an optimization problem.

Instead of evaluating everything, evaluate only what is likely to matter.

To make that work, SBT needs two ingredients:

  1. A Key Performance Indicator (KPI): what we define as "interesting"
  2. A Search Strategy: how we navigate the space

Example KPI for our intersection: minimum bounding-box distance between vehicles during the crossing. The optimization then tries to minimize that distance—surfacing near-misses and collisions.
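As a sketch of what such a KPI could look like in code. The trajectory format and the axis-aligned boxes are simplifying assumptions of this sketch; a production check would use oriented bounding boxes:

```python
import numpy as np

def min_separation_kpi(ego_traj, other_trajs, length=4.5, width=2.0):
    """Minimum bounding-box gap between the ego and any traffic participant
    over the whole crossing. Trajectories are (T, 2) arrays of box centers;
    vehicles are approximated as identical axis-aligned boxes."""
    min_gap = np.inf
    for other in other_trajs:
        # Per-axis gap between the two boxes at every timestep.
        dx = np.abs(ego_traj[:, 0] - other[:, 0]) - length
        dy = np.abs(ego_traj[:, 1] - other[:, 1]) - width
        # Boxes overlap only when both per-axis gaps are negative,
        # so max(dx, dy) <= 0 indicates a collision at that timestep.
        gap = np.maximum(dx, dy)
        min_gap = min(min_gap, gap.min())
    return min_gap  # minimizing this surfaces near-misses and collisions
```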

How Search-Based Testing Works

SBT uses genetic algorithms, Bayesian optimization, or surrogate models to intelligently explore the scenario space. The process works like this:

  1. Coarse Sampling: Sample initial points across the logical scenario space.
  2. KPI Evaluation: Run simulations and compute the KPI for each scenario.
  3. Surrogate Model Training: Build a fast approximation (e.g., Gaussian Process, neural network) of the KPI function based on evaluated samples. This model predicts KPI values without running expensive simulations.
  4. Region Refinement: Use the surrogate model to identify promising regions and focus subsequent samples where the KPI indicates potential critical events.
  5. Continuous Improvement: As new samples are evaluated, the surrogate model is continuously retrained and refined, improving prediction accuracy in critical regions while maintaining computational efficiency.
  6. Repeat: Iterate between surrogate updates and targeted sampling until convergence or computational budget is exhausted.

Surrogate models are the secret weapon here. Instead of running expensive simulations for every candidate scenario, the search algorithm builds a fast approximation—think of it as a "cheap preview." Query the surrogate to eliminate obviously boring regions, then only run full simulations on the most promising candidates. This can cut evaluations by orders of magnitude.
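To make this concrete, here is a minimal sketch of that loop in Python, using a Gaussian Process surrogate from scikit-learn. The `run_simulation` callback standing in for the expensive simulator, the candidate count, and the mean-minus-std selection rule are all illustrative assumptions, not a fixed recipe:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def sbt_loop(run_simulation, bounds, n_initial=20, n_iterations=50, rng=None):
    """Search for KPI minima: coarse sampling, surrogate fit, refinement.
    `run_simulation` maps a parameter vector to a scalar KPI (e.g. the
    minimum separation distance); `bounds` is a list of (low, high)
    pairs, one per scenario parameter."""
    if rng is None:
        rng = np.random.default_rng(0)
    lo, hi = np.asarray(bounds).T

    # Steps 1-2: coarse sampling plus KPI evaluation in the simulator.
    X = rng.uniform(lo, hi, size=(n_initial, len(bounds)))
    y = np.array([run_simulation(x) for x in X])

    gp = GaussianProcessRegressor(normalize_y=True)
    for _ in range(n_iterations):
        # Step 3: (re)train the fast surrogate on all evaluated samples.
        gp.fit(X, y)

        # Step 4: query the surrogate on many cheap candidates and pick
        # the one with the lowest "optimistic" predicted KPI (mean - std),
        # balancing low predicted regions against high uncertainty.
        candidates = rng.uniform(lo, hi, size=(1000, len(bounds)))
        mean, std = gp.predict(candidates, return_std=True)
        best = candidates[np.argmin(mean - std)]

        # Steps 5-6: spend a real simulation only on the chosen point.
        X = np.vstack([X, best])
        y = np.append(y, run_simulation(best))

    return X[np.argmin(y)], y.min()  # most critical scenario found
```

The selection rule is where Bayesian optimization variants differ; mean minus standard deviation is just one simple acquisition choice.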

Visual comparison (figure): a brute-force grid needs 240 evaluations to map the critical event region, while SBT finds it with roughly 15 targeted evaluations.

Try It Yourself: Interactive Demonstration

In the following simplified intersection with two vehicles, you can experiment with the initial velocities at grid resolutions from 20 to 80 steps. The simulation uses realistic bounding-box collision detection. Compare two strategies side by side:

  • Full grid sweep (exhaustive brute-force sampling)
  • SBT refinement (adaptive sampling guided by the KPI)

[Interactive Lab: Brute Force vs SBT. The live version replays simulations and tracks the number of simulations run, the critical scenarios found, and their ratio, classifying each run as safe, near miss, or crash.]

Implementation Note: This demo uses basic adaptive binary refinement in a 2D space. Production implementations leverage Bayesian optimization, surrogate models, and genetic algorithms to achieve orders of magnitude better performance in high-dimensional spaces. The key insight: efficiency gains grow exponentially with dimensionality.
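For reference, a minimal sketch of such adaptive binary refinement, with illustrative grid sizes and threshold:

```python
import numpy as np

def adaptive_refinement(kpi, x_range, y_range, max_depth=4, threshold=1.0):
    """2D adaptive binary refinement in the spirit of the demo above.
    `kpi` maps (x, y) to a scalar where low values are critical;
    `threshold` and the grid sizes are illustrative choices."""
    evaluations = []

    def refine(x0, x1, y0, y1, depth):
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        value = kpi(cx, cy)
        evaluations.append((cx, cy, value))
        # Subdivide only cells that look critical or near-critical,
        # concentrating the simulation budget around failure regions.
        if depth < max_depth and value < threshold:
            for qx0, qx1 in ((x0, cx), (cx, x1)):
                for qy0, qy1 in ((y0, cy), (cy, y1)):
                    refine(qx0, qx1, qy0, qy1, depth + 1)

    # Seed with a coarse 4x4 grid of cells, then refine where needed.
    xs = np.linspace(x_range[0], x_range[1], 5)
    ys = np.linspace(y_range[0], y_range[1], 5)
    for i in range(4):
        for j in range(4):
            refine(xs[i], xs[i + 1], ys[j], ys[j + 1], 0)
    return evaluations
```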

But even the smartest search machinery only pays off if it is pointed at the right objective. That brings us to the most underestimated ingredient: the KPI.

The KPI: The Compass That Guides the Search

SBT is only as good as the KPI driving it. Choose a bad KPI, and your search will happily optimize toward irrelevant scenarios while missing actual failures.

For example, a binary crash flag provides no gradient for the search algorithm to follow, leaving it to guess blindly near safety boundaries. Better KPIs are continuous (like minimum distance over the trajectory), resistant to exploitation (so the search cannot game the metric with irrelevant scenarios), and capture temporal dynamics rather than static snapshots. A common pitfall: we've seen teams spend weeks tuning search parameters, only to discover their KPI was rewarding scenarios where the vehicles never enter the intersection at all.
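To make the contrast concrete, here is a minimal sketch, where `simulate`, `crashed`, and `min_distance_m` are hypothetical names rather than a real API:

```python
# `simulate` is a hypothetical stand-in for a full simulation run.

def kpi_binary(scenario):
    # 0 or 1: a near-miss at 0.1 m and a pass at 10 m look identical,
    # so the search gets no signal near the safety boundary.
    return 1.0 if simulate(scenario).crashed else 0.0

def kpi_continuous(scenario):
    # Minimum separation over the run: 0.1 m ranks as far more
    # critical than 10 m, giving the search a slope to descend.
    return simulate(scenario).min_distance_m
```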

KPI design is critical not just for SBT, but for the entire V&V pipeline—from pass/fail criteria to acceptance testing to regulatory evidence. Getting them right is one of the hardest parts of autonomous vehicle validation. We'll explore KPI design principles and failure modes in a dedicated article.

Limitations and Trade-offs

SBT isn't a silver bullet. It's a powerful tool, but one with real limitations you need to understand before betting your safety case on it.

The trade-offs:

  • May converge to local optima, so multiple runs with different seeds are needed
  • Search coverage depends entirely on the chosen KPI
  • Surrogate model accuracy limits how small a critical region the search can resolve
  • False negatives are possible: unexplored regions can still hide failures
  • No universal stopping criterion exists

The hardest part? Knowing when you've sampled enough.

"It's part science, part judgment, part organizational risk tolerance."

This is why SBT must sit inside a larger validation loop rather than acting as a standalone technique.

Implementation note: SBT isn't a replacement for existing simulation platforms—it's an orchestration layer that sits on top of your existing validation infrastructure, intelligently selecting which scenarios to test.

Position in the Validation Pipeline

Before we get too excited about SBT's efficiency gains, let's be clear about what it does and doesn't solve.

SBT solves only one piece of the puzzle: efficient sampling within a single logical scenario.

It does not solve:

  • Scenario generation
  • Scenario prioritization across the ODD
  • ODD coverage reasoning
  • Real-drive data integration
  • Regulatory safety argumentation

Those are the topics of the next articles in this series.


Kaveh Rahnema

V&V Expert for ADAS & Autonomous Driving with 7+ years at Robert Bosch GmbH.
