Circuit Breakers
A circuit breaker is a pattern where your orchestration tool stops a data pipeline from continuing when Validio detects data quality incidents in freshly loaded data. By placing Validio between an ingestion step and the steps that depend on it, you prevent low-quality data from propagating to downstream tables, dashboards, and models.
Validio supports this pattern today through the Apache Airflow integration and the Validio API, so you can gate any orchestrator that can call an API.
How a Circuit Breaker Works
A circuit breaker wraps a data quality check around the point in your pipeline where new data lands. The pattern has three steps:
- Poll the source. After an upstream job (such as an ETL load) writes new data, trigger a Validio source poll so its validators evaluate the freshly landed data immediately, instead of waiting for the next scheduled poll.
- Check for incidents. Once the poll completes, check whether it produced any incidents for that source.
- Break or continue. If the number or severity of incidents exceeds the threshold you allow, fail the task and stop the downstream steps. Otherwise, let the pipeline continue.
You decide how strict the breaker is. For example, you can allow a small number of low-severity incidents but break on any high-severity incident.
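The break-or-continue decision can be sketched as a small policy function. This is a minimal illustration, not part of Validio: the severity labels (`LOW`, `MEDIUM`, `HIGH`), the `DEFAULT_LIMITS` table, and the `should_break` helper are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical policy: maximum number of incidents tolerated per severity.
# The severity labels are illustrative and not taken from Validio.
DEFAULT_LIMITS = {"LOW": 3, "MEDIUM": 0, "HIGH": 0}


def should_break(severities, limits=DEFAULT_LIMITS):
    """Return True if any severity has more incidents than its limit allows."""
    counts = Counter(severities)
    # A severity missing from `limits` is treated as not allowed at all.
    return any(count > limits.get(sev, 0) for sev, count in counts.items())
```

With these assumed limits, `should_break(["LOW", "LOW"])` lets the pipeline continue, while `should_break(["HIGH"])` breaks it. Treating unknown severities as disallowed keeps the breaker strict by default.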
Circuit Breaking with Apache Airflow
The Validio for Airflow integration provides operators and sensors that implement this pattern natively in a DAG:
- Use the ValidioPollSourceOperator to trigger a source poll after your data-loading task completes.
- Use the ValidioIncidentsSensor to gate the DAG. The sensor pokes Validio for recent incidents and can be configured with an allowed number of incidents and allowed severities. It triggers (failing the gate and stopping the downstream tasks) once there are more incidents outside the allowed severities than you permit.
For installation, connection setup, and the full list of parameters for each operator and sensor, see Validio for Airflow.
Circuit Breaking with the API or SDK
If you use a different orchestrator, you can build the same circuit breaker with the Validio API or the Validio SDK:
- Trigger and track a poll. Start a manual poll and wait for it to complete. See the Manually Poll a Source and Its Validators recipe.
- Check for incidents from that poll. Query the incidents API filtered by the source and by a createdAt time at or after the poll started, so you only count incidents that this poll produced. See Get Incidents for a Validator and Number of Incidents by Severity.
- Break or continue. Apply your threshold to the result. If it is exceeded, raise an error or return a non-zero exit code so your orchestrator halts the pipeline; otherwise allow it to proceed.
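The last two steps above might look like the following sketch. It assumes the incidents have already been fetched as a list of dictionaries and filters them client-side. In practice you would pass the same filters to the incidents API. Apart from `createdAt`, every name here is hypothetical: the `sourceId` field and the `parse_time`, `incidents_from_poll`, and `gate` helpers.

```python
from datetime import datetime, timezone


def parse_time(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Treat naive timestamps as UTC so comparisons never mix naive and aware.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def incidents_from_poll(incidents, source_id, poll_started_at):
    """Keep incidents for this source created at or after the poll started."""
    started = parse_time(poll_started_at)
    return [
        i for i in incidents
        if i["sourceId"] == source_id and parse_time(i["createdAt"]) >= started
    ]


def gate(incidents, source_id, poll_started_at, max_incidents=0):
    """Return a process exit code: 1 to break the pipeline, 0 to continue."""
    relevant = incidents_from_poll(incidents, source_id, poll_started_at)
    return 1 if len(relevant) > max_incidents else 0
```

A wrapper script could end with `sys.exit(gate(...))`, so that any orchestrator that treats a non-zero exit code as a task failure halts the downstream steps.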
A poll status of FAILED means the poll itself could not run (for example, a source error); it does not indicate whether incidents were produced. To break on data quality issues, check for incidents after the poll succeeds, as described above.
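One way to keep these two failure modes apart is to raise a dedicated error when the poll itself fails, and only move on to the incident check once the poll has succeeded. The sketch below assumes a hypothetical `get_status` callable that returns the current poll status as a string. The `SUCCEEDED` status name, the `PollError` class, and the timing defaults are illustrative assumptions; only `FAILED` comes from the text above.

```python
import time


class PollError(RuntimeError):
    """The poll itself could not run; says nothing about data quality."""


def wait_for_poll(get_status, timeout_s=600.0, interval_s=5.0, sleep=time.sleep):
    """Block until the poll finishes; raise PollError if it reports FAILED."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "FAILED":
            raise PollError("poll could not run; incidents were not evaluated")
        if status == "SUCCEEDED":  # assumed name for the success status
            return
        # Any other status is treated as "still in progress".
        if time.monotonic() >= deadline:
            raise TimeoutError("poll did not finish in time")
        sleep(interval_s)
```

Catching `PollError` separately from a threshold breach lets the orchestrator report "the check could not run" differently from "the check ran and found bad data".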