Data Handling and Retention

Understand how Validio accesses and processes data, and review retention policies for managed and VPC deployments.

This article discusses the types of data that Validio can access and process, as well as the retention period for each type of data.

Deployment and Data Management

Validio provides two primary deployment options that let you control where your data is stored and processed. Both models let you select specific regions on GCP, AWS, or Azure to help you meet local data residency requirements.

The following table compares data management across the two deployment options:

| Feature | Validio Managed Solution | Customer Virtual Private Cloud (VPC) |
| --- | --- | --- |
| Hosting | Hosted by Validio in your chosen cloud region. | Installed and managed within your own infrastructure. |
| Data Access | Validio processes metadata and manages retention. | Validio has no access to raw data or logs. |
| Data Retention | Standard retention periods apply to processed data. See Data Retention Policies. | Customer-managed; no backup provided by Validio. |
| Control | High ease of use with regional compliance. | Full customer control over the entire environment. |

For more information about Validio deployments, see Validio Managed Solution and Validio VPC Deployment. For more information about how Validio protects your data, see Security and Compliance.

How Validio Handles Data in Push-down

For data warehouse and query engine sources, Validio uses query push-down: each check is compiled into a query that runs inside your warehouse, and only the computed result is returned. Validio does not copy your tables out, and for validators, Custom SQL validators, and data profiling, no raw rows are received or stored — Validio keeps only the resulting metric (for example, the value, the threshold bounds, and whether the check breached). The query travels to your warehouse over an encrypted connection when transport encryption is enabled on the connection; see Encryption in Transit below.
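The push-down flow described above can be illustrated with a minimal sketch. This is not Validio's implementation: the `MetricResult` class and `run_mean_check` function are hypothetical names, and an in-memory SQLite database stands in for the warehouse. The point is only that the aggregate runs where the data lives and a single value comes back.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class MetricResult:
    """What is kept after a check: the metric and its bounds, never rows."""
    value: float
    lower_bound: float
    upper_bound: float
    breached: bool


def run_mean_check(conn, table, column, lower_bound, upper_bound):
    # Hypothetical "compiled" check: one aggregate query executed in the
    # warehouse. Table and column are trusted identifiers in this sketch.
    query = f"SELECT AVG({column}) FROM {table}"
    (value,) = conn.execute(query).fetchone()  # a single scalar is returned
    breached = not (lower_bound <= value <= upper_bound)
    return MetricResult(value, lower_bound, upper_bound, breached)


# In-memory SQLite stands in for the warehouse (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0)],
)

result = run_mean_check(conn, "orders", "amount", lower_bound=5.0, upper_bound=25.0)
print(result)
```

Only the `MetricResult` fields leave the database in this sketch; the three order rows are never read by the caller.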

Validio's datastore holds computed results, not source data. Two features intentionally store values derived from your data (listed here so you can decide which assets to use them on):

  • Data profiling stores a small, bounded set of statistics. Most are counts and percentages, but numeric minimum, maximum, median, and quartiles, and timestamp minimum and maximum are actual values present in the column. Text and boolean columns store no values. Assets too sensitive for even these boundary statistics need not be profiled. See Data Profiling.
  • Segmentation stores the distinct values of any field you segment on, as grouping labels (for example, region = Nordics). These are derived from your data, so segment on low-sensitivity fields. See About Segmentation.
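To make the segmentation point concrete, here is a small sketch of a segmented aggregate. The table, the column names, and the shape of the `stored` records are assumptions for illustration, and SQLite again stands in for the warehouse. It shows why segment labels are data-derived: each distinct value of the segment field appears in the stored result.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Nordics", 10.0), ("Nordics", 30.0), ("DACH", 50.0)],
)

# Segmented aggregate: one result row per distinct value of the segment field.
rows = conn.execute(
    "SELECT region, COUNT(*), AVG(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()

# Hypothetical stored shape: the segment label (a value taken from the data)
# is kept next to each metric, while individual rows are not.
stored = [
    {"segment": {"region": region}, "row_count": count, "mean_amount": mean}
    for region, count, mean in rows
]
print(stored)
```

Segmenting on a field such as an email address would place every distinct address in the stored labels, which is why low-sensitivity fields are the safer choice.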

A few features deliberately display raw rows so you can inspect them, but this data is transient (it is fetched from your warehouse, shown for inspection, and never stored or logged by Validio):

  • Data sampling returns real rows as a preview, held in memory for display only and discarded when you close or reload it. See Data Sampling.
  • Debug generates a query that reproduces an incident and can load sample rows from the affected data. The query runs against your warehouse and the results are displayed for troubleshooting, but they are not retained. See Debugging an Incident.
  • Filter preview shows sample rows that match a filter while you configure it, so you can confirm the filter behaves as intended. See About Filters.
  • Custom SQL source preview shows sample rows returned by the query while you set up or edit a custom SQL source. See Custom SQL Sources.
  • Custom SQL validator query testing shows the rows a validator query returns while you test it, so you can confirm it returns the expected single metric before saving. See Custom SQL Validators.

In every case the rows are fetched from your warehouse, displayed for inspection, and never written to Validio storage.

Encryption in Transit

Connections to data warehouse and transactional database sources support authenticated TLS, validated against a certificate authority (CA) you supply. Transport encryption is configured per connection on the credential — for example, on the Oracle, SAP IQ, and Progress OpenEdge credentials. We recommend enabling TLS on every connection and supplying a CA certificate for validation. For an overview of Validio's security controls, see Security and Compliance.
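As a general illustration of what "authenticated TLS validated against a CA you supply" means on the client side, the sketch below builds a TLS context with Python's standard `ssl` module. It is not how Validio credentials are configured; the `build_tls_context` helper and the idea of passing a CA file path are assumptions made for this example.

```python
import ssl
from typing import Optional


def build_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Return a client-side TLS context that authenticates the server."""
    # With ca_file set, only certificates chaining to that CA are trusted.
    # With None, the system trust store is loaded instead.
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


# No CA file is supplied here, so the system trust store is used.
context = build_tls_context()
print(context.verify_mode, context.check_hostname)
```

The default context requires a valid server certificate and checks the hostname, so an unverified or mismatched server fails the handshake instead of silently falling back to an unauthenticated connection.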

Data Retention Policies

The level of data access and the applicable retention policies depend on your deployment model:

  • Managed Solution: Validio manages the retention periods for the data types processed within the platform.
  • VPC Deployment: Because Validio does not have access to your raw data or logs in a VPC environment, we do not provide a backup solution for your installation. Data retention in this model is opt-in only and applies exclusively if you choose to send performance metrics and error logs to Validio for troubleshooting and analysis.

The following table lists the types of data Validio processes and their retention period depending on your deployment.

| Type of Data | Validio Managed Solution | Validio VPC Deployment | Retention Period (In Validio) |
| --- | --- | --- | --- |
| Raw data from streaming sources. Validio ingests raw data from streaming sources. | Y | N | 1 hour after the data is processed |
| Raw data from data warehouse sources. Validio ingests aggregate metrics from data warehouse sources. Raw rows are fetched only when a feature displays them for inspection, for example data sampling, incident debugging, filter preview, custom SQL source preview, and custom SQL validator query testing. In every case the query runs against the warehouse and the results are displayed for inspection, but the data is not retained. | N | N | N/A |
| Query logs | Y | N | 30 days |
| Logs, metrics, and traces. This data includes performance metrics and application error logs. | Y | Y [1] | 60 days |
| Daily backups of environments. The backups include configuration and calculated metrics. | Y | N | 90 days |
| Anonymized calculated metrics | Y | N | Depends on window and segment configuration. |

[1] For Validio VPC deployments, you can opt out of sending performance metrics and error logs to Validio.
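The fixed retention periods in the table can be turned into a simple expiry calculation. The sketch below is illustrative only: the dictionary keys and the `retained_until` function are hypothetical names, and it assumes each period starts when the data is processed or created, which the table states explicitly only for streaming data.

```python
from datetime import datetime, timedelta, timezone

# Fixed retention periods taken from the table; the keys are hypothetical.
RETENTION = {
    "streaming_raw_data": timedelta(hours=1),
    "query_logs": timedelta(days=30),
    "logs_metrics_traces": timedelta(days=60),  # footnote 1 covers the VPC column
    "daily_backups": timedelta(days=90),
}


def retained_until(data_type: str, processed_at: datetime) -> datetime:
    """Return the moment after which data of this type is no longer retained."""
    # Assumption: the period is counted from the processing/creation time.
    return processed_at + RETENTION[data_type]


processed_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
streaming_expiry = retained_until("streaming_raw_data", processed_at)
query_log_expiry = retained_until("query_logs", processed_at)
print(streaming_expiry.isoformat(), query_log_expiry.isoformat())
```

Anonymized calculated metrics are left out because their retention depends on window and segment configuration, not on a fixed period.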

Manually Exporting Data

The Validio API lets you export all information in Validio (including configurations, audit logs, and data quality history) to other systems, such as catalogs, BI tools, and data warehouses. Validio also provides SDK recipes with pre-written code for exporting metrics or incident groups to CSV. For details, see SDK Recipes and the Validio API Documentation.
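For orientation, the sketch below shows the general shape of writing metric records to CSV with Python's standard library. It does not call the Validio API or SDK, and it is not one of the SDK recipes: the `metrics_to_csv` function, the field names, and the sample record are all hypothetical, and retrieving the records from Validio is left out.

```python
import csv
import io


def metrics_to_csv(records):
    """Serialize metric records (a list of dicts) to CSV text."""
    # Hypothetical field names, loosely mirroring the metric described earlier.
    fieldnames = ["validator", "value", "lower_bound", "upper_bound", "breached"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Sample record invented for this example; real records would come from
# the Validio API or an SDK recipe.
records = [
    {
        "validator": "orders_mean_amount",
        "value": 20.0,
        "lower_bound": 5.0,
        "upper_bound": 25.0,
        "breached": False,
    }
]
csv_text = metrics_to_csv(records)
print(csv_text)
```

For real exports, the SDK recipes referenced above are the place to start, since they handle authentication and pagination details that this sketch ignores.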