Data Sampling

Preview a tabulated sample of rows from your source to inspect actual data values and verify data patterns.

Data sampling in Validio displays a tabulated preview of rows from your source or catalog asset, letting you inspect actual data values and verify data patterns. You can also use Data Profiling to generate statistical analysis of your fields.

Data sampling configuration and preview

Running a Sample

To sample data from your source or catalog asset:

  1. Navigate to the Schema & Profiling tab.

  2. Click Sample data and configure the sample size and fields in the dialog:

    • Sampling Percentage: (Only available for tables and materialized views.) Use the slider or enter a number to adjust the percentage of data to include in the sample. Lower values speed up sampling on large tables. Defaults to 10%.

    • Data Scope: (Recommended) Cap the execution time on large datasets by limiting the fields and number of rows sampled:

      Setting | Description
      Fields | Select specific fields to include in the sample. By default, all fields are included.
      Row limit | Set the maximum number of rows to include in the sample. Defaults to 50 rows.
    • Lookback Filter: (Recommended) Limit the data volume scanned by targeting only a specific window of recent data:

      Setting | Description
      Lookback field | Select a timestamp or date field to define the sampling window. This limits the sample to recent data rather than scanning the full dataset.
      Lookback duration | The specific amount of time to look back when filtering data. Use this with the Lookback field to sample only recent data.
    • Credential: (Only available for catalog assets.) Select the credential to use for displaying sample data.

    • The dialog includes a SQL query that is auto-generated from your settings. You can edit this query directly for more control, but note that manual edits are not synced back to the settings above. Click Reset query to discard any manual edits and regenerate the query from the current settings.

  3. Click Load samples to display the results.

    • To cancel a sampling run (for example, if sampling takes too long to complete), click Cancel, then adjust the settings before loading samples again.
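
To make the relationship between the dialog settings and the auto-generated SQL query more concrete, here is a minimal Python sketch of how such a query could be assembled. It is an illustration only, not the query Validio actually generates: the function name build_sample_query, its parameters, the table and field names, and the SQL dialect (TABLESAMPLE SYSTEM, INTERVAL, LIMIT) are all assumptions, and the exact syntax varies by warehouse.

```python
from typing import Optional, Sequence


def build_sample_query(
    table: str,
    fields: Optional[Sequence[str]] = None,
    row_limit: int = 50,                        # mirrors the documented Row limit default
    sampling_percentage: Optional[float] = 10,  # mirrors the documented 10% default
    lookback_field: Optional[str] = None,
    lookback_days: Optional[int] = None,
) -> str:
    """Hypothetical helper: assemble a sampling query from dialog-style settings."""
    if row_limit < 1:
        raise ValueError("row_limit must be at least 1")
    if sampling_percentage is not None and not 0 < sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be in (0, 100]")
    # The lookback filter needs both a field and a duration, or neither.
    if (lookback_field is None) != (lookback_days is None):
        raise ValueError("set lookback_field and lookback_days together")

    # Fields: all fields are included unless a subset is selected.
    columns = ", ".join(fields) if fields else "*"
    parts = [f"SELECT {columns}", f"FROM {table}"]

    # Sampling Percentage: pass None for sources that do not support it
    # (the setting applies only to tables and materialized views).
    if sampling_percentage is not None:
        parts.append(f"TABLESAMPLE SYSTEM ({sampling_percentage:g})")

    # Lookback Filter: restrict the scan to a recent window (assumed syntax).
    if lookback_field is not None:
        parts.append(
            f"WHERE {lookback_field} >= "
            f"CURRENT_TIMESTAMP - INTERVAL '{lookback_days}' DAY"
        )

    # Row limit: cap the number of rows returned.
    parts.append(f"LIMIT {row_limit}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(
        build_sample_query(
            "orders",
            fields=["id", "amount"],
            lookback_field="created_at",
            lookback_days=7,
        )
    )
```

Each setting maps to one clause, which is why narrowing the fields, lowering the percentage, or adding a lookback window reduces the data scanned. The sketch interpolates identifiers directly into the string, so it is suitable only for illustration with trusted input.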

Use Cases

Data Sampling is useful for:

  • Spot-checking that actual values match the detected schema and data types.
  • Verifying data patterns before configuring validators.
  • Investigating unexpected values or formats flagged during profiling or validation.