Validio Code Scenarios

The following sections provide examples for different scenarios using Validio Code:

  • Update resources–Update the configuration of a resource.
  • Manage source schema–Interact with the schema to fetch Upstream schema changes or perform Manual schema override.
  • Validate a set of fields–Apply the same validation to a set of fields or declare filters within a Validator.
  • Destroy resources–Delete individual resources or resource definitions from your program.
  • Import resources–Generate code snippets for resources that are not present in the project.

Update Resources

Resources are the configured components in your Validio environment and include channels, credentials, sources, validators, windows, and so on. You can update the configuration of a resource by editing the resource’s declaration in the code.

In the Author section of the Validio Code Workflow, we included a main.py example with a 1-hour tumbling window. In the following example, we edit the window definition to use a 2-hour tumbling window:

window = TumblingWindow(
    name='example-window',
    source=big_query_source,
    data_time_field='event_time',
    # Updated window size from 1 to 2.
    window_size=2,
    time_unit=WindowTimeUnit.HOUR,
)

Then, we invoke the plan subcommand to display the pending update:

validio code plan

TumblingWindow 'example-window' will be updated
~ TumblingWindow = {
~   window_size = 1 => 2
~ }


Plan: 0 to create, 1 to update, 0 to delete.
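The plan output above is essentially an attribute-by-attribute comparison between the deployed resource and its declaration. The following is a minimal sketch of that idea in plain Python, not Validio's implementation: the dictionaries stand in for the two versions of example-window, and diff_config and plan_summary are hypothetical helpers that only mimic the output format.

```python
from typing import Any


def diff_config(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """Return one plan-style line per attribute whose value differs."""
    return [
        f"~   {key} = {old.get(key)} => {new.get(key)}"
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    ]


def plan_summary(actions: list[str]) -> str:
    """Summarize a list of 'create' / 'update' / 'delete' actions."""
    return (f"Plan: {actions.count('create')} to create, "
            f"{actions.count('update')} to update, "
            f"{actions.count('delete')} to delete.")


# Hypothetical deployed state vs. the edited declaration.
deployed = {"name": "example-window", "data_time_field": "event_time",
            "window_size": 1, "time_unit": "HOUR"}
declared = {**deployed, "window_size": 2}

print(diff_config(deployed, declared))  # ['~   window_size = 1 => 2']
print(plan_summary(["update"]))         # Plan: 0 to create, 1 to update, 0 to delete.
```

Only attributes whose values differ produce a line, which is why the plan shows window_size and nothing else.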

Manage Source Schema

A source’s configuration includes a set of expected fields and datatypes, which are defined by the schema. Usually you do not interact with or modify the schema, because Validio automatically detects or infers the schema for all source types. However, you can intervene in the process to manually update the schema to do the following:

  • Capture upstream schema changes
  • Override the inferred schema with a modified version

Upstream Schema Changes

When the schema of the upstream data source changes, for example because of changes in a warehouse table, stream messages, or object storage files, new fields that you would like to monitor might appear.

To fetch the latest schema, pass the --update-schema flag to the plan or apply subcommand.

Note

You can pass --update-schema multiple times, once for each source to check for updates, or use the --update-all-schemas flag to check for schema updates on all sources.

The following example checks for upstream schema updates for the source and reports any changes:

validio code plan --diff=changes --update-schema example-source

No changes. Configuration is up-to-date!
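Conceptually, an upstream schema update surfaces fields that exist in the freshly fetched schema but not in the stored one. The sketch below illustrates that comparison on two JSON Type Definition (JTD) schemas using only the standard library. It is an illustration under assumptions, not how Validio compares schemas: it inspects only the top-level properties and optionalProperties, and the field names are made up.

```python
def field_names(schema: dict) -> set[str]:
    """Top-level field names of a JTD schema in the properties form."""
    return set(schema.get("properties", {})) | set(schema.get("optionalProperties", {}))


def added_fields(stored: dict, upstream: dict) -> list[str]:
    """Fields present upstream but missing from the stored schema."""
    return sorted(field_names(upstream) - field_names(stored))


# Hypothetical stored schema and a newer upstream version with one extra field.
stored = {"properties": {"order_id": {"type": "string"},
                         "created_at": {"type": "timestamp"}}}
upstream = {
    "properties": {"order_id": {"type": "string"},
                   "created_at": {"type": "timestamp"}},
    "optionalProperties": {"discount": {"type": "float64"}},
}
print(added_fields(stored, upstream))  # ['discount']
```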

Manual Schema Override

To manually override the inferred schema with a modified version, follow these steps:

  1. Download the inferred schema.
  2. Edit the downloaded schema file.
  3. Pass the contents of the modified schema file as a parameter to the source.

The plan and apply subcommands detect a change to the source’s schema whenever the contents of the file change. For more information about interpreting and modifying the schema file, see the JSON Type Definition documentation.

The following example shows how to manually override an inferred schema with a modified schema file. First, download the inferred schema:

validio sources infer-schema demo example-source --output example-source-schema.json

Then, edit the downloaded file and pass the contents of the modified schema file as a parameter to the source:

from pathlib import Path
from validio_sdk import load_jtd_schema

# Use a manually provided schema for the source instead of the inferred one.
big_query_source = sources.GcpBigQuerySource(
    name="example-source",
    credential=gcp_credential,
    project='example-project',
    dataset='example-dataset',
    table='orders',
    cursor_field='created_at',
    lookback_days=30,
    schedule='0 */12 * * *',
    # Load the edited schema file downloaded with `validio sources infer-schema`.
    jtd_schema=load_jtd_schema(Path("/home/my-project/example-source-schema.json")),
)
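Step 2 can be done by hand in an editor, or scripted. The following minimal sketch marks one field as nullable in the downloaded JTD schema file using only the standard library. It assumes the file stores its top-level fields under properties or optionalProperties; the helper make_field_nullable and the field names are hypothetical, and a temporary directory stands in for /home/my-project.

```python
import json
import tempfile
from pathlib import Path


def make_field_nullable(schema_path: Path, field: str) -> None:
    """Set "nullable": true on one top-level field of a JTD schema file, in place."""
    schema = json.loads(schema_path.read_text())
    for section in ("properties", "optionalProperties"):
        if field in schema.get(section, {}):
            schema[section][field]["nullable"] = True
            break
    else:
        raise KeyError(f"field {field!r} not found in {schema_path}")
    schema_path.write_text(json.dumps(schema, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the file written by `validio sources infer-schema`.
    path = Path(tmp) / "example-source-schema.json"
    path.write_text(json.dumps({"properties": {
        "created_at": {"type": "timestamp"},
        "customer_id": {"type": "string"},
    }}))
    make_field_nullable(path, "customer_id")
    edited = json.loads(path.read_text())

print(edited["properties"]["customer_id"])  # {'type': 'string', 'nullable': True}
```

Because the file contents change, the next plan or apply run detects the modified schema loaded through load_jtd_schema.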

Validate a Set of Fields

You can use field selectors to apply the same validation to a set of fields.

Certain Validators accept a FieldSelector object instead of a field name when specifying which field to monitor. The field selector is matched against the source’s schema, and an identical validator is declared for each matching field.

  • When you use field selectors, the name field becomes an interpolated string. Each generated validator gets a unique name based on its assigned field.
  • You can also provide field selectors when you declare filters within a Validator. Example 2 demonstrates using field selectors to declare filters in a validator.
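The name template uses Python's printf-style mapping syntax, so "mean_of_%(field)s" % {"field": "Age"} evaluates to "mean_of_Age". The sketch below models how a selector expands into one validator name per matching field. SelectorSketch and Field are hypothetical stand-ins, not the SDK's FieldSelector, and the schema is invented for illustration; the real matching rules may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    data_type: str
    nullable: bool


@dataclass(frozen=True)
class SelectorSketch:
    """Hypothetical stand-in for FieldSelector; None means 'match any value'."""
    data_type: Optional[str] = None
    nullable: Optional[bool] = None

    def matches(self, field: Field) -> bool:
        return ((self.data_type is None or field.data_type == self.data_type)
                and (self.nullable is None or field.nullable == self.nullable))


def expand(name_template: str, selector: SelectorSketch, schema: list[Field]) -> list[str]:
    """Return one interpolated validator name per field the selector matches."""
    return [name_template % {"field": f.name} for f in schema if selector.matches(f)]


schema = [
    Field("Age", "NUMERIC", nullable=False),
    Field("Occupation", "STRING", nullable=True),
    Field("Yearly_wage_USD", "NUMERIC", nullable=True),
]
numeric_names = expand("mean_of_%(field)s", SelectorSketch(data_type="NUMERIC"), schema)
null_names = expand("null_count_of_%(field)s",
                    SelectorSketch(data_type="STRING", nullable=True), schema)
print(numeric_names)  # ['mean_of_Age', 'mean_of_Yearly_wage_USD']
print(null_names)     # ['null_count_of_Occupation']
```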

Note

You can attach only one field selector to a validator: either in the source field, where applicable, or in the filter.

For validators that use a reference source, you can use FieldSelector.reference() for the reference field whenever it is the same as the compared field. Example 3 demonstrates using reference source fields.

Example 1

Specify a declaration that creates validators for all numeric fields.

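from validio_sdk.resource import FieldSelector

# w and sg are the window and segmentation declared elsewhere in the program.
validators.NumericValidator(
    # %(field)s is replaced by the name of each matched field.
    name="mean_of_%(field)s",
    window=w,
    segmentation=sg,
    metric=NumericMetric.MEAN,
    # Match every NUMERIC field in the source schema.
    source_field=FieldSelector(data_type=FieldDataType.NUMERIC),
)

validio code plan --diff=none

NumericValidator 'mean_of_Age' will be created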
NumericValidator 'mean_of_Credit_scoring' will be created
NumericValidator 'mean_of_Number_family_members' will be created
NumericValidator 'mean_of_Working_hours_weekly' will be created
NumericValidator 'mean_of_Yearly_wage_USD' will be created
NumericValidator 'mean_of_Years_education' will be created

Plan: 6 to create, 0 to update, 0 to delete.

Example 2

Provide field selectors to declare filters within a validator. The following creates a null count validator for every nullable field of type String:

from validio_sdk.resource import FieldSelector

validators.VolumeValidator(
    name="null_count_of_%(field)s",
    window=w,
    segmentation=sg,
    metric=VolumeMetric.COUNT,
    # The field selector sits in the filter: one validator per nullable String field.
    filter=NullFilter(field=FieldSelector(data_type=FieldDataType.STRING, nullable=True)),
)

validio code plan --diff=none

VolumeValidator 'null_count_of_favorite_date' will be created
VolumeValidator 'null_count_of_Occupation' will be created

Plan: 2 to create, 0 to update, 0 to delete.
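Conceptually, each generated validator counts the records in which its field is null. The following sketch computes such counts for a single batch of records in plain Python. It ignores windows and segmentation, treats a missing key the same as null (an assumption), and uses made-up records; it is not how Validio evaluates the filter.

```python
def null_counts(rows: list[dict], fields: list[str]) -> dict[str, int]:
    """Count, per field, the rows whose value is None (a missing key counts as None)."""
    return {field: sum(1 for row in rows if row.get(field) is None) for field in fields}


# Made-up records for the two nullable String fields matched above.
rows = [
    {"Occupation": "engineer", "favorite_date": None},
    {"Occupation": None, "favorite_date": "2023-01-01"},
    {"Occupation": "teacher"},  # favorite_date missing entirely
]
counts = {f"null_count_of_{name}": n
          for name, n in null_counts(rows, ["favorite_date", "Occupation"]).items()}
print(counts)  # {'null_count_of_favorite_date': 2, 'null_count_of_Occupation': 1}
```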

Example 3

Create a numeric distribution validator that compares the value of the assigned field with the value of the same field in the previous window. Each generated validator uses the same field as both its source_field and its reference_source_field:

from validio_sdk.resource import FieldSelector

validators.NumericDistributionValidator(
    name="mean_ratio_of_%(field)s",
    window=w,
    segmentation=sg,
    metric=NumericDistributionMetric.MEAN_RATIO,
    source_field=FieldSelector(data_type=FieldDataType.NUMERIC),
    # Reuse the field matched by source_field as the reference field.
    reference_source_field=FieldSelector.reference(),
    reference=validators.Reference(source=s1, window=w1, history=1, offset=1),
)
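FieldSelector.reference() pairs each matched field with the same field in the reference. The sketch below models that pairing with a hypothetical helper and adds a toy mean ratio. Treating MEAN_RATIO as the current window's mean divided by the reference window's mean is an assumption for illustration, not a statement of how Validio computes the metric.

```python
from statistics import mean


def pair_with_reference(fields: list[str]) -> list[dict[str, str]]:
    """Model FieldSelector.reference(): the reference field equals the matched source field."""
    return [
        {"name": f"mean_ratio_of_{f}", "source_field": f, "reference_source_field": f}
        for f in fields
    ]


def mean_ratio(current: list[float], reference: list[float]) -> float:
    # Assumption for illustration: current window mean divided by reference window mean.
    return mean(current) / mean(reference)


pairs = pair_with_reference(["Age", "Years_education"])
print(pairs[0])  # {'name': 'mean_ratio_of_Age', 'source_field': 'Age', 'reference_source_field': 'Age'}
print(mean_ratio([40.0, 60.0], [40.0, 40.0]))  # 1.25
```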

Destroy Resources

To delete all resources in the project's namespace, pass the --destroy flag with either the plan or apply subcommand.

The following example uses plan to preview the resources to delete:

validio code plan --diff=none --destroy

GcpCredential 'example-credential' will be deleted
GcpBigQuerySource 'example-source' will be deleted
TumblingWindow 'example-window' will be deleted
Segmentation 'example-segmentation' will be deleted
NumericValidator 'mean_of_Age' will be deleted
NumericValidator 'mean_of_Credit_scoring' will be deleted

Plan: 0 to create, 0 to update, 6 to delete.

Import Resources

The import command adds resources to the project. Adding resources to a project means writing out Python declarations for those resources to a file in the project directory.

You can add the --import-namespace flag to specify another namespace to import resources from.

Note

Resources can exist in the project’s namespace without being part of the project. To add those resources to the project, run import without the --import-namespace flag.

In the following example, the import command is used to move the resources specified in resources.json from other-namespace into the project’s example-namespace. The example then writes out Python declarations for those resources to the specified file, generated.py.

validio code import -o generated.py --import-namespace other-namespace -f resources.json

GcpCredential example-credential will be imported
GcpBigQuerySource example-source will be imported
TumblingWindow example-window will be imported
Segmentation example-segmentation will be imported
NumericValidator mean_of_Credit_scoring will be imported
NumericValidator mean_of_Age will be imported

Plan: 6 resources will be imported from namespace 'other-namespace' to namespace 'example-namespace'

Do you want to perform these operations?
        Only 'yes' is accepted to approve
Enter a value:

The import command in the previous example moves the existing resources, including the credential, from other-namespace to the project’s namespace and generates Python declarations for them. The credential’s ignore_changes attribute, which defaults to True, keeps the parameter values that already exist in the credential.

Important

When you run the import command, resources that already exist in the project are not added again.

Specify Resources to Import

You can use a JSON file, such as resources.json, to specify which resources to import. The file has the following structure:

{
  "resources": {
    "credentials": [
      "gcp-credential",
      "snowflake-credential"
    ],
    "channels": [
      "slack-alerts"
    ]
  }
}
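If the selection changes often, you can generate this file instead of editing it by hand. A minimal sketch using the standard library json module; write_resources_file is a hypothetical helper, and the temporary directory stands in for your project directory.

```python
import json
import tempfile
from pathlib import Path


def write_resources_file(path: Path, resources: dict[str, list[str]]) -> None:
    """Write an import selection using the {"resources": {type: [names]}} layout."""
    path.write_text(json.dumps({"resources": resources}, indent=2) + "\n")


selection = {
    "credentials": ["gcp-credential", "snowflake-credential"],
    "channels": ["slack-alerts"],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "resources.json"  # in practice, a file in your project directory
    write_resources_file(path, selection)
    loaded = json.loads(path.read_text())

print(loaded["resources"]["channels"])  # ['slack-alerts']
```

Pass the resulting file to validio code import with -f, as in the earlier example.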

To keep the resources list short, only parent resources need to be specified. The command will automatically import any child resources, as long as they belong to the same namespace as the parent. For example, if you import a Source, then its Windows, Segmentations, and Validators are also imported automatically.

You can use the following type names to specify the corresponding resource type:

  • credentials (automatically imports sources)
  • sources (automatically imports windows, segmentations, validators)
  • windows
  • segmentations
  • validators
  • channels (automatically imports notification_rules)
  • notification_rules
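The cascading rule can be pictured as a breadth-first walk from the selected parents to their children in the same namespace. The sketch below is a hypothetical model, not the CLI's implementation: Resource, the parent links, and the inventory are invented, and it also rejects type names that are not in the list above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

# Type names accepted in resources.json, per the list above.
VALID_TYPES = {"credentials", "sources", "windows", "segmentations",
               "validators", "channels", "notification_rules"}
# Child types imported automatically with each parent type.
CHILD_TYPES = {
    "credentials": {"sources"},
    "sources": {"windows", "segmentations", "validators"},
    "channels": {"notification_rules"},
}


@dataclass(frozen=True)
class Resource:
    type: str
    name: str
    namespace: str
    parent: Optional[tuple[str, str]] = None  # (type, name) of the parent resource


def expand_selection(selection: dict[str, list[str]],
                     inventory: list[Resource]) -> list[tuple[str, str]]:
    """Return (type, name) keys for the selected parents plus their same-namespace children."""
    unknown = set(selection) - VALID_TYPES
    if unknown:
        raise ValueError(f"unknown resource types: {sorted(unknown)}")
    by_key = {(r.type, r.name): r for r in inventory}
    queue = deque(by_key[(t, n)] for t, names in selection.items() for n in names)
    visited: dict[tuple[str, str], None] = {}  # insertion-ordered set
    while queue:
        res = queue.popleft()
        key = (res.type, res.name)
        if key in visited:
            continue
        visited[key] = None
        for child in inventory:
            if (child.parent == key
                    and child.type in CHILD_TYPES.get(res.type, set())
                    and child.namespace == res.namespace):
                queue.append(child)
    return list(visited)


inventory = [
    Resource("credentials", "gcp-credential", "other-namespace"),
    Resource("sources", "example-source", "other-namespace", ("credentials", "gcp-credential")),
    Resource("windows", "example-window", "other-namespace", ("sources", "example-source")),
    Resource("validators", "mean_of_Age", "other-namespace", ("sources", "example-source")),
    # Hypothetical child in another namespace: not imported with its parent.
    Resource("validators", "team-b-validator", "team-b", ("sources", "example-source")),
]
imported = expand_selection({"credentials": ["gcp-credential"]}, inventory)
print(imported)
```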

Import All Resources

You can import all resources in a namespace, and move them to the project’s namespace, by omitting the -f file. This is useful if you have set up Validio using the UI and want to start version-controlling everything as infrastructure as code (IaC). For example, to import everything from the default namespace:

validio code import -o generated.py --import-namespace default