Validio Code Scenarios
The following sections provide examples for different scenarios using Validio Code:
- Update resources: Update the configuration of a resource.
- Manage source schema: Interact with the schema to fetch Upstream schema changes or perform Manual schema override.
- Validate a set of fields: Apply the same validation to a set of fields or declare filters within a Validator.
- Destroy resources: Delete individual resources or resource definitions from your program.
- Import resources: Generate code snippets for resources that are not present in the project.
Update Resources
Resources are the configured components in your Validio environment, and include channels, credentials, sources, validators, windows, and so on. You can update the configuration of a resource by editing the resource’s declaration in the code.
In the Author section of the Validio Code Workflow, we included a main.py example with a 1-hour tumbling window. In the following example, we edit the window definition to use a tumbling window of 2 hours:
window = TumblingWindow(
    name='example-window',
    source=big_query_source,
    data_time_field='event_time',
    # Updated window size from 1 to 2.
    window_size=2,
    time_unit=WindowTimeUnit.HOUR,
)
Then, we invoke the plan subcommand to display the pending update:
validio code plan
TumblingWindow 'example-window' will be updated
~ TumblingWindow = {
~ window_size = 1 => 2
~ }
Plan: 0 to create, 1 to update, 0 to delete.
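To make the plan output above easier to read, here is a minimal sketch of how such a diff line can be derived by comparing the deployed attribute values with the declared ones. This is an illustration only, not Validio's implementation; the function name diff_config and the dictionaries are hypothetical.

```python
def diff_config(old: dict, new: dict) -> list:
    """Return one line per changed attribute, formatted like the plan output."""
    lines = []
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            lines.append(f"~ {key} = {before!r} => {after!r}")
    return lines


# Hypothetical attribute values for the window before and after the edit.
deployed = {"name": "example-window", "window_size": 1, "time_unit": "HOUR"}
declared = {"name": "example-window", "window_size": 2, "time_unit": "HOUR"}

changes = diff_config(deployed, declared)
print("\n".join(changes))
```

Only the attribute whose value differs is reported, which is why the plan lists window_size and nothing else.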
Manage Source Schema
A source’s configuration includes a set of expected fields and datatypes, which are defined by the schema. Usually you do not interact with or modify the schema, because Validio automatically detects or infers the schema for all source types. However, you can intervene in the process to manually update the schema to do the following:
- Capture upstream schema changes
- Override the inferred schema with a modified version
Upstream Schema Changes
When the schema of the upstream data source changes, for example because of changes in the warehouse table, stream messages, or object storage file, the change might add new fields that you want to monitor.
You can pass the --update-schema flag to the plan or apply subcommands to fetch the latest schema.
Note
You can pass the --update-schema flag multiple times, once for each source to check for updates, or use the --update-all-schemas flag to check for schema updates on all sources.
The following example checks for upstream schema updates for the source and reports any changes:
validio code plan --diff=changes --update-schema example-source
No changes. Configuration is up-to-date!
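If you check several sources regularly, the repeated flags can be assembled in a small script. The following is a minimal sketch that only builds the command line and prints it; the helper name build_plan_command and the second source name, other-source, are hypothetical.

```python
import shlex


def build_plan_command(source_names):
    """Build a `validio code plan` argument list with one --update-schema flag per source."""
    command = ["validio", "code", "plan", "--diff=changes"]
    for name in source_names:
        command += ["--update-schema", name]
    return command


# "other-source" is a made-up second source used for illustration.
command = build_plan_command(["example-source", "other-source"])
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run if you want the script to execute the command as well.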
Manual Schema Override
You can manually override the inferred schema with a modified version, with the following steps:
- Download the inferred schema.
- Edit the downloaded schema file.
- Pass the contents of the modified schema file as parameters to the source.
The plan or apply commands detect any change to the source’s schema whenever the contents of the file change. For more information about interpreting and modifying the schema file, see the JSON Type Definition documentation.
The following example shows how to manually override an inferred schema with a modified schema file. First, download the inferred schema:
validio sources infer-schema demo example-source --output example-source-schema.json
Then, edit the downloaded file and pass the file contents of the modified schema as a parameter to the source:
from pathlib import Path
from validio_sdk import load_jtd_schema

# Use a manually provided schema for the source instead of the inferred one.
big_query_source = sources.GcpBigQuerySource(
    name="example-source",
    credential=gcp_credential,
    project='example-project',
    dataset='example-dataset',
    table='orders',
    cursor_field='created_at',
    lookback_days=30,
    schedule='0 */12 * * *',
    jtd_schema=load_jtd_schema(Path("/home/my-project/example-source-schema.json")),
)
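The edit step can be done by hand or scripted. The following is a minimal sketch that marks one field as nullable in a schema file, assuming the downloaded file uses the JSON Type Definition "properties" form; the actual layout of the file that Validio produces may differ. The helper set_nullable and the sample schema are hypothetical, and the sketch works on a temporary file so that it runs on its own.

```python
import json
import tempfile
from pathlib import Path


def set_nullable(schema_path: Path, field: str) -> dict:
    """Mark one field in a JTD 'properties' schema as nullable and save the file."""
    schema = json.loads(schema_path.read_text())
    schema["properties"][field]["nullable"] = True
    schema_path.write_text(json.dumps(schema, indent=2))
    return schema


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the downloaded example-source-schema.json (assumed layout).
    path = Path(tmp) / "example-source-schema.json"
    path.write_text(json.dumps({"properties": {"created_at": {"type": "timestamp"}}}))

    set_nullable(path, "created_at")
    reloaded = json.loads(path.read_text())

print(reloaded)
```

Because the plan and apply commands detect changes to the file contents, an edit like this shows up as a pending schema change the next time you run them.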
Validate a Set of Fields
You can use field selectors to apply the same validation to a set of fields.
Certain Validators accept a FieldSelector object instead of a field name when specifying which field to monitor. The field selector is then matched against the schema of the source and declares an identical validator for all matching fields.
- When you use field selectors, the name field becomes an interpolated String. Each unique validator gets a corresponding name based on the assigned field.
- You can also provide field selectors when you declare filters within a Validator. Example 2 demonstrates using field selectors to declare filters in a validator.
Note
You can attach only one field selector to a validator: either in the source field, where applicable, or in the Filter.
For Validators that use reference Sources, you can use the FieldSelector.reference() object whenever the reference field is the same as the compared field. Example 3 demonstrates using reference source fields.
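To illustrate how a selector and an interpolated name fit together, here is a minimal sketch that matches a selector against a list of schema fields and expands the name template once per match. It does not use the Validio SDK; SchemaField, expand_names, and the sample schema are hypothetical stand-ins, and the real matching logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SchemaField:
    name: str
    data_type: str
    nullable: bool = False


def expand_names(template, schema, data_type, nullable: Optional[bool] = None):
    """Return one interpolated validator name per field that matches the selector."""
    names = []
    for field in schema:
        if field.data_type != data_type:
            continue
        if nullable is not None and field.nullable != nullable:
            continue
        # The %(field)s placeholder is filled with the matching field's name.
        names.append(template % {"field": field.name})
    return names


schema = [
    SchemaField("Age", "NUMERIC"),
    SchemaField("Occupation", "STRING", nullable=True),
    SchemaField("Yearly_wage_USD", "NUMERIC"),
]

numeric_names = expand_names("mean_of_%(field)s", schema, "NUMERIC")
nullable_string_names = expand_names("null_count_of_%(field)s", schema, "STRING", nullable=True)
print(numeric_names)
print(nullable_string_names)
```

Each matching field yields a distinct name, which is what keeps the generated validators unique.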
Example 1
Specify a declaration that creates validators for all numeric fields.
from validio_sdk.resource import FieldSelector

validators.NumericValidator(
    name="mean_of_%(field)s",
    window=w,
    segmentation=sg,
    metric=NumericMetric.MEAN,
    source_field=FieldSelector(data_type=FieldDataType.NUMERIC),
)
validio code plan --diff=none
NumericValidator 'mean_of_Age' will be created
NumericValidator 'mean_of_Credit_scoring' will be created
NumericValidator 'mean_of_Number_family_members' will be created
NumericValidator 'mean_of_Working_hours_weekly' will be created
NumericValidator 'mean_of_Yearly_wage_USD' will be created
NumericValidator 'mean_of_Years_education' will be created
Plan: 6 to create, 0 to update, 0 to delete.
Example 2
Provide field selectors to declare filters within a validator. The following creates a null count validator for all nullable fields of the type String:
from validio_sdk.resource import FieldSelector

validators.VolumeValidator(
    name="null_count_of_%(field)s",
    window=w,
    segmentation=sg,
    metric=VolumeMetric.COUNT,
    filter=NullFilter(field=FieldSelector(data_type=FieldDataType.STRING, nullable=True)),
)
validio code plan --diff=none
VolumeValidator 'null_count_of_favorite_date' will be created
VolumeValidator 'null_count_of_Occupation' will be created
Plan: 2 to create, 0 to update, 0 to delete.
Example 3
Create a numeric distribution validator that compares the value for the assigned field with the value of the same field in the previous window. The validators use the same source_field and reference_source_field:
from validio_sdk.resource import FieldSelector

validators.NumericDistributionValidator(
    name="mean_ratio_of_%(field)s",
    window=w,
    segmentation=sg,
    metric=NumericDistributionMetric.MEAN_RATIO,
    source_field=FieldSelector(data_type=FieldDataType.NUMERIC),
    reference_source_field=FieldSelector.reference(),
    reference=validators.Reference(source=s1, window=w1, history=1, offset=1),
)
Destroy Resources
To delete all resources in the project's namespace, pass the --destroy flag with either the plan or apply subcommand.
The following example uses plan to preview the resources to delete:
validio code plan --diff=none --destroy
GcpCredential 'example-credential' will be deleted
GcpBigQuerySource 'example-source' will be deleted
TumblingWindow 'example-window' will be deleted
Segmentation 'example-segmentation' will be deleted
NumericValidator 'mean_of_Age' will be deleted
NumericValidator 'mean_of_Credit_scoring' will be deleted
Plan: 0 to create, 0 to update, 6 to delete.
Import Resources
The import command adds resources to the project. Adding resources to a project means writing out Python declarations for those resources to a file in the project directory.
You can add the --import-namespace flag to reference another namespace from which resources should be imported.
Note
Resources can exist in the project’s namespace without being part of the project. You can run import without the --import-namespace flag to add those resources to the project.
In the following example, the import command is used to move the resources specified in resources.json from other-namespace into the project’s example-namespace. The example then writes out Python declarations for those resources to the specified file, generated.py.
validio code import -o generated.py --import-namespace other-namespace -f resources.json
GcpCredential example-credential will be imported
GcpBigQuerySource example-source will be imported
TumblingWindow example-window will be imported
Segmentation example-segmentation will be imported
NumericValidator mean_of_Credit_scoring will be imported
NumericValidator mean_of_Age will be imported
Plan: 6 resources will be imported from namespace 'other-namespace' to namespace 'example-namespace'
Do you want to perform these operations?
Only 'yes' is accepted to approve
Enter a value:
The import command in the previous example moves an existing credential from the default namespace to the chosen namespace and generates a Python declaration for the credential. The attribute ignore_changes, which defaults to True, keeps the parameter values that already exist in the credential.
Important
When you run the import command, resources that already exist in the project are not added again.
Specify Resources to Import
You can use the JSON file, resources.json, to specify which resources to import. The JSON file has the following structure:
{
"resources": {
"credentials": [
"gcp-credential",
"snowflake-credential"
],
"channels": [
"slack-alerts"
]
}
}
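If the list of resources comes from a script, the file can be generated rather than written by hand. The following is a minimal sketch that builds the structure shown above and rejects type names that are not in the list of supported type names in this section; the helper build_resources_spec is hypothetical.

```python
import json

# Type names taken from the list of supported resource types in this section.
KNOWN_TYPES = {
    "credentials", "sources", "windows", "segmentations",
    "validators", "channels", "notification_rules",
}


def build_resources_spec(**names_by_type):
    """Return the resources.json structure for the given resource names per type."""
    unknown = set(names_by_type) - KNOWN_TYPES
    if unknown:
        raise ValueError(f"Unknown resource types: {sorted(unknown)}")
    return {"resources": {t: list(names) for t, names in names_by_type.items()}}


spec = build_resources_spec(
    credentials=["gcp-credential", "snowflake-credential"],
    channels=["slack-alerts"],
)
resources_json = json.dumps(spec, indent=2)
print(resources_json)
```

Writing resources_json to a file named resources.json gives you the input for the -f flag used in the import example.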
To keep the resources list short, only parent resources need to be specified. The command automatically imports any child resources, as long as they belong to the same namespace as the parent. For example, if you import a Source, then its Windows, Segmentations, and Validators are also imported automatically.
You can use the following type names to specify the corresponding resource type.
- credentials (automatically imports sources)
- sources (automatically imports windows, segmentations, validators)
- windows
- segmentations
- validators
- channels (automatically imports notification_rules)
- notification_rules
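The parent and child relationships in the list above can be sketched as a small lookup. The example assumes that automatic imports are transitive, so that importing a credential also pulls in the children of its sources; this is an assumption based on the description of child resources, not confirmed behavior. The names CHILD_TYPES and imported_types are hypothetical.

```python
# Parent type -> child types, copied from the list of type names above.
CHILD_TYPES = {
    "credentials": ["sources"],
    "sources": ["windows", "segmentations", "validators"],
    "channels": ["notification_rules"],
}


def imported_types(parent_type):
    """Return the parent type followed by every type reachable through CHILD_TYPES."""
    seen = []
    stack = [parent_type]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(CHILD_TYPES.get(current, [])))
    return seen


credential_types = imported_types("credentials")
channel_types = imported_types("channels")
print(credential_types)
print(channel_types)
```

Under this assumption, listing only credentials and channels in resources.json is enough to cover every resource type.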
Import All Resources
You can import all resources in a namespace, and change their namespace, by omitting the file. This is useful if you have set up Validio using the UI and want to start version-controlling your configuration as infrastructure as code (IaC). To do this, specify the default namespace as the namespace to import from:
validio code import -o generated.py --import-namespace default