Validio Code (IaC)
Validio Code allows you to model and manage resources within your environment in an efficient and repeatable manner.
Validio Code is an Infrastructure-as-Code (IaC) tool, similar to Terraform, Pulumi, and AWS CloudFormation.
Validio Code leverages Python to provide the full expressive power of an imperative programming language, combined with the safety of declaratively managing your infrastructure. This lets Validio Code fit into existing development or CI/CD workflows.
Getting started with Validio Code
Before you can use Validio Code, you must install and configure the Validio CLI and SDK.
In this guide, we describe the typical workflow in Validio Code:
- Initialize: Create a Validio Code project.
- Author: Write or modify your program to describe the desired resources in your environment.
- Plan: Preview the changes that will be made to your environment.
- Apply: When you are satisfied with the program, deploy the changes to your environment.
We also describe the following:
- Update resources: Update the configuration of a resource.
- Destroy: Delete individual resources or resource definitions from your program.
- Manage source schema: Interact with the schema to fetch Upstream schema changes or perform Manual schema override.
- Field selectors: Apply the same validation to a set of fields or declare filters within a Validator.
- Import resources: Generate code snippets for resources that are not present in the project.
Initialize
To manage resources through code, start by scaffolding a project. Use the init subcommand to create a new project. A project is essentially a directory housing the Python program responsible for managing your resources.
In this example, we create a project in a directory called my-project with a namespace called my-namespace:
validio code init --directory my-project --namespace my-namespace
There are no special considerations for the initialized directory. It contains a bare-bones main.py and requirements.txt.
Note that you can run the init command directly from the desired directory and omit the --directory flag.
As with any Python project, you can introduce any Python tools, modules, dependencies, IDE, etc.
Namespace
To assign a namespace when initializing a project, add the optional -n or --namespace flag to the init subcommand. We recommend that you always assign a unique namespace to every project.
For more information about namespaces, refer to Validio API.
Author
Use the definitions for all resource types provided by the Validio SDK to describe the desired resources for your program.
To deploy a resource in your environment, create an instance of the class definition for the resource type you want to declare.
If you want to delete individual resource definitions from your program, you can either remove or comment out the related code.
For maintenance purposes, we encourage you to divide your resource declarations and helper functionality across different modules, as you would in any program.
You can use any Python-based tools at your disposal when you author a program. For example, reuse modules, import external dependencies, and load credentials from your preferred secrets manager.
Environment variables passed to the plan and apply commands are visible to the Validio Code program.
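Because environment variables are visible to the program, one way to keep secrets out of source control is to read them when the program runs. The following is a minimal sketch and not part of the Validio SDK: the helper require_env and the variable name GCP_SERVICE_ACCOUNT_KEY are hypothetical names chosen for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Hypothetical usage inside main.py (the variable name is an assumption):
#   credential_json = require_env("GCP_SERVICE_ACCOUNT_KEY")
#   gcp_credential = GcpCredential(name="example-credential", credential=credential_json)
```

The variable would be set in the shell or CI job that invokes the plan or apply subcommand. Failing early with a clear message is usually easier to debug than passing an empty credential to a resource.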
This example shows the contents of a working main.py file. The main.py file consists of a full program that declares which resources to deploy into an environment:
from validio_sdk.resource.credentials import GcpCredential
from validio_sdk.resource.sources import GcpBigQuerySource
from validio_sdk.resource.segmentations import Segmentation
from validio_sdk.resource.windows import TumblingWindow
from validio_sdk.resource.validators import NumericValidator
from validio_sdk.graphql_client.enums import WindowTimeUnit, NumericMetric
# Declare a GCP Credential
gcp_credential = GcpCredential(name='example-credential', credential='<svc-acct>')
# Declare a BigQuery Source
big_query_source = GcpBigQuerySource(
    name='example-source',
    credential=gcp_credential,
    project='example-project',
    dataset='example-dataset',
    table='orders',
    cursor_field='created_at',
    lookback_days=30,
    schedule='0 */12 * * *',
)
# Attach a 1 hour Tumbling window to the source
window = TumblingWindow(
    name='example-window', source=big_query_source,
    data_time_field='event_time',
    window_size=1, time_unit=WindowTimeUnit.HOUR)
# Attach a segmentation to the source
by_gender = Segmentation(name='example-segmentation', source=big_query_source, fields=['Gender'])
# Set up a couple validators on the source
for field in ['Age', 'Credit_scoring']:
    NumericValidator(
        name=f'mean_of_{field}',
        window=window, segmentation=by_gender,
        source_field=field, metric=NumericMetric.MEAN)
Plan
Use the plan subcommand to preview changes in your environment, according to your program definition. The output shows any resources that will be created, updated, or deleted. You can iterate between the Author and Plan steps until you are satisfied with the changes.
You can invoke the plan subcommand when you want feedback on your work:
validio code plan
If you are not running the plan command from your project’s root directory, you must specify the --directory flag.
You can use the --diff flag to control whether you want a summary, a full diff, or a partial diff that only shows the changes to be made.
In this example, the output from the plan subcommand shows that our program adds 6 resources to our environment:
validio code plan --diff=none
GcpCredential 'example-credential' will be created
GcpBigQuerySource 'example-source' will be created
TumblingWindow 'example-window' will be created
Segmentation 'example-segmentation' will be created
NumericValidator 'mean_of_Age' will be created
NumericValidator 'mean_of_Credit_scoring' will be created
Plan: 6 to create, 0 to update, 0 to delete.
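In an automated workflow it can be useful to inspect the plan summary before deciding whether to apply, for example to stop when a plan would delete resources. The following is a minimal sketch that only parses the text of the summary line shown above. It assumes the line keeps that exact format, and the function name parse_plan_summary is a hypothetical helper, not part of the Validio CLI or SDK.

```python
import re

# Matches the summary line format shown in the plan output above.
_SUMMARY = re.compile(r"Plan: (\d+) to create, (\d+) to update, (\d+) to delete\.")


def parse_plan_summary(output: str) -> dict[str, int]:
    """Extract the create/update/delete counts from plan output text."""
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("No plan summary line found in output")
    create, update, delete = (int(n) for n in match.groups())
    return {"create": create, "update": update, "delete": delete}


example_output = """\
NumericValidator 'mean_of_Age' will be created
NumericValidator 'mean_of_Credit_scoring' will be created
Plan: 6 to create, 0 to update, 0 to delete.
"""
summary = parse_plan_summary(example_output)
print(summary)  # {'create': 6, 'update': 0, 'delete': 0}
```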
Apply
Use the apply subcommand to accept and perform the planned changes to your environment. The apply subcommand also performs the Plan step, which gives you a final look at the planned changes.
validio code apply
Explicit confirmation step
Apply has an explicit confirmation step because it is a potentially destructive operation.
You can pass the --auto-approve flag to skip the confirmation step and apply the planned changes immediately. This is useful in, for example, CI/CD workflows where you want to apply changes that are automatically merged to a branch.
In this example, we get the output from the ‘plan’ subcommand, followed by a request for confirmation, and finally a confirmation message:
validio code apply --diff=none
GcpCredential 'example-credential' will be created
GcpBigQuerySource 'example-source' will be created
TumblingWindow 'example-window' will be created
Segmentation 'example-segmentation' will be created
NumericValidator 'mean_of_Age' will be created
NumericValidator 'mean_of_Credit_scoring' will be created
Plan: 6 to create, 0 to update, 0 to delete.
Do you want to perform these operations?
Only 'yes' will be accepted to approve
Enter a value: yes
...
Apply complete! Resources: 6 created, 0 updated, 0 deleted
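For a CI/CD job, the non-interactive variant with --auto-approve could be wrapped in a small script. The sketch below is one possible approach, not an official integration: the helper names are hypothetical, it assumes the validio executable is on the PATH, and it assumes the CLI signals failure with a non-zero exit status. It runs the command from the project's root directory rather than relying on any extra flag.

```python
import subprocess
from typing import Optional


def build_apply_command(diff: Optional[str] = None, auto_approve: bool = False) -> list[str]:
    """Assemble the argument list for `validio code apply`."""
    command = ["validio", "code", "apply"]
    if diff is not None:
        command.append(f"--diff={diff}")
    if auto_approve:
        command.append("--auto-approve")
    return command


def run_apply(project_dir: str) -> None:
    """Apply without prompting, from the project's root directory.

    check=True raises CalledProcessError if the CLI exits with a non-zero
    status (assumed here to signal failure).
    """
    subprocess.run(
        build_apply_command(diff="none", auto_approve=True),
        cwd=project_dir,
        check=True,
    )


print(build_apply_command(diff="none", auto_approve=True))
# ['validio', 'code', 'apply', '--diff=none', '--auto-approve']
```

Building the argument list separately from running it keeps the command easy to log and to test without touching a real environment.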
Update resources
To update the configuration of a resource, edit the resource's declaration in the program.
In this example, we want to use a Tumbling window of 2 hours, instead of the 1 hour as declared in our previous example. In the Window definition, we change the following:
window = TumblingWindow(
    name='example-window', source=big_query_source,
    data_time_field='event_time',
    # Updated window size from 1 to 2.
    window_size=2, time_unit=WindowTimeUnit.HOUR)
Then, we invoke the plan subcommand which shows us the pending update:
validio code plan
TumblingWindow 'example-window' will be updated
~ TumblingWindow = {
~ window_size = 1 => 2
~ }
Plan: 0 to create, 1 to update, 0 to delete.
Destroy
If you want to delete all resources in the project's namespace, you can pass the --destroy flag with either the plan or apply subcommand.
In this example, we pass the --destroy flag to the plan subcommand:
validio code plan --diff=none --destroy
GcpCredential 'example-credential' will be deleted
GcpBigQuerySource 'example-source' will be deleted
TumblingWindow 'example-window' will be deleted
Segmentation 'example-segmentation' will be deleted
NumericValidator 'mean_of_Age' will be deleted
NumericValidator 'mean_of_Credit_scoring' will be deleted
Plan: 0 to create, 0 to update, 6 to delete.
Manage Source schema
A Source's configuration includes a set of fields and datatypes expected for that source. This is defined by the schema.
Usually, you do not need to interact with or modify the schema, since the Validio platform automatically detects or infers schema for all source types. However, you can intervene in this process in one of the following ways:
- Upstream schema changes
- Manual schema override
Upstream schema changes
When the schema of the upstream data source changes, for example because of changes in the warehouse table, stream messages, or object storage file, it might add new fields that you would like to monitor.
Pass the --update-schema flag to the plan or apply subcommand to fetch the latest schema.
You can pass multiple --update-schema flags, one for each source to check for updates. Alternatively, you can pass the --update-all-schemas flag to check all sources for schema updates.
In this example, we check for upstream schema updates for the source and report any changes:
validio code plan --diff=changes --update-schema example-source
No changes. Configuration is up-to-date!
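When several sources need a schema check, the repeated --update-schema flags can be generated from a list. This is a small sketch under the assumption that each flag takes one source name, as in the example above. The helper name build_schema_check_command and the source name other-source are hypothetical.

```python
import shlex


def build_schema_check_command(source_names: list[str], diff: str = "changes") -> list[str]:
    """Build a plan command with one --update-schema flag per source."""
    command = ["validio", "code", "plan", f"--diff={diff}"]
    for name in source_names:
        command += ["--update-schema", name]
    return command


# 'other-source' is a hypothetical second source name.
command = build_schema_check_command(["example-source", "other-source"])
print(shlex.join(command))
# validio code plan --diff=changes --update-schema example-source --update-schema other-source
```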
Manual schema override
If you want to manually override the inferred schema with a modified version, you can:
- Download the inferred schema.
- Make the desired adjustments to the existing schema file.
- Pass the contents of the modified schema file as parameters to the source.
With this approach, the plan or apply commands detect any changes to the source's schema whenever the file contents change.
For more information on how to interpret and modify the file, refer to JSON Type Definition.
In this example, we base the new schema off an inferred schema. First we download the inferred schema file:
validio sources infer-schema demo example-source --output example-source-schema.json
We update the downloaded file and pass the contents of the modified schema file as a parameter to the source:
from pathlib import Path
from validio_sdk import load_jtd_schema
from validio_sdk.resource import sources
# Use a manually provided schema instead for the source.
# gcp_credential is the credential declared earlier in the program.
big_query_source = sources.GcpBigQuerySource(
    name="example-source",
    credential=gcp_credential,
    project='example-project',
    dataset='example-dataset',
    table='orders',
    cursor_field='created_at',
    lookback_days=30,
    schedule='0 */12 * * *',
    jtd_schema=load_jtd_schema(Path("/home/my-project/example-source-schema.json")),
)
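The step of adjusting the downloaded schema file can be done by hand or scripted. The following is a minimal sketch using only the Python standard library. It assumes the downloaded file uses the JSON Type Definition "properties" form (an object with "properties" and, optionally, "optionalProperties"); the actual layout of the inferred file may differ, and the helper name set_field_type is hypothetical.

```python
import json
from pathlib import Path


def set_field_type(schema_path: Path, field: str, jtd_type: str, nullable: bool = False) -> None:
    """Add or overwrite one field in a JTD schema file that uses the "properties" form."""
    schema = json.loads(schema_path.read_text())
    field_schema: dict[str, object] = {"type": jtd_type}
    if nullable:
        field_schema["nullable"] = True
    # JTD does not allow the same key in both "properties" and
    # "optionalProperties", so drop any optional definition first.
    schema.get("optionalProperties", {}).pop(field, None)
    schema.setdefault("properties", {})[field] = field_schema
    schema_path.write_text(json.dumps(schema, indent=2) + "\n")


# Hypothetical usage: declare the field 'Age' as a nullable 32-bit integer.
#   set_field_type(Path("example-source-schema.json"), "Age", "int32", nullable=True)
```

Because plan and apply detect changes whenever the file contents change, keeping the schema file under version control alongside the program makes such edits reviewable.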
Field selectors
You can use field selectors as a shorthand to apply the same validation to a set of fields.
Certain Validators accept a FieldSelector object instead of a field name when describing what field to monitor. The field selector is then matched against the schema of the Source and declares an identical Validator for all matching fields.
When you use field selectors, the name field becomes an interpolated String. Each unique Validator gets a corresponding name based on the assigned field.
You can also provide field selectors when you declare filters within a Validator.
You can only attach one field selector to a Validator, either where applicable in the Source field or in the Filter.
For Validators that use reference Sources, you can use the shorthand FieldSelector.reference() whenever the reference field is the same as the compared field.
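To build intuition for how a selector expands into several named Validators, the following sketch models the idea in plain Python. It is a conceptual illustration only, not the SDK's implementation: SchemaField and expand_names are hypothetical stand-ins, the schema is made up, and it assumes the %(field)s placeholder behaves like Python's printf-style formatting with a mapping key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SchemaField:
    """Stand-in for one field of a source schema (not an SDK class)."""
    name: str
    data_type: str
    nullable: bool = False


def expand_names(template: str, schema: list[SchemaField], data_type: str,
                 nullable: Optional[bool] = None) -> list[str]:
    """Return one interpolated name per schema field that matches the selector."""
    return [
        template % {"field": f.name}
        for f in schema
        if f.data_type == data_type and (nullable is None or f.nullable == nullable)
    ]


schema = [
    SchemaField("Age", "NUMERIC"),
    SchemaField("Credit_scoring", "NUMERIC"),
    SchemaField("Occupation", "STRING", nullable=True),
    SchemaField("Gender", "STRING"),
]
print(expand_names("mean_of_%(field)s", schema, data_type="NUMERIC"))
# ['mean_of_Age', 'mean_of_Credit_scoring']
print(expand_names("null_count_of_%(field)s", schema, data_type="STRING", nullable=True))
# ['null_count_of_Occupation']
```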
Example 1
In this example, we specify a declaration which creates Validators for all numeric fields:
from validio_sdk.resource import FieldSelector
validators.NumericValidator(
    name="mean_of_%(field)s",
    window=w,
    segmentation=sg,
    metric=NumericMetric.MEAN,
    source_field=FieldSelector(data_type=FieldDataType.NUMERIC),
)
validio code plan --diff=none
NumericValidator 'mean_of_Age' will be created
NumericValidator 'mean_of_Credit_scoring' will be created
NumericValidator 'mean_of_Number_family_members' will be created
NumericValidator 'mean_of_Working_hours_weekly' will be created
NumericValidator 'mean_of_Yearly_wage_USD' will be created
NumericValidator 'mean_of_Years_education' will be created
Plan: 6 to create, 0 to update, 0 to delete.
Example 2
In this example, we provide field selectors when we declare filters within a Validator. The following creates a null count Validator for all nullable fields of the type String:
from validio_sdk.resource import FieldSelector
validators.VolumeValidator(
    name="null_count_of_%(field)s",
    window=w,
    segmentation=sg,
    metric=VolumeMetric.COUNT,
    filter=NullFilter(field=FieldSelector(data_type=FieldDataType.STRING, nullable=True)),
)
validio code plan --diff=none
VolumeValidator 'null_count_of_favorite_date' will be created
VolumeValidator 'null_count_of_Occupation' will be created
Plan: 2 to create, 0 to update, 0 to delete.
Example 3
In this example, we create numeric distribution Validators that each compare the value for the assigned field with the value of that same field in the previous window. The Validators use the same source_field and reference_source_field:
from validio_sdk.resource import FieldSelector
validators.NumericDistributionValidator(
    name="mean_ratio_of_%(field)s",
    window=w,
    segmentation=sg,
    metric=NumericDistributionMetric.MEAN_RATIO,
    source_field=FieldSelector(data_type=FieldDataType.NUMERIC),
    reference_source_field=FieldSelector.reference(),
    reference=validators.Reference(source=s1, window=w1, history=1, offset=1),
)
Import resources
The import command adds resources to the project. You can add the --import-namespace flag to reference another namespace from which resources should be imported.
Adding resources to a project essentially means writing out Python declarations for those resources to a file in the project directory.
Example of importing resources from another namespace called other-namespace:
validio code import -o generated.py --import-namespace other-namespace -f resources.json
GcpCredential example-credential will be imported
GcpBigQuerySource example-source will be imported
TumblingWindow example-window will be imported
Segmentation example-segmentation will be imported
NumericValidator mean_of_Credit_scoring will be imported
NumericValidator mean_of_Age will be imported
Plan: 6 resources will be imported from namespace 'other-namespace' to namespace 'example-namespace'
Do you want to perform these operations?
Only 'yes' is accepted to approve
Enter a value:
The above command performs the following operations:
- Move the resources specified in resources.json from namespace other-namespace into the project's namespace example-namespace.
- Write out Python declarations for those resources to the specified file generated.py.
Resources can exist in the project's namespace without being part of the project. You can run import without the --import-namespace flag to add those resources to the project.
Note that resources that already exist in the project are not added again.
Specify resources to import
As seen in the example above, a JSON file, resources.json, is used to specify which resources to import. The JSON file has the following structure:
{
  "resources": {
    "credentials": [
      "gcp-credential",
      "snowflake-credential"
    ],
    "channels": [
      "slack-alerts"
    ]
  }
}
To keep the resources list short, only parent resources need to be specified. The command automatically imports any child resources, as long as they belong to the same namespace as the parent.
For example, if you import a Source, then its Windows, Segmentations, and Validators are also imported automatically.
Resource type names
The following type names are supported for the corresponding resource type.
- credentials (automatically imports sources)
- sources (automatically imports windows, segmentations, validators)
- windows
- segmentations
- validators
- channels (automatically imports notification_rules)
- notification_rules
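If the list of resources to import is produced by a script, the file can be generated rather than written by hand. The sketch below uses only the standard library and checks the keys against the type names listed above before serializing. The helper name build_import_spec is hypothetical, and the resource names are the ones from the structure example.

```python
import json

# Type names listed in this guide.
SUPPORTED_TYPES = {
    "credentials", "sources", "windows", "segmentations",
    "validators", "channels", "notification_rules",
}


def build_import_spec(resources: dict[str, list[str]]) -> str:
    """Return the JSON text for an import file, rejecting unknown type names."""
    unknown = sorted(set(resources) - SUPPORTED_TYPES)
    if unknown:
        raise ValueError(f"Unsupported resource type names: {unknown}")
    return json.dumps({"resources": resources}, indent=2)


spec = build_import_spec({
    "credentials": ["gcp-credential", "snowflake-credential"],
    "channels": ["slack-alerts"],
})
print(spec)
```

The returned text can be written to resources.json and passed to the import command with the -f flag, as in the earlier example.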
Import all resources
You can import all resources in a namespace, and change their namespace, by omitting the file. This is particularly useful if you have been setting up Validio through the UI and want to start version controlling everything you have done with IaC. Just specify the default namespace as the namespace to import from:
validio code import -o generated.py --import-namespace default