Best Practices

Model Selection

Choose models based on your validation complexity and cost requirements.

General Guidelines

  • Simple validation (binary checks, categorization): Use smaller, faster models
  • Complex reasoning (extraction, nuanced analysis): Use larger, more capable models or thinking models (see the sketch after this list)
  • Start small: Begin with faster models, and upgrade only if accuracy is insufficient
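
As a rough illustration of these guidelines, often the only thing that changes between a simple check and a complex extraction is the model passed to your warehouse's AI function. The sketch below assumes Snowflake Cortex's COMPLETE function, illustrative model names, and hypothetical tables and columns; substitute your warehouse's AI function and the models it actually offers.

-- Simple categorization: a smaller, faster model is often sufficient
SELECT SNOWFLAKE.CORTEX.COMPLETE(
  'llama3.1-8b',
  'Classify this feedback as POSITIVE, NEGATIVE or NEUTRAL: ' || feedback_text
) AS sentiment
FROM feedback;

-- Nuanced extraction: a larger, more capable model is usually more accurate
SELECT SNOWFLAKE.CORTEX.COMPLETE(
  'mistral-large2',
  'Summarize the root cause described in this incident report in one sentence: ' || report_text
) AS root_cause
FROM incident_reports;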

Available Models

Refer to your warehouse documentation for current model options and capabilities.

Prompt Engineering

  • Be explicit: Use clear, direct instructions for the LLM to follow
  • Request structured output: Ask for "YES/NO" answers, binary values (0, 1), or floats; this makes it easy to set up a Validator Threshold, dynamic or fixed
  • Keep prompts concise: Reduce token usage and improve response consistency

The following is an example of a good prompt:

Is "USA" a valid ISO 3166-1 country name? Answer only YES or NO.

Performance and Cost

  • Batch processing: AI functions process data during source polling based on your schedule. Use tumbling windows where possible to process only the rows in the new window and avoid reprocessing the whole table (see the sketch after this list).
  • Choose appropriate windows: Balance how often you want to validate your unstructured data against compute costs
  • Monitor token usage: Track costs and model token usage through your warehouse's billing dashboards
  • Test on samples: Validate queries on small datasets before full deployment
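
As a sketch of the tumbling-window point above, one way to keep an AI-enabled Custom SQL source from re-running the model over the full table on every poll is to restrict it to recently loaded rows. The filter below is illustrative only; it assumes Snowflake Cortex, a hypothetical reviews table with a loaded_at timestamp, and a one-hour window aligned with the source's polling schedule.

SELECT
  review_id,
  loaded_at,
  SNOWFLAKE.CORTEX.COMPLETE(
    'llama3.1-8b',
    'Does this review mention a delivery problem? Answer only YES or NO. Review: ' || review_text
  ) AS mentions_delivery_problem
FROM reviews
-- Only evaluate rows loaded in the last window, so the AI function
-- is not re-run over the whole history on every poll
WHERE loaded_at >= DATEADD('hour', -1, CURRENT_TIMESTAMP());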

Development Workflow

  1. Start with a Custom SQL source: Easier to test and debug AI queries. Alternatively, prototype the AI function call outside Validio in your IDE or data warehouse GUI.
  2. Test prompts thoroughly: Check that LLM responses are consistent and accurate
  3. Begin with simple validators: Use Volume/Count validators on top of a well-defined, LLM-generated field in your SQL source (a numeric scale, for example); see the sketch below
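
As a sketch of step 3, the Custom SQL source below asks the model for a 1-5 urgency score and casts it to a number, giving a well-defined numeric field that validators and thresholds can monitor. It assumes Snowflake Cortex and a hypothetical support_tickets table.

SELECT
  ticket_id,
  created_at,
  -- Cast the model's single-digit answer to a number; NULL means the
  -- response did not follow the requested format
  TRY_CAST(TRIM(SNOWFLAKE.CORTEX.COMPLETE(
    'llama3.1-8b',
    'Rate the urgency of this support ticket from 1 (low) to 5 (critical). '
      || 'Answer with a single digit only. Ticket: ' || ticket_body
  )) AS INTEGER) AS urgency_score
FROM support_tickets;

A Volume/Count validator filtered to rows where urgency_score is NULL can then, for example, surface responses that did not follow the requested format.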