Best Practices

Best practices for model selection, prompt design, and cost management when validating MDM data with Validio.

Model Selection

Choose models based on your validation complexity and cost requirements.

General Guidelines

  • Simple validation (binary checks, categorization): Use smaller, faster models
  • Complex reasoning (extraction, nuanced analysis): Use larger, more capable models or thinking models
  • Start small: Begin with faster models, and upgrade only if accuracy is insufficient

Available Models

Refer to your warehouse documentation for current model options and capabilities.


Prompt Engineering

  • Be explicit: Use clear, direct instructions for the LLM to follow
  • Request structured output: Ask for "YES/NO", binary values (0, 1), or floats; this makes it easy to set up a Validator Threshold, dynamic or fixed
  • Keep prompts concise: Reduce token usage and improve response consistency

The following is an example of a good prompt:

Is "USA" a valid ISO 3166-1 country name? Answer only YES or NO.
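Embedded in a warehouse AI function, the same prompt could look like the sketch below. This assumes Snowflake's SNOWFLAKE.CORTEX.COMPLETE function and a hypothetical customer_dim table with a country_name column; substitute your warehouse's equivalent AI function and your own columns.

```sql
-- Sketch only: wrap the YES/NO prompt in an AI function call and map the
-- answer to 0/1 so a fixed Validator Threshold can flag failures.
SELECT
  country_name,  -- hypothetical column
  CASE
    WHEN TRIM(SNOWFLAKE.CORTEX.COMPLETE(
      'mistral-large',
      'Is "' || country_name || '" a valid ISO 3166-1 country name? '
      || 'Answer only YES or NO.'
    )) = 'YES' THEN 1
    ELSE 0
  END AS is_valid_country
FROM customer_dim;  -- hypothetical table
```

Mapping the text answer to 0/1 in SQL keeps the Validio side simple: the validator only ever sees a numeric field.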

Performance and Cost

  • Batch processing: AI functions process data during source polling based on your schedule. Use tumbling windows where possible to process only the rows in the new window and avoid reprocessing the whole table.
  • Choose appropriate windows: Balance how often you want to validate your unstructured data against compute costs
  • Monitor token usage: Track costs and model token usage through your warehouse's billing dashboards
  • Test on samples: Validate queries on small datasets before full deployment
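To illustrate the batching idea, a query behind a tumbling window can restrict the AI call to the rows that arrived in the latest window. The window boundary placeholders, the events table, and the updated_at column below are hypothetical, and SNOWFLAKE.CORTEX.COMPLETE is used as one example of a warehouse AI function:

```sql
-- Sketch only: call the LLM for new rows instead of the whole table.
-- :window_start and :window_end are placeholders for the boundaries of
-- the current tumbling window.
SELECT
  id,
  TRIM(SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-large',
    'Does this event description mention a payment? '
    || 'Answer only YES or NO: ' || description
  )) AS mentions_payment
FROM events                     -- hypothetical table
WHERE updated_at >= :window_start
  AND updated_at <  :window_end;
```

Because each row is scored at most once per window, token usage grows with new data volume rather than with total table size.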

Development Workflow

  1. Start with a Custom SQL source: Easier to test and debug AI queries. Alternatively, prototype the AI function call outside Validio in your IDE or data warehouse GUI.
  2. Test prompts thoroughly: Verify that LLM responses are consistent and accurate
  3. Begin with simple validators: Use Volume/Count validators on top of a well-defined LLM-generated field on your SQL source (for example, a numeric scale)
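Putting the steps above together, a Custom SQL source can expose the LLM output as a single numeric field for a simple validator to monitor. The support_tickets table, ticket_text column, and 0-10 sentiment scale below are hypothetical, and SNOWFLAKE.CORTEX.COMPLETE again stands in for your warehouse's AI function:

```sql
-- Sketch only: a Custom SQL source whose LLM-generated field is numeric,
-- so a Volume/Count validator with a threshold can watch it directly.
SELECT
  ticket_id,                                   -- hypothetical column
  TRY_CAST(TRIM(SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-large',
    'Rate the sentiment of this support ticket from 0 (very negative) '
    || 'to 10 (very positive). Answer with a single integer only: '
    || ticket_text
  )) AS INT) AS sentiment_score
FROM support_tickets;                          -- hypothetical table
```

TRY_CAST (rather than CAST) returns NULL when the model strays from the requested format, so a malformed answer surfaces as a missing value instead of failing the whole query.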