Data Stream Sources

Validio integrates with the Data streams that modern data teams work with today.

General considerations

Consider the following when you read and validate records from a Data stream:

  • In each stream, all messages must be consistent with the declared schema:
    1. Fields added after creating the connector are ignored.
    2. Missing fields are interpreted as empty fields, which affects any analytics involving those fields.
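The two rules above can be illustrated with a minimal Python sketch. This is not Validio's implementation: the function `conform_to_schema`, the field names, and the sample records are all hypothetical, and "empty" is modeled here as `None`.

```python
from typing import Any, Dict, List, Optional


def conform_to_schema(
    record: Dict[str, Any], declared_fields: List[str]
) -> Dict[str, Optional[Any]]:
    """Project a record onto the declared fields (hypothetical helper).

    Fields that are not in the declared schema are dropped, and declared
    fields that are absent from the record become None (empty).
    """
    return {field: record.get(field) for field in declared_fields}


# Hypothetical schema declared when the connector was created.
declared = ["user_id", "amount", "created_at"]

# "coupon" was added after the connector was created, so it is ignored.
with_extra = conform_to_schema(
    {"user_id": 1, "amount": 9.5, "created_at": "2024-01-01T00:00:00Z", "coupon": "X"},
    declared,
)

# "amount" is missing, so it is treated as empty.
with_missing = conform_to_schema(
    {"user_id": 2, "created_at": "2024-01-01T00:05:00Z"},
    declared,
)

# Empty values change aggregate results: only one of two records
# contributes to a mean over "amount".
amounts = [r["amount"] for r in (with_extra, with_missing) if r["amount"] is not None]
mean_amount = sum(amounts) / len(amounts)
```

In this sketch the mean of `amount` is computed from a single record, which shows how missing fields can skew analytics on those fields.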

Cost and performance

Consider the following costs when you read data from Data streams:

  • Each Source in Validio corresponds to one consumer of your Data stream.
  • If the traffic crosses cloud regions, there are potential network costs between Validio and the Data stream.

Schema inference

Validio helps infer a schema for your Source based on your data. Schema inference requires that the Data stream is not empty when you connect your Source.

The Data stream can be considered empty by the Validio platform when:

  • The Data stream has no events when connecting to Validio. For example, all events have been deleted according to the retention period, and no new events have been published.


Note the following about timestamp fields in the inferred schema:

  • Validio infers fields with the Timestamp datatype in the schema. These fields indicate, for example, when an event or message was created or published in the Data stream.
  • For Pub/Sub and Pub/Sub Lite, Validio also infers the validio_publish_time field in the schema. The validio_publish_time field contains the timestamp that Pub/Sub generates when a message is published to the stream. The timestamp is in RFC 3339 UTC "Zulu" format.
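As an illustration of the RFC 3339 UTC "Zulu" format, the following Python sketch parses such a timestamp into a timezone-aware datetime. The helper `parse_zulu_timestamp` and the sample value are assumptions made for this example and are not part of Validio or Pub/Sub. The sketch assumes the fractional part may carry up to nine digits and truncates it to microseconds, because Python's datetime cannot store finer resolution.

```python
import re
from datetime import datetime, timezone

# Matches e.g. 2014-10-02T15:01:23Z or 2014-10-02T15:01:23.045123456Z
_RFC3339_ZULU = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$"
)


def parse_zulu_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 UTC "Zulu" timestamp (hypothetical helper)."""
    match = _RFC3339_ZULU.match(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 UTC 'Zulu' timestamp: {value!r}")
    seconds, fraction = match.groups()
    parsed = datetime.strptime(seconds, "%Y-%m-%dT%H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    if fraction:
        # datetime stores microseconds, so digits beyond the sixth are truncated.
        parsed = parsed.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return parsed


# Made-up sample value in the shape a validio_publish_time field could hold.
publish_time = parse_zulu_timestamp("2014-10-02T15:01:23.045123456Z")
```

The trailing `Z` marks the value as UTC, so the parsed datetime carries `timezone.utc` and can be compared safely with other timezone-aware timestamps.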