Data Stream Sources
Validio supports integrating with the Data Streams that modern data teams work with today.
General considerations
Consider the following when you read and validate records from a Data stream:
- In each stream, all messages must be consistent with the declared schema:
  - Fields added after creating the connector are ignored.
  - Missing fields are interpreted as empty fields, which affects any analytics that involve those fields.
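
The effect of these two rules on a single message can be sketched in Python. This is an illustration only, not Validio's implementation: the function `conform_to_schema`, the field names, and the use of `None` to represent an empty field are all assumptions made for the example.

```python
from typing import Any


def conform_to_schema(record: dict[str, Any], schema_fields: list[str]) -> dict[str, Any]:
    """Hypothetical helper: project a message onto the declared schema.

    Fields that are not declared are dropped (ignored), and declared
    fields that are absent from the message become None (empty).
    """
    return {field: record.get(field) for field in schema_fields}


# Assumed schema, declared when the Source was created.
declared = ["user_id", "amount", "created_at"]

# A later message that gained a new field and lacks a declared one.
message = {"user_id": 42, "amount": 9.5, "coupon": "SPRING"}

conformed = conform_to_schema(message, declared)
print(conformed)  # {'user_id': 42, 'amount': 9.5, 'created_at': None}
```

In this sketch the `coupon` field never reaches validation, and `created_at` is empty, so any metric computed over `created_at` would count this record as a missing value.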
Cost and performance
Reading data from Data streams has the following cost implications:
- Each Source in Validio corresponds to one consumer of your Data stream.
- If the traffic crosses cloud regions, there are potential network costs between Validio and the Data stream.
Schema inference
Based on your data, Validio helps infer a schema for your source. The schema inference requires that the Data stream is not empty when you connect your Source.
The Data stream can be considered empty by the Validio platform when:
- The Data stream has no events when connecting to Validio. For example, all events have been deleted according to the retention period, and no new events have been published.
Timestamps
- Validio infers fields with the Timestamp datatype in the schema. These fields indicate, for example, when an event or message was created or published in the Data stream.
- For Pub/Sub and Pub/Sub Lite, Validio also infers the validio_publish_time field in the schema. The validio_publish_time field contains the timestamp that Pub/Sub generates when a message is published to the stream. The timestamp is in RFC3339 UTC "Zulu" format.
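
If you need to handle a timestamp in this format yourself, for example when comparing a validio_publish_time value against your own event time, the following Python sketch shows one way to parse it. The function name `parse_publish_time` and the sample values are assumptions for illustration. The sketch also assumes the string may carry up to nine fractional digits; Python's `datetime` stores microseconds only, so extra digits are truncated.

```python
import re
from datetime import datetime, timezone

# RFC3339 UTC "Zulu" timestamp: date, "T", time, optional fraction, trailing "Z".
_RFC3339_ZULU = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$"
)


def parse_publish_time(value: str) -> datetime:
    """Hypothetical helper: parse an RFC3339 'Zulu' string into an aware datetime."""
    match = _RFC3339_ZULU.match(value)
    if match is None:
        raise ValueError(f"not an RFC3339 UTC 'Zulu' timestamp: {value!r}")
    base, fraction = match.groups()
    # Pad or truncate the fraction to exactly six digits (microseconds).
    micros = int((fraction or "").ljust(6, "0")[:6])
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


# Sample values chosen for illustration.
print(parse_publish_time("2014-10-02T15:01:23Z"))
print(parse_publish_time("2014-10-02T15:01:23.045123456Z"))
```

Returning a timezone-aware `datetime` keeps the UTC meaning of the trailing "Z" explicit, which avoids accidental comparisons against naive local times.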