Prerequisites

You will need:

  • A service account with permissions to access the specified dataset and table
  • A base64-encoded service account key in JSON format

Service account

It is recommended that you create a service account with read-only access for the Validio platform to ingest data from your BigQuery table. Details of the relevant permissions and roles can be found in the GCP documentation.

The service account needs to be assigned the following roles:

  • BigQuery Data Viewer (roles/bigquery.dataViewer)
  • BigQuery Job User (roles/bigquery.jobUser)
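Once the roles are granted, a quick way to verify them is to run a small query with the service account's credentials. The snippet below is a minimal sketch using the google-cloud-bigquery Python client and the example identifiers from this guide; the key file path is hypothetical, and obtaining the key itself is covered in the next subsection.

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Hypothetical path to the service account's JSON key (see the next subsection).
credentials = service_account.Credentials.from_service_account_file(
    "service-account-key.json"
)

client = bigquery.Client(project="weather-forecast", credentials=credentials)

# Submitting the query job requires BigQuery Job User; reading the table's
# data requires BigQuery Data Viewer.
rows = client.query(
    "SELECT * FROM `weather-forecast.east-coast.train-data` LIMIT 5"
).result()

for row in rows:
    print(dict(row))
```

If either role is missing, the query fails with a permission error naming the missing permission, which makes it easy to spot a misconfigured service account before setting up the connector.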

Service account key

  • Obtain a service account key in JSON format for the service account, following the GCP instructions for creating service account keys.
  • Encode the service account key in base64, as shown in the sketch below.
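A minimal sketch of the encoding step in Python, assuming the key has been downloaded to a local file (the filename below is hypothetical):

```python
import base64
from pathlib import Path

# Hypothetical location of the downloaded JSON key file.
key_path = Path("service-account-key.json")

# Encode the raw key bytes as base64 and decode to an ASCII string; this is
# the value to supply in the connector's Credentials field.
encoded_key = base64.b64encode(key_path.read_bytes()).decode("ascii")
print(encoded_key)
```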

BigQuery configuration parameters

Description of the fields that can be configured when setting up a BigQuery connector:

  • Name: Identifier for the connector, used when setting up pipelines. Example: East_coast_weather_forecast
  • Credentials: Base64-encoded service account key in JSON format.
  • Project id: Name of the BigQuery project. Example: weather-forecast
  • Dataset id: Name of the dataset that contains the table. Example: east-coast
  • Table name: Name of the table in the dataset to ingest data from. The full table ID, including the project ID, dataset ID, and table name, is shown on the table's details page in the GCP console. Example: train-data
  • Incremental column name: Name of the column the Validio platform uses to determine which records have not yet been read. This can be an auto-incrementing integer column or a datetime/timestamp column. Example: updated_timestamp
  • Polling interval value: How often to query the database for new data. This value is combined with the unit to form the polling interval; for example, a value of 2 with a unit of Hour polls every two hours. Example: 2
  • Unit: The time unit used for the polling interval value. Example: Hour
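To illustrate how the incremental column and polling interval fit together, the sketch below shows the kind of incremental read they enable, using the google-cloud-bigquery Python client and the example identifiers above. It is illustrative only and is not the Validio platform's actual query.

```python
import datetime

from google.cloud import bigquery

client = bigquery.Client(project="weather-forecast")

# High-water mark from the previous poll; each poll reads only rows whose
# incremental column value is greater than this.
last_seen = datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc)

query = """
    SELECT *
    FROM `weather-forecast.east-coast.train-data`
    WHERE updated_timestamp > @last_seen
    ORDER BY updated_timestamp
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("last_seen", "TIMESTAMP", last_seen)
    ]
)

for row in client.query(query, job_config=job_config).result():
    print(dict(row))
```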

Miscellaneous

  • RECORD data types and REPEATED columns are encoded as STRING during processing.