How to stream into Google BigQuery from an Amazon Athena query

In Amazon Athena:

First we need to get the schema of our source table or query. I’m doing a select * from my source table, so I can use:

show create table headeraltpartitions4;
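
The output is a Hive-style create table statement. A reconstructed sketch of roughly what it looks like (trimmed; the actual SerDe, location, and table properties depend on how your table was defined):

CREATE EXTERNAL TABLE `headeraltpartitions4`(
  `f1` string,
  `f2` string,
  -- ... remaining data columns, f4 through f34 and f3 ...
  )
PARTITIONED BY (
  `year` string,
  `month` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
LOCATION
  's3://your-bucket/path/'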

We need this schema to create the table in Google BigQuery (if it doesn’t already exist).

In Google BigQuery, we’ll adapt the statement above into a create table statement. Note the partitioning scheme, i.e. partitioning by ingestion date; see: https://cloud.google.com/bigquery/docs/creating-partitioned-tables#sql_1

create table if not exists fromAthena.headeraltpartitions4a
(
  `f1` string, 
  `f2` string, 
  `f4` string, 
  `f5` string, 
  `f6` string, 
  `f7` string, 
  `f8` string, 
  `f9` string, 
  `f10` string, 
  `f11` string, 
  `f12` string, 
  `f13` string, 
  `f14` string, 
  `f15` string, 
  `f16` string, 
  `f17` string, 
  `f18` string, 
  `f19` string, 
  `f20` string, 
  `f21` string, 
  `f22` string, 
  `f23` string, 
  `f24` string, 
  `f25` string, 
  `f26` string, 
  `f27` string, 
  `f28` string, 
  `f29` string, 
  `f30` string, 
  `f31` string, 
  `f32` string, 
  `f33` string, 
  `f34` string,
  `f3` string, 
  `year` string, 
  `month` string
)
PARTITION BY _PARTITIONDATE;

In the Google Cloud Platform (GCP) BigQuery console:

https://console.cloud.google.com/bigquery

We now have our empty table.
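
To sanity-check it, here is a minimal Python sketch using the google-cloud-bigquery client library; the project ID my-project is a hypothetical placeholder:

# Minimal sketch: confirm the new, still-empty table exists.
# "my-project" is a hypothetical placeholder project ID.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table = client.get_table("my-project.fromAthena.headeraltpartitions4a")
print(table.table_id, table.num_rows)  # expect 0 rows at this point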

We can now create our job.

Since we are running our job from Amazon EC2, we need to download our service account key from GCP and upload it to S3.

The job authenticates to GCP with this key, so it is essential for running any GCP jobs.
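
A minimal boto3 sketch of the upload step; the file name, bucket, and key below are hypothetical placeholders:

# Minimal sketch: upload the service account key downloaded from GCP to S3.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="gcp-service-account.json",  # key file downloaded from GCP
    Bucket="my-etl-bucket",               # hypothetical bucket name
    Key="keys/gcp-service-account.json",  # hypothetical object key
)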

These fields allow us to specify the location of our GCP service account key:
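
Matching the hypothetical upload above, the key location would be:

s3://my-etl-bucket/keys/gcp-service-account.json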

This is how one query file might look:
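
For example, a query file could contain a single Athena query like this (a hypothetical sketch based on the source table above):

select * from headeraltpartitions4 where year = '2020' and month = '01';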

The “BigQuery destination table” is specified in the following format: projectId:datasetName.tableName
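
For example, with a hypothetical project ID:

my-project:fromAthena.headeraltpartitions4a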

These fields also allow us to specify multiple Athena queries to run.

First we upload our query files to S3.

Each file needs to contain a different query. The file names must all share the same prefix and suffix; only the middle part of the name, the “middle variable”, can differ.

This is how multiple query files could look:
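
For example (hypothetical names, with prefix athena_query_ and suffix .sql):

athena_query_part1.sql
athena_query_part2.sql
athena_query_part3.sql

Middle variable: part1|part2|part3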

Notice the pipe (“|”) delimited list above.

Then we also pipe delimit our BigQuery destination tables:
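
For example, again with a hypothetical project ID:

my-project:fromAthena.table_part1|my-project:fromAthena.table_part2|my-project:fromAthena.table_part3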

The order of the “BigQuery destination tables” must match the order of the query files specified in the “Middle variable” field.

Once the job runs, we can see log entries indicating rows streaming into BigQuery from our Athena query results.

If we inspect our tables in BigQuery, we can see the rows in the streaming buffer.
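
One way to check this programmatically is via the table’s streaming buffer metadata, as in this minimal Python sketch (same hypothetical project ID as above):

# Minimal sketch: report how many rows sit in the streaming buffer.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table = client.get_table("my-project.fromAthena.headeraltpartitions4a")
if table.streaming_buffer:
    print("estimated rows in streaming buffer:", table.streaming_buffer.estimated_rows)
else:
    print("streaming buffer is empty or already flushed")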