Athena Dynamic SQL

This job type allows you to build and run SQL statements dynamically. This is extremely powerful because it allows you to remove the monotony of creating many SQL statements.

“Run Time” is the Time in UTC time zone when you want the job to run.

“Number of Runs per day” is the number of times per day you want the job to run. This could be anywhere from 1 to 1440 times per day (every minute is currently the maximum)

Fill in the “Email address” to be receive job run notifications by email. This only works if you’ve signed up for SendGrid and completed the details in the SendGrid section.

“Pipeline Name” groups jobs together to run consecutively and dependently. This means subsequent job steps eg: Step 2 will only run once Step 1 has completed successfully. To change the Pipeline that the job belongs to, simply change the Pipeline Name field.

“Region” is the AWS region where you want the query to run.

“Job Name” is the unique name of the job. It can’t exist even in a different pipeline.

“Dynamic Column Names”

The pipe ie: | separated list of column names from the parent SQL statement that are used in the child SQL statement eg: mycolumn1|mycolumn2

“Parent SQL Statement”

The location in S3 of the Parent Athena SQL Statement (bucketname and path excluding the s3:// prefix) 

The column names specified in Dynamic Column Names must exist in the SELECT clause in this SQL statement

“Child SQL Statement”

The path to the file in S3 containing the child Athena SQL statement: any valid SQL statement (exlude the s3:// prefix)

The column names specified in Dynamic Column Names must be named in this SQL statement eg: in the WHERE clause.

“Athena Query Result Location” is the S3 location where you want the query results saved.

“Athena Database” is the database to use in Athena.

Example:

Suppose you specify the following as the parent SQL statement:

select 'Invoices' as tblname
union
select 'InvoiceLines' as tblname

And further for the child SQL statement you specify:

select distinct InvoiceID
FROM tblname

What will happen is that the child statement will be transformed into two statements:

select distinct InvoiceID
FROM Invoices

and

select distinct InvoiceID
FROM InvoiceLines

What is happening here is “tblname” is being replaced in the child statement with the value from each row in the parent statement.

So with very little effort, depending on your use case, you could very easily and quickly build up dozens if not hundreds of statement to run.

This is the power of dynamic SQL!

Leave a Comment