July 2021

Multiple Partitioning columns with Athena Batch Partitioning Insert Job v2

Let’s try an alternative choice of partitioning columns. Profiling your data is so important! If we were to partition by column f3, there would be 197 partitions: select count(1) as numrowsfrom (select f3, count(1) as numrowsfrom headersplitcompressedgroup by f3); If we include the date column f6 as well, this swells to 16,291 partitions! If we …

Multiple Partitioning columns with Athena Batch Partitioning Insert Job v2 Read More »

Deleting S3 data

Say you want to change a table ie recreate it or re-populate the underlying data. Since Athena doesn’t provide the ability to delete rows, you’re options are to recreate the table using an Athena statement or stick with the original table and perform an S3 copy or delete operation: So let’s look at just deleting …

Deleting S3 data Read More »

Multiple Partitioning columns with Athena Batch Partitioning Insert Job

This time let’s create a partitioned table by multiple partitioning columns. We’ll use f3 and f6. For f6 which is a date string we’ll use a formula to get year, month and day: Let’s profile the data first: select F3, F6,substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)),”), 1, 10), ‘%Y-%m-%d’) AS varchar(10)),1,4) AS year ,substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)), ”), 1, …

Multiple Partitioning columns with Athena Batch Partitioning Insert Job Read More »

S3 and Athena example

So we have some sample data from AWS Data Exchange: https://console.aws.amazon.com/dataexchange/home?region=us-east-1#/products/prodview-stk4wn3mbhx24 And we’ve subscribed and downloaded it to our S3 bucket: s3://rockstaretltesting1/sample_data/shipping/d0e9bd6148e8f14889980954017b0927/83bdd5c3225a70b48acff63e86c74193/ams/2020/202009291500/ So the first thing we have to do is decide which file we’re going to use and copy it to it’s own folder. This is because we can’t create an Athena table pointing …

S3 and Athena example Read More »