AWS

Why rockstarETL?

Orchestration tools like Apache Airflow are great, but they are also more general-purpose. You can host them yourself or use a managed service from GCP or AWS. However, these tools require Python programming skills. rockstarETL only requires that you know SQL; no other programming language is needed. rockstarETL is focused on …

Using the Athena Batch Partition Insert Job Type

So we’ve got our raw data in S3, and we have an Athena table over this data. The Athena query statement used to create this table:

    CREATE EXTERNAL TABLE invoices (
        invoiceid int,
        customerid int,
        billtocustomerid int,
        orderid int,
        deliverymethodid int,
        contactpersonid int,
        accountspersonid int,
        salespersonpersonid int,
        packedbypersonid int,
        invoicedate date,
        customerpurchaseordernumber string,
        iscreditnote int,
        totaldryitems int,
        totalchilleritems int,
        deliveryrun string,
        runposition string,
        confirmeddeliverytime timestamp,
        confirmedreceivedby string,
        lasteditedby int,
        lasteditedwhen timestamp
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY …
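This excerpt doesn't show how rockstarETL implements this job type, so what follows is only a minimal Python sketch of one likely reason to batch partition inserts. Athena limits a single INSERT INTO (or CTAS) query to writing 100 partitions, so a load that covers more partition values has to be split across several statements. The sketch assumes a hypothetical target table `invoices_partitioned`, partitioned on `invoicedate`, and a hypothetical helper `build_batched_inserts`. It only generates SQL text and does not call Athena.

```python
from datetime import date, timedelta

# Athena documents a limit of 100 partitions written per INSERT INTO or CTAS query.
MAX_PARTITIONS_PER_QUERY = 100


def build_batched_inserts(source, target, columns, partition_col, partition_values,
                          batch_size=MAX_PARTITIONS_PER_QUERY):
    """Return INSERT INTO statements that each write at most `batch_size` partitions.

    Hypothetical helper: `partition_values` are assumed to be datetime.date
    objects for a date-typed partition column.
    """
    if not 1 <= batch_size <= MAX_PARTITIONS_PER_QUERY:
        raise ValueError(f"batch_size must be between 1 and {MAX_PARTITIONS_PER_QUERY}")
    # Athena expects partition columns last in the SELECT list when inserting
    # into a partitioned table, so move the partition column to the end.
    select_list = ", ".join([c for c in columns if c != partition_col] + [partition_col])
    values = sorted(set(partition_values))  # dedupe so each partition is written once
    statements = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        in_list = ", ".join(f"DATE '{d.isoformat()}'" for d in batch)
        statements.append(
            f"INSERT INTO {target} SELECT {select_list} FROM {source} "
            f"WHERE {partition_col} IN ({in_list})"
        )
    return statements


if __name__ == "__main__":
    # 250 consecutive invoice dates -> batches of 100, 100 and 50 partitions.
    days = [date(2016, 1, 1) + timedelta(days=i) for i in range(250)]
    stmts = build_batched_inserts(
        "invoices", "invoices_partitioned",
        ["invoiceid", "customerid", "invoicedate"], "invoicedate", days,
    )
    print(len(stmts))  # 3
```

Each statement stays under the per-query partition limit. The batches could then be submitted to Athena one after another.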

S3 Delete Objects

This job type deletes the objects in a bucket, optionally limited to the key prefix you supply. This is useful for clearing a staging area (e.g. after processing with other Athena job types). “Run Time” is the time, in the UTC time zone, when you want the job to run. “Number of Runs per day” is the number of times per …
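The following minimal Python sketch shows roughly what deleting by prefix involves. It pages through keys with the boto3 S3 client's `list_objects_v2` paginator and removes them with batched `delete_objects` calls. It is not rockstarETL's code: the function name `delete_objects_with_prefix` and the bucket and prefix values are hypothetical. The client is passed in as a parameter, so the function can be exercised without AWS credentials.

```python
def delete_objects_with_prefix(s3_client, bucket, prefix=""):
    """Delete every object in `bucket` whose key starts with `prefix`.

    Hypothetical helper; `s3_client` is assumed to be a boto3 S3 client,
    e.g. boto3.client("s3"). Returns the number of keys deleted.
    """
    list_kwargs = {"Bucket": bucket}
    if prefix:  # an empty prefix means "the whole bucket"
        list_kwargs["Prefix"] = prefix
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(**list_kwargs):
        keys = [obj["Key"] for obj in page.get("Contents", [])]
        if not keys:
            continue
        # A list_objects_v2 page holds at most 1,000 keys, which is also the
        # per-request limit for delete_objects, so one call per page is enough.
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in keys], "Quiet": True},
        )
        # In quiet mode the response lists only the keys that failed.
        errors = response.get("Errors", [])
        if errors:
            raise RuntimeError(f"failed to delete {len(errors)} object(s), e.g. {errors[0]}")
        deleted += len(keys)
    return deleted


# Example (hypothetical bucket and prefix):
#   import boto3
#   delete_objects_with_prefix(boto3.client("s3"), "my-staging-bucket", "staging/")
```

On a bucket with versioning enabled, these deletes add delete markers rather than permanently removing the object versions.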

S3 Copy Objects

This job type copies S3 objects from one bucket to another. This is useful for staging, especially when other Athena job types will read from the destination location. “Run Time” is the time, in the UTC time zone, when you want the job to run. “Number of Runs per day” is the number of times per …
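The following minimal Python sketch shows roughly what a bucket-to-bucket copy involves with the boto3 S3 client. It lists the source keys with the `list_objects_v2` paginator and copies each one to the same key in the destination. It is not rockstarETL's implementation. The function name `copy_objects` and the optional source `prefix` filter are assumptions added for illustration.

```python
def copy_objects(s3_client, src_bucket, dest_bucket, prefix=""):
    """Copy every object under `prefix` in src_bucket to the same key in dest_bucket.

    Hypothetical helper; `s3_client` is assumed to be a boto3 S3 client.
    Returns the list of copied keys in listing order.
    """
    list_kwargs = {"Bucket": src_bucket}
    if prefix:  # optional filter on source keys (an assumption, not from the post)
        list_kwargs["Prefix"] = prefix
    copied = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(**list_kwargs):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # client.copy is boto3's managed transfer: unlike a single
            # copy_object call (5 GB maximum), it uses multipart copy for large objects.
            s3_client.copy(
                CopySource={"Bucket": src_bucket, "Key": key},
                Bucket=dest_bucket,
                Key=key,
            )
            copied.append(key)
    return copied


# Example (hypothetical bucket names):
#   import boto3
#   copy_objects(boto3.client("s3"), "raw-invoices", "athena-staging", "invoices/")
```

Keeping the destination keys identical to the source keys means any key-based layout in the source, such as partition folders, carries over to the staging location unchanged.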