Why rockstarETL?

Orchestration tools like Apache Airflow are great, but they are also more general purpose. You have the option of hosting them yourself or using a managed service from GCP or AWS. However, these tools require Python programming skills to use.

rockstarETL requires only that you know SQL. No general-purpose programming language is needed.

rockstarETL is focused on managing your data (blob) objects in cloud storage (S3 on AWS and Google Cloud Storage on GCP). This is done via copy and delete jobs, which are very robust. You’ll find them well suited to your data lake management requirements.

On AWS, rockstarETL is laser-focused on Amazon Athena. For example, the 100-partition limit on batch inserts is solved by using a Batch Partition Insert Job. Naturally, dynamic SQL queries are offered too!

On GCP, rockstarETL is laser-focused on BigQuery queries (especially dynamic SQL) and loading of data from GCS into BigQuery.

Tools like dbt and Dataform are great and have some good features. However, they are quite opinionated and force you into their way of doing things. That is fine if you agree with their philosophy.

However, what if you don’t agree?

What if you have your own ideas on how you want to model your data and build your data warehouse?

Also, while they do offer open-source versions of their engines in addition to their SaaS offerings (about which I have my concerns), using those puts you back at square one: setting up and managing infrastructure. Nor do they solve the problem of scheduling. That you have to figure out for yourself. And yes, while they do have good communities, it’s still an investment of time and energy to sift through all the information.

rockstarETL solves all of these problems by running as a VM that you manage and control. That removes the SaaS concerns over security, without the overhead of figuring out how to host and run the tool. Scheduling is also solved.

So you’re left with software that lets you start testing and building quickly from the outset.