We develop software to enable users to build and manage data warehouses and data lakes on Google Cloud Platform and AWS.
We started developing in the cloud in 2018 and quickly noticed the lack of tools. The tooling that was available shocked us with its cost, and we were wary of being locked into third-party offerings. We needed to build a data warehouse using Google Cloud BigQuery. Our source data was uploaded to Google Cloud Storage buckets, and we needed to get it into BigQuery and build a Kimball-style star schema dimensional model. (We did it by the book and haven’t looked back for a single day!)
While it was easy to run queries in BigQuery using the console, or programmatically using the SDK, there was no way to stitch them together into data warehouse pipelines (that is, pipelines actually workable in the real world for building a real data warehouse). So we ended up “rolling our own.”
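To give a flavour of what “stitching queries together” involves, here is a minimal sketch in Python. It is not our actual implementation: the step names, the placeholder SQL, and the `run_query` callback are all hypothetical, chosen only to show how individual queries can be ordered by their dependencies before they run.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline steps. Each step names the SQL it would run and the
# steps that must finish first. The SQL strings are placeholders only.
STEPS = {
    "load_staging": {"sql": "-- load raw files into a staging table", "needs": []},
    "dim_customer": {"sql": "-- build the customer dimension", "needs": ["load_staging"]},
    "dim_product": {"sql": "-- build the product dimension", "needs": ["load_staging"]},
    "fact_sales": {"sql": "-- build the sales fact table", "needs": ["dim_customer", "dim_product"]},
}


def execution_order(steps):
    """Return step names so that every step comes after the steps it needs."""
    graph = {name: set(spec["needs"]) for name, spec in steps.items()}
    # static_order() raises graphlib.CycleError if the dependencies loop.
    return list(TopologicalSorter(graph).static_order())


def run_pipeline(steps, run_query):
    """Run each step's SQL in dependency order via the supplied callback."""
    executed = []
    for name in execution_order(steps):
        run_query(steps[name]["sql"])
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Stand-in for a real query runner: just print the SQL.
    run_pipeline(STEPS, print)
```

With BigQuery, `run_query` could be a small wrapper around the Python client, for example `lambda sql: client.query(sql).result()`, so that each step finishes before the next one starts. A real pipeline also needs retries, logging, and incremental loads, which this sketch leaves out.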
After much learning and a few years in production, we wanted to make our product available to others, and so began an epic development effort. We effectively rewrote the entire code base and added a nifty front end.
We soon realized our software needed to run in VMs and adapted our code accordingly. It was then that we became big fans of VMs over SaaS offerings. (Our software runs on EC2 instances on AWS. In our opinion, VMs are more secure than SaaS offerings and much more configurable!)
Once we had a workable product for Google Cloud, we cast our eyes further afield to AWS. After working with S3 and Athena, we realized we could adapt the Google product to AWS. A second epic development effort ensued (although we already had our nifty front end!). In the process, we managed to solve some sticky Athena limitations, like the 100-partition query limit. We were also able to transfer things we’d learnt with our AWS product back into our Google product!
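For readers who haven’t hit the partition limit, one common way around it is to split the work into batches. The sketch below assumes the limit in question is Athena’s cap of 100 partitions written by a single CTAS or INSERT INTO query. It illustrates the general batching idea rather than how our product does it, and the table names, column name, and helper functions are hypothetical.

```python
def batch_partitions(partition_values, max_per_query=100):
    """Split partition values into lists of at most max_per_query items."""
    if max_per_query < 1:
        raise ValueError("max_per_query must be at least 1")
    values = list(partition_values)
    return [
        values[i:i + max_per_query]
        for i in range(0, len(values), max_per_query)
    ]


def build_insert_statements(target, source, partition_column,
                            partition_values, max_per_query=100):
    """Build one INSERT INTO statement per batch of partition values.

    Assumes string-typed partition values. Single quotes inside a value
    are doubled so the SQL string literal stays valid.
    """
    statements = []
    for batch in batch_partitions(partition_values, max_per_query):
        in_list = ", ".join("'" + v.replace("'", "''") + "'" for v in batch)
        statements.append(
            f"INSERT INTO {target} SELECT * FROM {source} "
            f"WHERE {partition_column} IN ({in_list})"
        )
    return statements


if __name__ == "__main__":
    # 250 hypothetical partition values become three statements.
    values = [str(n) for n in range(250)]
    for sql in build_insert_statements("sales_fact", "sales_staging", "dt", values):
        print(sql[:80], "...")
```

Each statement then touches no more than 100 partitions, so the statements can be run one after another against the same target table.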