What is the best way to load data into BigQuery?

rockstarETL (REST API) vs Dataflow vs Client Libraries?

Dataflow

I would not recommend it for batch loads. It’s just too slow.

Dataflow has its place if you want to stream. Just bear in mind that streaming has a cost associated with it! And rockstarETL is so performant that you have to seriously wonder whether the additional cost is worth it!

Java Client Libraries

I really wanted this way to work, but it was just too slow, which is such a shame because I really like this method. I’ll link to my GitHub to show you the code I used.

I was actually quite perplexed as to why this method was so slow. I mean, this is the cloud after all! You expect it to be performant! Then one day I was using the BigQuery command line (the “bq load” command) to load a whole lot of historical data. I couldn’t believe how fast it was! It got me thinking: why does this way work so well when the others do not? It led me to the realisation that the client libraries and command line tools are built on top of the REST API. So naturally, if the higher-level abstractions aren’t working for you, you go down one level. And that’s what I did with rockstarETL!

rockstarETL (REST API)

rockstarETL uses the REST API to load data into BigQuery. It’s free and the fastest way to get data into BigQuery from Cloud Storage.
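
To give a feel for what “going down one level” looks like, here is a minimal sketch of a load job submitted straight to the BigQuery REST API (the jobs.insert endpoint). This is not rockstarETL’s actual code. The function name, the project, dataset, table and bucket names, and the choice of CSV with schema autodetection are all assumptions made for illustration. The sketch only builds the HTTP request and does not send it.

```python
import json
import urllib.request

# jobs.insert endpoint of the BigQuery REST API (v2).
BIGQUERY_JOBS_URL = "https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/jobs"


def build_load_job_request(project_id, dataset_id, table_id, source_uris, access_token):
    """Build (but do not send) a load-job request for files in Cloud Storage.

    Hypothetical helper for illustration; not part of rockstarETL or any Google library.
    """
    body = {
        "configuration": {
            "load": {
                # gs:// URIs of the files to load; a * wildcard is allowed.
                "sourceUris": source_uris,
                # Assumption: CSV files with a header row and an autodetected schema.
                "sourceFormat": "CSV",
                "skipLeadingRows": 1,
                "autodetect": True,
                # Append to the table rather than overwrite it.
                "writeDisposition": "WRITE_APPEND",
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }
    return urllib.request.Request(
        url=BIGQUERY_JOBS_URL.format(project_id=project_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually submit the job you would pass the returned request to urllib.request.urlopen with a valid OAuth 2.0 access token (for example, one printed by “gcloud auth print-access-token”), then poll the job until it reports that it is done.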
