Mike Houston

BigQuery row-level security fail

Recently I thought I'd look at BigQuery's new row-level security feature: https://cloud.google.com/bigquery/docs/row-level-security-intro. For a moment I was excited because I thought I could retire the view-based approach I'm currently using to manage row-level access. Briefly, the fact table is joined to another table on the relevant key and each user's email address is …
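As a rough sketch of that view-based approach (not the exact view from the post; the project, dataset, table and column names below are made up), the pattern is a view that joins the fact table to an entitlements table and filters on the querying user's email via BigQuery's SESSION_USER():

-- Hypothetical view-based row-level access: keep only the rows
-- mapped to the email address of the user running the query.
CREATE OR REPLACE VIEW `my_project.reporting.fact_for_user` AS
SELECT f.*
FROM `my_project.warehouse.fact` AS f
JOIN `my_project.warehouse.user_entitlements` AS e
  ON e.region_key = f.region_key
WHERE e.user_email = SESSION_USER();

Sharing the view (rather than the underlying tables) is what makes the filter stick for end users.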


How to authorize BigQuery views across projects using the Java Client Library

Google Cloud recently changed their authorized views functionality in the console. And they broke it! You can no longer authorize views across projects using the console! It seems some of their documentation hasn't been able to keep up either: https://cloud.google.com/bigquery/docs/authorized-views#console. Fortunately some of their other documentation is up to date (but there's a catch): https://cloud.google.com/bigquery/docs/share-access-views#authorize_the_view_to_access_the_source_dataset …


Multiple Partitioning columns with Athena Batch Partitioning Insert Job v2

Let's try an alternative choice of partitioning columns. Profiling your data is so important! If we were to partition by column f3, there would be 197 partitions: select count(1) as numrows from (select f3, count(1) as numrows from headersplitcompressed group by f3); If we include the date column f6 as well, this swells to 16,291 partitions! If we …
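For reference, the combined count mentioned above can be checked with a query of the same shape; this is a sketch rather than the exact statement from the post, but it uses the same table and column names:

-- each row of the inner query is one (f3, f6) combination, i.e. one partition
select count(1) as numpartitions
from (select f3, f6 from headersplitcompressed group by f3, f6);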


Deleting S3 data

Say you want to change a table, i.e. recreate it or re-populate the underlying data. Since Athena doesn't provide the ability to delete rows, your options are to recreate the table using an Athena statement, or to stick with the original table and perform an S3 copy or delete operation. So let's look at just deleting …
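As an illustration of the first option (recreating the table with an Athena statement), a CTAS along these lines rebuilds the data under a fresh S3 prefix; the database, table, bucket and filter below are all assumptions, not details from the post:

-- Hypothetical CTAS: write the re-populated table to a new S3 location.
CREATE TABLE mydb.headers_v2
WITH (
  external_location = 's3://my-bucket/headers_v2/',
  format = 'PARQUET'
) AS
SELECT *
FROM mydb.headers
WHERE f6 >= '2020-01-01';  -- example filter; real criteria depend on the reload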


Multiple Partitioning columns with Athena Batch Partitioning Insert Job

This time let's create a table partitioned by multiple columns. We'll use f3 and f6. For f6, which is a date string, we'll use a formula to get year, month and day. Let's profile the data first: select F3, F6, substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 1, 4) AS year, substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)), ''), 1, …
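Tying this to the batch-insert part of the title: one batch into a table partitioned by f3, year, month and day might look roughly like the sketch below. The target table name and the non-partition column list are assumptions; the date expressions mirror the profiling query above, and batching (here, one f3 value per statement) keeps each INSERT under Athena's limit of 100 new partitions per query.

-- Hypothetical single batch into an existing partitioned table;
-- partition columns must come last, in partition order (f3, year, month, day).
INSERT INTO mydb.headers_partitioned
SELECT
  f1, f2, f4, f5,
  f3,
  substr(cast(date_parse(substr(NULLIF(cast(f6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 1, 4) AS year,
  substr(cast(date_parse(substr(NULLIF(cast(f6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 6, 2) AS month,
  substr(cast(date_parse(substr(NULLIF(cast(f6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 9, 2) AS day
FROM mydb.headersplitcompressed
WHERE f3 = 'SOME_VALUE';  -- one batch per f3 value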


S3 and Athena example

So we have some sample data from AWS Data Exchange: https://console.aws.amazon.com/dataexchange/home?region=us-east-1#/products/prodview-stk4wn3mbhx24. We've subscribed and downloaded it to our S3 bucket: s3://rockstaretltesting1/sample_data/shipping/d0e9bd6148e8f14889980954017b0927/83bdd5c3225a70b48acff63e86c74193/ams/2020/202009291500/. So the first thing we have to do is decide which file we're going to use and copy it to its own folder. This is because we can't create an Athena table pointing …
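Once the chosen file sits in its own folder, the natural next step is an external table pointed at that prefix. The sketch below is only illustrative: the column names, delimiter and the header/ sub-folder are guesses, not details taken from the actual dataset.

-- Hypothetical external table over the single-file folder.
CREATE EXTERNAL TABLE sample_shipping_header (
  f1 string,
  f2 string,
  f3 string,
  f4 string,
  f5 string,
  f6 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://rockstaretltesting1/sample_data/shipping/header/';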
