Recently I thought I’d look at BigQuery’s new row-level security feature: https://cloud.google.com/bigquery/docs/row-level-security-intro
For a moment I was excited because I thought I could remove the view scenario I’m using to manage row-level access at the moment.
Briefly, the fact table is joined to another table on the relevant key and each user’s email address is recorded next to the key’s they’re allowed to see.
So you have a table where you store the user’s email account and specify the key eg: LocationID which they’re allowed to see:
In my case, this is edited in Google Sheets and then I use some Java: https://github.com/gouston/GCP_snippets/blob/master/appengine-bigquery/src/main/java/access_control/ACL.java and Google App Engine to get that into a BigQuery table. I then join this table to the fact table on eg: LocationID inside of a view. This then is an authorized view which is then used as the source for DataStudio datasets.
Anyway, as I was saying, there was some cause for optimism as I began re-thinking of the idea of a monolithic denormalized fact table. Because the performance is definitely better, but you have to write some pretty nightmarish SQL to update the nested data. Eg: if you have invoice header details and then nest the invoice lines: then if you want to update the invoices lines, the SQL get’s extremely hairy. I played with this for quite a while because the DataStudio reports definitely performed noticeably better. However, one has to think of the solution holistically. Ie: even though the report performs better this is not a goof enough reason to implement if the maintenance is ridiculous! I really wanted this to work but I couldn’t get it to the point where I was happy to productionize it. As I recall there was also always an issue with loading the table in terms of completeness of data. Ie: there was always missing data after a load. The most bullet proof solution has always been a Kimball stype dimensional model. Ol’ Ralph and his team certainly knew what they were talking about. So much so that one day after having done it all and written about it extensively they simply retired and dropped the mic. Classic.
But it didn’t take long to realise that this new feature is really just a pipe dream which really comes down to one huge glaring limitation:
no support for partition pruning
Fortunately I didn’t get my hopes up for too long before deciding to read the limitations early on and see this nugget.
Maybe there’s a little bit of experience coming through in me prompting myself to inspect the limitations before going too far.
But, alas partition pruning, from both a performance and cost perspective is an absolute must.
So, once again my view solution still stands!