Let’s try an alternative choice of partitioning columns. Profiling your data is so important!
If we were to partition by column f3, there would be 197 partitions:
select count(1) as numpartitions
from (
    select f3
    from headersplitcompressed
    group by f3
);

If we include the date column f6 as well, this swells to 16,291 partitions!
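You can confirm that figure with the same counting pattern as above, just grouping on both columns:

```sql
select count(1) as numpartitions
from (
    select f3, f6
    from headersplitcompressed
    group by f3, f6
);
```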

If we instead choose f3, year and month (derived from f6), we get 1,667:
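A sketch of how you might arrive at that number, reusing the same year/month expressions we derive from f6 for the partitioned table:

```sql
select count(1) as numpartitions
from (
    select f3,
        substr(cast(date_parse(substr(NULLIF(cast(f6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 1, 4) AS year,
        substr(cast(date_parse(substr(NULLIF(cast(f6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 6, 2) AS month
    from headersplitcompressed
    where f6 is not null and f6 != ''
    group by 1, 2, 3
);
```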

So first let’s create our new partitioned table. An Athena CTAS query can write at most 100 partitions, so we seed the table with a single partition (F3 = 'AD', July 2020) and backfill the rest afterwards with INSERT INTO:
CREATE TABLE headeraltpartitions
WITH (
    format = 'PARQUET',
    parquet_compression = 'SNAPPY',
    partitioned_by = ARRAY['F3','year','month'],
    external_location = 's3://rockstaretltesting1/sample_data/shipping/d0e9bd6148e8f14889980954017b0927/83bdd5c3225a70b48acff63e86c74193/ams/2020/202009291500/headeraltpartitions/'
) AS
SELECT F1,
F2,
F4,
F5,
F6,
F7,
F8,
F9,
F10,
F11,
F12,
F13,
F14,
F15,
F16,
F17,
F18,
F19,
F20,
F21,
F22,
F23,
F24,
F25,
F26,
F27,
F28,
F29,
F30,
F31,
F32,
F33,
F34,
F3,
substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 1, 4) AS year,
substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 6, 2) AS month
FROM headersplitcompressed
WHERE F3 = 'AD'
and F6 is not null and F6 != ''
and substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 1, 4) = '2020'
and substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 6, 2) = '07';
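As a quick sanity check (a sketch, assuming the table name above), you can confirm that the seeded partition landed:

```sql
select f3, year, month, count(1) as numrows
from headeraltpartitions
group by f3, year, month;
```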
Remember to clear out the S3 location first: Athena's CTAS will fail if the external_location already contains data!
Now for the insert statement:
insert into shipping.headeraltpartitions (F1, F2, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F16, F17, F18, F19, F20, F21, F22, F23, F24, F25, F26, F27, F28, F29, F30, F31, F32, F33, F34, F3,year, month)
SELECT F1,
F2,
F4,
F5,
F6,
F7,
F8,
F9,
F10,
F11,
F12,
F13,
F14,
F15,
F16,
F17,
F18,
F19,
F20,
F21,
F22,
F23,
F24,
F25,
F26,
F27,
F28,
F29,
F30,
F31,
F32,
F33,
F34,
F3,
substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 1, 4) AS year,
substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 6, 2) AS month
FROM headersplitcompressed
WHERE F6 is not null and F6 != ''
and not (F3 = 'AD'
    and substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 1, 4) = '2020'
    and substr(cast(date_parse(substr(NULLIF(cast(F6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)), 6, 2) = '07');
(The WHERE clause skips rows with no usable date, as well as the F3 = 'AD' July 2020 rows the CTAS already loaded, so we don't insert them twice.)
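After the insert completes, you can sanity-check the partition spread against the 1,667 combinations we counted earlier (a sketch, assuming the table name above):

```sql
select count(1) as numpartitions
from (
    select f3, year, month
    from shipping.headeraltpartitions
    group by f3, year, month
);
```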
And our partitioning columns are saved in their own file. Notice the pipe (i.e. |) delimiting:
f3 as f3|substr(cast(date_parse(substr(NULLIF(cast(f6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)),1,4) AS year|substr(cast(date_parse(substr(NULLIF(cast(f6 AS varchar(10)), ''), 1, 10), '%Y-%m-%d') AS varchar(10)),6,2) AS month
And then we need to upload these two files into S3:

Now we can create our rockstarETL job:


Now we run our job step:

And we get an error:

Ah, we left out a full stop in the suffix:

Let’s try that again:
And now it works!