One of the key pieces in our performance story is the transfer rate from on-prem to Azure.
Currently the data flow goes:
Netezza db (data is highly compressed)
QLIK replicate server load files (data is not compressed)
Network pipe (on-prem to Azure datacentre)
ADLS (data is not compressed)
Synapse tables (data is compressed)
Has your engineering team considered gzipping the QLIK replicate server load files?
The data flow would then be:
Netezza db (data is highly compressed)
QLIK replicate server load files (gzipped)
Network pipe (on-prem to Azure datacentre)
ADLS (gzipped)
Synapse tables (data is compressed)
Based on my reading, the PolyBase engine supports loading gzip-compressed (.gz) files directly into Synapse.
I realize this may affect the parallelization of the load, since a single gzip file cannot be split across readers. However, in our experience the Synapse load takes 20-35 seconds per GB file and the network move takes about 100x that, so the compromise might be worth making if it leads to a significant throughput boost.
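To make the trade-off concrete, here is a rough back-of-envelope sketch. It uses only the figures quoted above (about 30 seconds to load a 1 GB file, and a network move of about 100x that). The compression ratios and the 2x load slowdown are hypothetical placeholders, not measurements, and the CPU time to gzip on the QLIK server is left out.

```python
def estimated_total_seconds(load_s, transfer_s, compression_ratio, load_slowdown=1.0):
    """Estimate per-file time: network transfer of the compressed file plus Synapse load.

    compression_ratio: original size / gzipped size (1.0 means no compression).
    load_slowdown: assumed penalty on the Synapse load from reduced parallelism.
    """
    return transfer_s / compression_ratio + load_s * load_slowdown


load_s = 30.0               # roughly the middle of the 20-35 s per GB range
transfer_s = 100 * load_s   # "100x" the load time

baseline = estimated_total_seconds(load_s, transfer_s, compression_ratio=1.0)
print(f"uncompressed: {baseline:.0f} s per file")

# Hypothetical compression ratios; the real ratio depends on the data.
for ratio in (2, 4, 8):
    total = estimated_total_seconds(load_s, transfer_s, ratio, load_slowdown=2.0)
    print(f"ratio {ratio}:1 -> {total:.0f} s per file ({baseline / total:.1f}x faster)")
```

Even if the load step were twice as slow, the total would still be dominated by the network move, which is why the compression ratio matters far more than the lost parallelism in this estimate.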
Please ask your team if they have explored this idea.
It would be easy to implement or prototype to see if it yields performance improvements.
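As a starting point for such a prototype, here is a minimal sketch that gzips one load file and reports the compression ratio and the time taken. It uses only the Python standard library. The sample file and its contents are synthetic stand-ins I made up for illustration; a real test would point `gzip_file` at an actual QLIK replicate load file.

```python
import gzip
import os
import shutil
import tempfile
import time


def gzip_file(src_path, dst_path=None, level=6):
    """Gzip src_path and return (original_bytes, compressed_bytes, seconds)."""
    dst_path = dst_path or src_path + ".gz"
    start = time.perf_counter()
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb", compresslevel=level) as dst:
        # Stream in 1 MiB chunks so large load files are not read into memory at once.
        shutil.copyfileobj(src, dst, length=1024 * 1024)
    elapsed = time.perf_counter() - start
    return os.path.getsize(src_path), os.path.getsize(dst_path), elapsed


# Demo on a synthetic delimited file (hypothetical data, not a real load file).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample_load_file.csv")
    with open(sample, "w", encoding="utf-8") as f:
        for i in range(100_000):
            f.write(f"{i},CUSTOMER_{i % 500},2024-01-01,ACTIVE\n")

    original, compressed, seconds = gzip_file(sample)
    print(f"original:   {original} bytes")
    print(f"compressed: {compressed} bytes")
    print(f"ratio:      {original / compressed:.1f}:1 in {seconds:.2f} s")
```

Running this against a representative load file would give the compression ratio and the gzip cost on the QLIK server, which are the two numbers needed to judge whether the idea is worth pursuing.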