Suggest an Idea

DWH2Go · ‎2020-10-26

We worked with Qlik and Snowflake to determine why Replicate full load and CDC performance was slower than expected (client portal case 1966818). In brief, Snowflake is given a single file per COPY command, which forces it to load in a single-threaded manner. Performance would improve greatly if Replicate partitioned the files and used COPY <list of files to load>.

Full load: for each table, a single large file is created, and once it is complete, a single COPY command is run. Multiple tables can be loaded in parallel this way. However, the very large single-table files take the longest to load and gate the job. There is an opportunity to break them up into smaller files and issue parallel COPY commands as each one completes (reassembling them in proper sequence afterwards). CDC is actually more important than full load, so read on.
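To illustrate the full-load suggestion, here is a minimal Python sketch of splitting one table's rows into smaller, ordered chunks. It is not how Replicate works internally. The function names, the `max_rows` threshold, and the zero-padded file naming scheme are all assumptions made for illustration.

```python
from typing import List, Sequence


def split_into_chunks(rows: Sequence[str], max_rows: int) -> List[List[str]]:
    """Split rows into contiguous chunks of at most max_rows, preserving order."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    return [list(rows[i:i + max_rows]) for i in range(0, len(rows), max_rows)]


def chunk_file_names(table: str, n_chunks: int) -> List[str]:
    """Hypothetical naming scheme: a zero-padded sequence number per chunk,
    so the files sort in the order they were produced."""
    return [f"{table}_{seq:05d}.csv" for seq in range(n_chunks)]


rows = ["r1", "r2", "r3", "r4", "r5"]
chunks = split_into_chunks(rows, max_rows=2)
names = chunk_file_names("orders", len(chunks))
```

Because the chunks are contiguous and the file names carry a sequence number, the original row order can be recovered by concatenating the chunks in name order.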

CDC load: even after tuning the parameters that control the size of the files dropped during CDC, we still see room for improvement, because each file is loaded sequentially. Instead, partition the file into an arbitrary number of smaller files (adding a pseudo-column for the file sequence number if necessary), and use COPY <list of files> to load them in parallel, which will merge the data.
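As a rough sketch of the CDC suggestion, the Python below tags each row with its file sequence number and builds a single COPY statement that names several files. Snowflake's COPY INTO command does accept a FILES = (...) list. Everything else here is an assumption for illustration: the helper names, the table and stage names, and the CSV file format. It is not Replicate's actual behavior.

```python
from typing import List


def add_file_seq(chunks: List[List[List[str]]]) -> List[List[List[str]]]:
    """Append each chunk's sequence number to its rows as a pseudo-column."""
    return [
        [row + [str(seq)] for row in chunk]
        for seq, chunk in enumerate(chunks)
    ]


def build_copy_statement(table: str, stage: str, files: List[str]) -> str:
    """Build one COPY INTO statement that lists several staged files."""
    if not files:
        raise ValueError("at least one file is required")
    # Double any single quote so each name is a valid SQL string literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in files)
    return (
        f"COPY INTO {table} FROM @{stage} "
        f"FILES = ({quoted}) FILE_FORMAT = (TYPE = CSV)"
    )


# Hypothetical change records split into two chunks.
tagged = add_file_seq([[["1", "a"], ["2", "b"]], [["3", "c"]]])
sql = build_copy_statement(
    "orders_ct", "replicate_stage", ["cdc_00000.csv", "cdc_00001.csv"]
)
```

Listing all the files in one statement leaves it to Snowflake to spread the load across the warehouse, and the sequence pseudo-column keeps enough information to apply the changes in order afterwards.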

Ola_Mayer · ‎2020-11-02

spachunuri · ‎2020-12-28

This feature would be good.

mjUSAA1 · ‎2023-05-26

Does anyone know of any traction on this feature request?

Meghann_MacDonald · ‎2023-08-02

From now on, please track this idea from the Ideation portal.

Link to new idea

Meghann

NOTE: Upon clicking this link 2 tabs may open - please feel free to close the one with a login page. If you only see 1 tab with the login page, please try clicking this link first: Authenticate me! then try the link above again. Ensure pop-up blocker is off.

Ideation · ‎2023-08-02


Replicate to Snowflake Performance Improvement Request
