Hi,
I'm wondering about a certain scenario.
Say for example, there is a large table that needs to go to an ADLS target using parallel load, and there are 5 parallel segments.
Is it possible for, say, segment 4 to load before segment 3, resulting in an incorrect sequence in ADLS?
Also, are there any best practices for loading very large static tables to ADLS? My only idea so far is parallel load. Thank you.
Regards,
Mohammed
Hello Mohammed, @MoeyE ,
Thanks for reaching out to the Qlik Community!
I'm not sure I understood the exact concern behind "resulting in incorrect sequence in ADLS". Any further explanation is welcome.
Thanks,
John.
Hello @MoeyE ,
I'm guessing you mean the order of the data rows, say ordered by PK or by given column(s). For example, you want the data records in the ADLS file(s) to be in ascending or descending PK order.
Partitions are started without any defined priority, so any partition may start before another. If you want to control the order, I'd propose one of the following:
1- Define a VIEW for each partition in the source database, then assign a load priority to each view (see the screenshot below)
2- Use multiple tasks, so that you can manually control the data range each task covers in the initial load
However, please note that with these approaches the data records are spread across multiple ADLS files; within each file the records are in order.
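To illustrate option 1, here is a minimal Python sketch that splits a numeric PK range into contiguous slices and builds one CREATE VIEW statement per slice. The table name `big_table`, the column `id`, the view naming scheme and the range bounds are all hypothetical, and the exact DDL may differ on your source database. The load priority itself still has to be set per view in the task; the sketch does not cover that.

```python
def pk_ranges(min_pk: int, max_pk: int, parts: int) -> list[tuple[int, int]]:
    """Split the inclusive range [min_pk, max_pk] into up to `parts` contiguous slices."""
    total = max_pk - min_pk + 1
    base, extra = divmod(total, parts)
    ranges = []
    start = min_pk
    for i in range(parts):
        # The first `extra` slices get one additional key so nothing is left over.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def view_ddl(table: str, pk: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Build one CREATE VIEW statement per PK slice (hypothetical naming: <table>_p<N>)."""
    return [
        f"CREATE VIEW {table}_p{i} AS SELECT * FROM {table} "
        f"WHERE {pk} BETWEEN {lo} AND {hi};"
        for i, (lo, hi) in enumerate(ranges, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical table with PK values 1..10, split into 3 views.
    for statement in view_ddl("big_table", "id", pk_ranges(1, 10, 3)):
        print(statement)
```

Because the slices are contiguous and numbered in PK order, loading the views in priority order p1, p2, p3 gives files whose order follows the PK.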
Hope this helps.
John.
Hi John,
Thank you. Yes, the goal is for all data to be loaded to the target in the same order as it exists on the source. So just to confirm: is it possible for partition 4's data to appear in the target before partition 3's data, even if there is a primary key? Thanks
Regards,
Mohammed
Hello Mohammed, @MoeyE ,
At present there is no way to guarantee the order in which partitions are loaded during the initial load. For example, take the following partition IDs and the number of rows in each:
P1: 1
P2: 1
P3: 1000000
P4: 10
P5...
In this scenario (let's say 1 row takes 1 second to load to the target, just for the sake of argument :)), the P4 rows will reach the target before the P3 rows, even if the table has a PK.
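The arithmetic can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: all partitions start at the same moment and every row takes exactly 1 second. P5 is left out because its row count is not given above.

```python
# Row counts per partition, taken from the example above.
rows_per_partition = {"P1": 1, "P2": 1, "P3": 1_000_000, "P4": 10}


def finish_order(rows: dict[str, int], seconds_per_row: int = 1) -> list[str]:
    """Return partition IDs sorted by the time their last row reaches the target."""
    finish_time = {pid: count * seconds_per_row for pid, count in rows.items()}
    # Ties are broken by partition ID so the result is deterministic.
    return sorted(finish_time, key=lambda pid: (finish_time[pid], pid))


print(finish_order(rows_per_partition))  # ['P1', 'P2', 'P4', 'P3']
```

P4 finishes after 10 seconds, while P3 needs 1,000,000 seconds, so P4's rows land well before P3 completes.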
For now, I'd suggest using multiple tasks to load each partition to a separate file, and then consuming the files in file order once all partitions have finished loading.
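A minimal sketch of the "consume in file order" step, in Python. It assumes the per-partition files have been copied to a local folder and are named `part_<N>.csv`; that naming scheme and the folder name are hypothetical, and reading directly from ADLS would need the Azure SDK instead of `pathlib`.

```python
import re
from collections.abc import Iterator
from pathlib import Path


def partition_index(path: Path) -> int:
    """Extract N from a file named part_<N>.csv (hypothetical naming scheme)."""
    match = re.fullmatch(r"part_(\d+)", path.stem)
    if match is None:
        raise ValueError(f"unexpected file name: {path.name}")
    return int(match.group(1))


def files_in_partition_order(folder: Path) -> list[Path]:
    """List partition files sorted numerically, so part_10 comes after part_2."""
    return sorted(folder.glob("part_*.csv"), key=partition_index)


def read_rows_in_order(folder: Path) -> Iterator[str]:
    """Yield every line of every partition file, one partition after another."""
    for path in files_in_partition_order(folder):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


if __name__ == "__main__":
    # Run only after all partition loads have completed.
    for row in read_rows_in_order(Path("adls_export")):
        print(row)
```

Sorting on the numeric partition index rather than the raw file name avoids the lexical trap where `part_10.csv` sorts before `part_2.csv`.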
Regards,
John.
Hi John,
Thanks for the explanations. I appreciate it.
Regards,
Mohammed
Glad to hear that, Mohammed @MoeyE! Please mark the comment as "Accept as Solution" if it worked for you.
Thanks for your great support,
John.