Hi all -
I'm wondering if I could collect some recommendations for a full load. I appear to be getting a disconnect-from-the-DB issue that restarts the full load each time. I haven't set the logs to verbose yet to see what's happening, but what I do know is that the tables being loaded are in the tens of billions of records.
Given that record count, I'm wondering if there is something obvious I may need to set to stop this happening. It should be noted it's taking about 5 hours per billion records.
Thanks.
Hello @sreaney89 ,
Thanks for reaching out to Qlik Community!
There are several issues with the task that need to be addressed:
Improper Task Settings:
Please disable Apply Changes Processing and keep only Store Changes Processing enabled. This may resolve some configuration-related errors.
Error: "WAL reader terminated with broken connection / recoverable error. WAL stream loop ended abnormally":
This error is causing the task to stop and attempt auto-recovery. The root cause is likely network-related: potential issues include an unstable connection, a connection timeout, firewall rules closing inactive connections, server settings, or resource constraints on the server. To mitigate this, please enable the WAL heartbeat on the PostgreSQL source endpoint and check whether it improves stability.
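If the drops persist even with the WAL heartbeat enabled, it can also be worth inspecting the timeout and keepalive settings on the PostgreSQL server itself, since any of them can close a long-lived connection mid-load. The statements below are a sketch using standard PostgreSQL settings; the values shown are illustrative, not recommendations:

```sql
-- Inspect server-side timeouts that can terminate a long-lived
-- replication or query connection (all standard PostgreSQL settings):
SHOW wal_sender_timeout;    -- drops replication connections that look dead
SHOW statement_timeout;     -- 0 means no limit on statement runtime
SHOW tcp_keepalives_idle;   -- 0 means "use the OS default"

-- Illustrative adjustment: raise wal_sender_timeout and enable TCP
-- keepalives so intermediate firewalls are less likely to silently
-- drop an idle-looking connection. Requires superuser privileges.
ALTER SYSTEM SET wal_sender_timeout = '5min';
ALTER SYSTEM SET tcp_keepalives_idle = 60;
SELECT pg_reload_conf();    -- apply without a server restart
```

Comparing these values against the timing of the "broken connection" errors in the verbose logs usually makes it clear whether the server or something in between (firewall, load balancer) is closing the connection.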
These errors are negatively affecting the full load performance and may lead to the full load stopping and restarting during recovery.
The current load rate is about 5 hours per billion records, so a table in the tens of billions (say 20 billion rows) would take roughly 100 hours end to end; every disconnect that forces a restart therefore costs days, not hours.
To improve performance, it is worth reviewing the full load tuning options for the task, for example loading a large table in segments with the Parallel Load feature (by ranges or partitions) and adjusting how many tables are loaded in parallel.
Hope this helps.
John.
Just another quick question - if I start and stop the full load task, does it restart from the beginning and attempt to load the entire table? Something I've noticed as well is that some of the estimates it's generating are way off... in the millions rather than billions of records.
Hi @sreaney89 ,
If you stop a task while the full load is in progress, then Stop and Start will reinitiate a fresh reload.
Regards
Arun
Hi @sreaney89
To add to @aarun_arasu 's post, we run a simple "select *" query against the source rather than a "select * order by" for performance reasons. Due to this, there is no way to track where a full load left off in order to resume. Also, this could leave out records if some were inserted after the task stopped and before it was resumed.
I hope this helps!
Dana
Hi @sreaney89 ,
In the past, we had a feature that allowed us to resume loading from a specific record. However, we found this feature to be impractical.
To resume from a processed record, we need to query the records in order, such as by primary key or unique index. If a table contains many records, using "ORDER BY" can place a significant load on the system due to the need to sort the records. Therefore, as @Dana_Baldwin mentioned, we avoid using "ORDER BY" for performance reasons.
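To illustrate the trade-off described above, here is a sketch of the two query shapes. The table name `big_table`, the key column `id`, and the `:last_loaded_id` bookmark parameter are hypothetical illustrations, not what Replicate actually issues:

```sql
-- What a full load effectively runs: a single unordered scan.
-- Fast (no sort step), but there is no stable notion of "position"
-- in the result set, so a stopped load cannot resume part-way.
SELECT * FROM big_table;

-- What a resumable load would need: a deterministic order plus a
-- bookmark. On tens of billions of rows the ORDER BY forces a sort
-- or an index-ordered scan, adding significant load on the source,
-- and rows inserted below the bookmark while the task was stopped
-- would still be missed.
SELECT * FROM big_table
WHERE id > :last_loaded_id   -- bookmark saved from the previous run
ORDER BY id;
```

This is why stopping and restarting a full load always reinitiates a fresh reload: the unordered scan is the only option that is both fast and correct at this scale.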
Regards,
Desmond