Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Subject: Understanding the Calculation of Estimated Finish Time in Qlik Replicate
Hello Qlik Community,
I am currently using Qlik Replicate and I have a question about the calculation of the Estimated Finish Time for a table. I am trying to understand the starting point or the reference from which this time is calculated. I have attached a snapshot of the table for your reference (highlighted the relevant section).
Could anyone please explain how the Estimated Finish Time is calculated in Qlik Replicate? Any guidance or resources would be greatly appreciated.
Thank you in advance.
Best Regards,
Deepak
Hello @deepaksahirwar ,
Thaks for reaching out Qlik Community
In Qlik Replicate, the estimated finish time for a replication task can be calculated based on various factors such as the size of the data being replicated, the network speed, the server resources, and any transformations or filters applied during the replication process.
However, Qlik Replicate itself doesn't provide a built-in feature to directly estimate the finish time of a replication task. You would typically need to monitor the progress of the replication job and real facts the remaining time based on the current rate of data transfer and any potential bottlenecks.
Information about tables that are being loaded
Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!
Regards,
Sachin B
Hello @deepaksahirwar ,
Thaks for reaching out Qlik Community
In Qlik Replicate, the estimated finish time for a replication task can be calculated based on various factors such as the size of the data being replicated, the network speed, the server resources, and any transformations or filters applied during the replication process.
However, Qlik Replicate itself doesn't provide a built-in feature to directly estimate the finish time of a replication task. You would typically need to monitor the progress of the replication job and real facts the remaining time based on the current rate of data transfer and any potential bottlenecks.
Information about tables that are being loaded
Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!
Regards,
Sachin B
@SachinB - you wrote "However, Qlik Replicate itself doesn't provide a built-in feature to directly estimate the finish time of a replication task. "
That is correct for the task as a whole, but it is not what the OP asked. @deepaksahirwar only asked about estimated TABLE load finish time. Replicate will directly predict that finish time based on what it believes to be the source row count and observed rows/second.
Now if the source DB (not indicated by OP) does not provide an estimate, perhaps because Statistics where never run on the source table, then Replicate cannot make that prediction. This appears to be the case for @deepaksahirwar as the indicated 'Estimated Rows' column in the provided screenshot show N/A for "Not Available".
The prediction may be wildly wrong (for better or for worse) when concurrent activities on source DB or network significantly change. The estimates may also be wrong when filtering (notably push down) significantly changes the actual number of rows from the source table size.
The Task finish time is much harder to predict as Replicate cannot look into the future as to how much work is yet to be done, other than know the remaining table count. It does not know how the remaining tables will behave (extra wide?). It can also be strongly influence by the number of threads the task designer made available. OFTEN, but not always there is a handful of tables which define the majority of the load time. Personally I like to give the XXL tables a higher load priority to make sure that towards the end all tables except for the biggest one is down (at which point the estimate will often be there and valid).
By starting extra big tables first you reduce the risk of waiting for the "ZZtop" large table, and you'll find that once streams are 'done' with the slow tables they quickly tackle the smaller ones distributing them over more and more streams as the large ones are completed. Compare this with the "Big Stones First" inspirational talks/papers 🙂
Hein.
Hi @Heinvandenheuvel and @SachinB ,
So basically during full load, if the 'Estimate Finish Time' value shows as '-' null that means it can't be configured from qlik replicate replicate internally?
I was thinking that the ETA value must be configured by using some functions internally.
Thank you,
Deepak
@deepaksahirwar >> I was thinking that the ETA value must be configured by using some functions internally.
HOW??? Please read my reply again, carefully. The function is to ask the source how many rows to expects. Next measure how many rows were processed and how much time has elapsed, since start of the table from there, extrapolate to a finish time.
If the source fails to provide an estiamated count, becase the source in question - which you failed to indicate - does not have that function at all, or was not given a chance to get that number then there is nothing Repicate can do, You on the other hand can do your homework, test and measure in DEV, re-test and measure in QA and get your own anticipated duration list. Or if the task is already in PROD and just use a prior reptask_xxx.log and calculate the previous elapsed time and row count by working with the lines:
- Start loading table 'dbo'.'tbl' (Id = 29) by subtask 1.
- Load finished for table 'dbo'.'tbl' (Id = 29). 140 rows received. 0 rows skipped.
- Table 'dbo'.'capvoucher' (Id = 142) Loading finished by subtask 1.
Now this can become tedious, specially if the load task spans multiple logs and/or the task was interupted and re-started. Attached a PERL script to go over a wildcard list of reptask logs and reports on the tables loaded by times of start, duration, rows, stream and gives and ascii-terminal timeline visual to help you 'see' what went on. This in turn allows you to optimize the priories and allowed max stream count. Too many streams may hurt individual table load times. Both too many and too few streams will increase total task time.
Sample output below. What you can see here that once stream 15 freed up, it alone processed 6 tables while others where still ongoing
Hein,
Start DateTime Elapsed # Rows Rows/sec. Schema.TableName < TimeLine - One mark is 00h04m50s >
:
:
04/22 09:30:28 00:11:48 3 841737 1188 FINLIVE62.SALES_ORDER_INVOICES ** |
04/22 09:30:29 01:22:31 6 7463673 1507 FINLIVE62.SALES_ORDER_INVOICE_ ***************** |
04/22 09:30:36 01:58:04 7 9381436 1324 FINLIVE62.SALES_ORDER_LINES ************************ |
04/22 09:31:47 00:38:53 5 9381436 4021 FINLIVE62.SALES_ORDER_LINE_DET ******** |
04/22 09:32:12 00:41:45 8 9402358 3753 FINLIVE62.SALES_ORDER_LINE_DET ******** |
04/22 09:33:33 00:34:48 9 9402466 4503 FINLIVE62.SALES_ORDER_LINE_QTY ******* |
04/22 09:34:11 00:03:56 2 1659616 7032 FINLIVE62.SALES_PRICES .* |
04/22 09:37:39 00:05:03 13 1663007 5488 FINLIVE62.SALES_PRICE_LINES .* |
04/22 09:38:07 01:41:10 2 12631279 2080 FINLIVE62.SA_DETAIL_LINES .******************** |
04/22 09:38:18 00:00:02 15 1705 852 FINLIVE62.SF_BRANDS .* |
04/22 09:38:20 00:00:02 15 97 48 FINLIVE62.SYS_CUSTOMER_CLASSES .* |
04/22 09:38:22 00:00:02 15 122 61 FINLIVE62.SYS_STATES .* |
04/22 09:38:24 00:01:54 15 156696 1374 FINLIVE62.VENDORS .* |
04/22 09:40:19 00:00:02 15 495 247 FINLIVE62.WAREHOUSE_CONTROLS ..* |
04/22 09:40:22 00:00:02 15 33 16 MANLIVE62.FACTORIES ..* |
04/22 09:40:24 00:09:49 15 1191588 2023 MANLIVE62.JOBS ..** |
04/22 09:42:17 02:00:14 3 25508371 3535 MANLIVE62.JOB_DETAIL_COSTS ..************************ |
04/22 09:42:43 00:18:05 13 2302618 2122 MANLIVE62.JOB_OUTPUTS ..*** |
:
:
Total 128882855 rows in 59 tables. 04h50m15s Elapsed. 7400 Records/second overall, 0 rows skipped.