The information in this article is provided as-is and is to be used at your own discretion. Depending on the tool(s) used, customization(s), and/or other factors, ongoing support for the solution below may not be provided by Qlik Support.
When larger tables participate in the Full Load replication phase in Qlik Replicate, you can accelerate the replication of these tables by splitting them into logical segments and loading these segments in parallel. Tables can be segmented by data ranges, by partitions, or by sub-partitions.
This article provides guidance on finding adequate data segment boundaries based on the actual data stored in those tables on the source.
As a first step, it is essential to understand the size of the data tables.
Not every long table is big, and not every short table is small. A table's average uncompressed data size can be estimated with the formula below:
table_size_MB = (number of records in the table * avg. record length in bytes) / 1,048,576
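For example, on an Oracle source the optimizer statistics can give a quick estimate without scanning the table. This is an illustrative sketch only: it assumes the statistics are reasonably current, uses the standard ALL_TABLES dictionary view, and <SCHEMA> and <TABLE> are placeholders for your own object names:
SELECT num_rows * avg_row_len / 1024 / 1024 AS approx_table_size_mb
FROM all_tables
WHERE owner = '<SCHEMA>' AND table_name = '<TABLE>';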
The table size directly affects the unload duration. The longer the unload takes, the bigger its impact on the following:
It is good practice to unload the source table in "digestible" chunks that can each be loaded in less than 30 minutes. The amount of data in each segment can vary between tens of MB and a few GB and will heavily depend on the network quality and bandwidth, the source database load at any given time, the source database processing capability, and other environmental aspects. The best way to gauge the overall Full Load performance is to test how long it takes to Full Load a medium-size table in your environment and extrapolate from there.
Each segment should contain a reasonable number of records. Depending on the table width, the "reasonable" number of records in each batch may vary. As a rule of thumb, aim for each partition/segment load duration to be in the ballpark of 15-30 minutes.
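As an illustration (the numbers below are assumptions, not benchmarks): if a 5 GB test table Full Loads in roughly 25 minutes, the effective throughput is about 200 MB per minute; at that rate a 100 GB table would be split into roughly 20 segments of about 5 GB each to keep every segment inside the 15-30 minute window.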
This is important for the following reasons:
At this point we assume that we already know the approximate number of logical "partitions" (or segments) we want to use for the parallel unload. Perform the following to calculate the partition size and identify the boundary values (see the illustrative sketch after the platform-specific queries below):
When identifying the data ranges for SAP tables, the MANDT field should appear in the subqueries' WHERE clause.
ORACLE:
SELECT T1.GEN1, T1.THIS_ROW ...
DB2i / DB2LUW / DB2z:
SELECT T1.GEN1, T1.THIS_ROW ...
SQL SERVER:
SELECT T1.THIS_ROW, T1.GEN1 ...
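As a rough illustration of what such a boundary query does (a sketch under assumptions, not necessarily the exact query used above): take the partition size as the table's total record count divided by the desired number of segments, number the rows ordered by the segment column(s), and return the column value at every multiple of the partition size. On Oracle that idea could look like the following, where <KEY_COL>, <SCHEMA>.<TABLE>, and <PARTITION_SIZE> are placeholders:
SELECT T1.GEN1, T1.THIS_ROW
FROM (SELECT ROWNUM AS THIS_ROW, GEN1
      FROM (SELECT <KEY_COL> AS GEN1
            FROM <SCHEMA>.<TABLE>
            ORDER BY <KEY_COL>)) T1
WHERE MOD(T1.THIS_ROW, <PARTITION_SIZE>) = 0
ORDER BY T1.THIS_ROW;
Each returned GEN1 value can then be used as a segment boundary in the Parallel Load dialog; on SQL Server the same idea can be expressed with ROW_NUMBER() OVER (ORDER BY <KEY_COL>).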
Once the query results are returned, the tasks will need to be set up consistently in all environments. Note that the segment definitions in non-Prod environments will then be identical to the PROD ones, so if any testing needs to be done, the boundary values should be adjusted specifically to the environment the task is running in.
6. Click “Select Segment Columns”, pick the columns you are going to segment the table by, and click OK. By default, the PK columns will be selected automatically.
For SAP Application (DB) endpoints, the MANDT field is implicitly taken from the endpoint's CLIENT field and will not appear in the column selection criteria.
7. Populate the segments with the ranges returned by the queries in ascending order and click “Validate”:
The “Validate” button only checks that all the input fields have been populated and that the data in those fields matches the field data type.
8. Correct any errors that are reported and click “OK”.
Segment boundaries must always be in ascending order. Failure to comply may result in an inconsistent data unload that generates duplicate records and/or missing data.
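For example (illustrative values only, assuming a numeric key column): if the boundary query returned the values 250000, 500000, and 750000, they would be entered in that order, producing four segments: rows up to 250000, 250001-500000, 500001-750000, and everything above 750000.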
Another use case where parallel load can be helpful is when you are getting ORA-01555 "snapshot too old" errors, as parallel load uses multiple queries over smaller data sets rather than one large "select * from" query.
Hi,
I don't really understand how the SQL code works to return an ascending range of numbers. Can someone please explain? Also, why shouldn't the partitions be evenly sized?
thanks,
Mohammed
Hello @MoeE
To get more direct assistance, please post your question directly to our Qlik Replicate forums. This will give your question more visibility across our customers and active support engineers.
All the best,
Sonja
Hello Team,
In the Oracle query I can see that DESC is used in the ORDER BY clause. Is it necessary? The results of DESC and ASC differ completely. Could you please clarify this?
Thanks,
Hello @dhina
Please post your query in the Qlik Replicate forum to gain access to your Qlik peers and our active Support Agents. Include all version information for your products when raising it.
All the best,
Sonja