Hi,
I have a very simple query in my load script that is pulling a lot of rows (300+ million) from a table in Oracle. It was working fine for a while (so there's nothing wrong with the query's syntax), but it started timing out recently. I've spoken to my DBA, and he said that I need to chunk my data so that I'm not going after all the rows at one time.
Here's the original query:
PartnerTerritoryOrderHistory:
LOAD
    [MI Country KY],
    [Partner ID],
    [MI Product Key],
    [Action Type],
    [Status Type]
;
SQL
SELECT
    hst.mi_country_ky AS "MI Country KY",
    hst.partner_ky AS "Partner ID",
    hst.mi_product_ky AS "MI Product Key",
    hst.latest_ofr_action_type AS "Action Type",
    hst.latest_ofr_status AS "Status Type"
FROM
    rpt_dw.dcr_order_history_product hst
WHERE
hst.PARTNER_KY IN (183,13245,1733,1089,28209,25683,28449,27969,27611,26487,25565,25563,25005,23881,23761,27649,26847,26567,25885,24563,24443,3103,546,27651,27327,26767,26003,25883,25083,8337,23921,27249,25443,24843,24765,24763,24723,24403,24203,27489,26967,26647,26609,26607,26485,26045,25643,25363,25283,25123,1433,22803,21319,21081,20685,20519,12631,22125,21921,21483,21441,21159,16199,2111,16519,5257,2879,23321,21721,21521,1331,16001,19281,2899,22721,7579,21199,2705,6497,20041,6657,23521,2237,10657,22601,22561,21601,2695,20279,18959,22481,21641,20959,20319,3701,15441,14603,21279,21121,21039,15839,12227,2416,2839,6777,22321,21361,20799,20559,13801,20399,14919,20199,20159,18919,18439,17399,16999,16039,15999,13799,18319,17439,16479,16159,13679,18489,18279,17279,15279,14363,18079,14999,18359,17999,17519,17239,16439,14843,14361,14359,13681,13605,15759,15639,15439,4737,11027,13959,18533,13167,13247,12999,11819,12143,11541,11381,2709,2699,2691,10699,2823,8581,8819,9137,2552,10577,9819,8699,7777,3463,3977,3337,10021,2707,4057,7977,6737,8977,6857,3617,2829,2847,3079,7711,5459,2254,2236,4937,10937,7057,7337,7177,3741,1894,1893,11021,1714,1655,1614,4217,1151,368,277,8457,4897,3857,7539,648,3073,3067,9737,10457,192,9459,9423,2821);
STORE * FROM PartnerTerritoryOrderHistory INTO '..\qvds\PartnerTerritoryOrderHistory.qvd' (QVD);
DROP TABLES PartnerTerritoryOrderHistory;
In the end, I want the result to be the same (i.e., all the data written to a single QVD). Going forward, I want the load script to automatically query this table by every unique country key (the mi_country_ky column). In other words, say there are 200 unique country keys under mi_country_ky: the script should loop through and perform 200 individual queries, one per country key (thus "chunking" my results into 200 separate buckets), but still write the data from all 200 queries to the same QVD (PartnerTerritoryOrderHistory.qvd). I don't know how to modify my script to do this, though. Can anyone provide the correct syntax for this looping/concatenation?
Thanks,
Kenny
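The reply being referenced in the next post isn't quoted in the thread, but the pattern in question — loop over the distinct country keys, run one SELECT per key, and let Qlik auto-concatenate loads with identical field lists — looks roughly like the sketch below. The CountryKeys pull and the variable name are assumptions; the field list and QVD path mirror the original query.

// Sketch only: chunk the big SELECT by country key, one QVD at the end.
// 1) Pull the distinct country keys to drive the loop.
CountryKeys:
LOAD [MI Country Key];
SQL
SELECT DISTINCT mi_country_ky AS "MI Country Key"
FROM rpt_dw.dcr_order_history_product;

// 2) One SELECT per key; identical field lists auto-concatenate
//    into a single Qlik table across iterations.
For i = 0 to NoOfRows('CountryKeys') - 1
    LET vCountryKey = Peek('MI Country Key', i, 'CountryKeys');

    PartnerTerritoryOrderHistory:
    LOAD
        [MI Country KY],
        [Partner ID],
        [MI Product Key],
        [Action Type],
        [Status Type];
    SQL
    SELECT
        hst.mi_country_ky AS "MI Country KY",
        hst.partner_ky AS "Partner ID",
        hst.mi_product_ky AS "MI Product Key",
        hst.latest_ofr_action_type AS "Action Type",
        hst.latest_ofr_status AS "Status Type"
    FROM rpt_dw.dcr_order_history_product hst
    WHERE hst.mi_country_ky = '$(vCountryKey)'
      AND hst.PARTNER_KY IN (183, 13245 /* ...rest of the ID list... */);
Next

// 3) Everything lands in one QVD, same as before.
STORE * FROM PartnerTerritoryOrderHistory INTO '..\qvds\PartnerTerritoryOrderHistory.qvd' (QVD);
DROP TABLES PartnerTerritoryOrderHistory, CountryKeys;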
Hi Clever...I think I'm 99% there. I tweaked your code slightly for my situation, and it is cycling through each individual country key like I want. However, the end of the QVW's script log looks like this:
[screenshot of the script log, with two areas highlighted in red]
I don't understand why it runs that last query with a blank country key (the second red rectangle) or why it shows "127 lines fetched" twice. The country key in the top red rectangle (14077248) is the last country key in the cycle. Do you know why it's doing that? Thanks again!
Would you mind posting the Inline table and the For part?
Here's the code I'm using. I'm not loading the country key table as an inline table.
CountryKeys:
LOAD
    [MI Country Key]
;
SQL
SELECT DISTINCT
    mi_country_ky AS "MI Country Key"
FROM
    rpt_dw.dcr_order_history_product
WHERE
    mi_product_ky = '1381605760' AND
    partner_ky = '11381'
;

For i = 0 to NoOfRows('CountryKeys');
    LET vCountryKeys = Peek('MI Country Key', i, 'CountryKeys');

    // Each pass auto-concatenates onto PartnerTerritoryOrderHistory
    // because the field list is identical.
    PartnerTerritoryOrderHistory:
    LOAD
        [MI Country KY],
        [Partner ID],
        [MI Product Key],
        [Action Type],
        [Status Type]
    ;
    SQL
    SELECT
        hst.mi_country_ky AS "MI Country KY",
        hst.partner_ky AS "Partner ID",
        hst.mi_product_ky AS "MI Product Key",
        hst.latest_ofr_action_type AS "Action Type",
        hst.latest_ofr_status AS "Status Type"
    FROM
        rpt_dw.dcr_order_history_product hst
    WHERE
        hst.mi_country_ky = '$(vCountryKeys)' AND
        hst.PARTNER_KY IN (11381) AND
        hst.mi_product_ky = '1381605760';
Next

STORE * FROM PartnerTerritoryOrderHistory INTO '..\qvds\DataChunkingTest.qvd' (QVD);
DROP TABLES PartnerTerritoryOrderHistory, CountryKeys;
For i = 0 to NoOfRows('CountryKeys')
should be
For i = 0 to NoOfRows('CountryKeys') - 1
Peek() is zero-based, so the last valid row index is NoOfRows() - 1. The extra iteration makes Peek() return NULL, which is what produces that final query with the blank country key.
Thanks for all your help. You went above and beyond. Really appreciate it.
Good to read it's working now
Be Qlik, have fun
Clever...when I apply the code to my real, full set of data, it is way too slow. In the end, I need to load 310+ million rows. In the past, when I ran my original simple query (before it started timing out), it took about 2.5 hrs to pull back 300+ million rows. Now, with the FOR...NEXT looping, it has been running for 3+ hours and has only pulled back ~20 million rows. Is there a way to optimize this significantly? If not, I'll have to look for another solution. The code works as I want it to...I just don't know why it's so slow.
Please ask your DBA team if it's possible to create indexes on these columns:
hst.mi_country_ky
hst.PARTNER_KY
hst.mi_product_ky
I'm guessing they are not indexed
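For example, a single composite index covering the three filter columns might look like the following sketch (the index name here is made up; the DBA would choose the actual name, column order, and storage options):

-- Hypothetical composite index on the three columns in the WHERE clause
CREATE INDEX dcr_ord_hist_filter_ix
    ON rpt_dw.dcr_order_history_product (mi_country_ky, partner_ky, mi_product_ky);

Leading with mi_country_ky matters for the chunked load: each loop iteration filters on one country key, so an index led by that column lets Oracle range-scan just that chunk instead of full-scanning the table on every pass.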