Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have an Oracle database of a lot of data to pull data per month (approx. 2.5 million records per month) to pull from Oracle to individual monthly files (six years’ worth). I have two ways I was testing the pull of the Oracle data to QVDs: 1) I pulled all the data with one SQL Select statement with 6 sub-queries (different selection criteria in each sub-query) 55 rows of data per month filter; 2) I pulled the main data (48 rows) separately with one SQL Select statement filter by a given month, then created separate single pull (7 additional rows) from the Oracle table and joined back to the resident table prior to saving the QVD.
The single pull (1) from the Oracle table took 4.5 hours, whereas the multiple pulls (2) with joins took between 20 min. and 25 min. What is going on behind scenes/optimization that makes the single pull of data with the sub-queries take so long? I noticed a comment in the community suggests pulling large about of data through SQL-Plus to text was very fast, then they pull the text file back in to QlikView. I ran the SQL Select with the sub-queries in SQL Developer and it took 3 min. to run and pull the data.
I have an Oracle database of a lot of data to pull data per month (approx. 2.5 million records per month) to pull from Oracle to individual monthly files (six years’ worth).
I have two ways I was testing the pull of the Oracle data to QVDs: 1) I pulled all the data with one SQL Select statement with 6 sub-queries (different selection criteria in each sub-query) 55 rows of data per month filter;
2) I pulled the main data (48 rows) separately with one SQL Select statement filter by a given month, then created separate single pull (7 additional rows) from the Oracle table and joined back to the resident table prior to saving the QVD.
The single pull (1) from the Oracle table took 4.5 hours, whereas the multiple pulls (2) with joins took between 20 min. and 25 min. What is going on behind scenes/optimization that makes the single pull of data with the sub-queries take so long? I noticed a comment in the community suggests pulling large about of data through SQL-Plus to text was very fast, then they pull the text file back in to QlikView. I ran the SQL Select with the sub-queries in SQL Developer and it took 3 min. to run and pull the data.
Peter,
Yes there are 2.5 million records per month of data. I keep in separate QVD 72 in all allowing the reporting application to pick and choose which months are needing for reporting on. As or Rows, I should have said Columns of data thinking in terms of a database
What I wanted to do in lieu of hitting the database several times is make one pass on the database, thus in scenario 2 I took the sub queries from the separate data pulls used in scenario 1 and added them to the main pull making one pass to the Oracle Database.
I pass the parameters for the vRepCycle and vOutputFile to the function in a Do while building 72 separate QVDs, one for each month
Unfortunately scenario 2 takes way too long to execute
Scenario 1
LET vQVDPath = '..\Internal\DataSources\CDB\';
LET vRepCycle = 20150701;
Let vFileName = Peek('FileName',$(i),'LOAN_INFO_QVD_LIST');
DATA_DYNAMIC:
LOAD
LDI_KEY,
RepCycle,
REPORTED_ACTION_CODE,
Field1,
Field2,
Field3,
......,
Field48
;
SQL SELECT
LDI.LDI_KEY,
LDI.REP_CYCLE,
LDI_REPORTED_ACTION_CODE,
LDI.Field1,
LDI.Field2,
LDI.Field3,
LDI. ... ,
LDI.Field48,
FROM DATA_DYNAMIC LDI
JOIN DATA_STATIC on DATA_STATIC.INTERNAL_ID = LDI.INTERNAL_ID
JOIN POOL_DATA POI on POI.POOL_NBR = LDI.POOL_NBR
JOIN SERIES_DATA SRI on SRI.SERIES_NBR = POI.SERIES_NBR
WHERE LDI.REP_CYCLE = $(vRepCycle)
;
Join (DATA_DYNAMIC)
LOAD
LDI_KEY,
Field49,
Field50
;
SQL SELECT
LDI_KEY,
RepCycle,
(SELECT BK_CODE FROM (SELECT internal_id as ilid, BK_FILED_DATE, BK_EXIT_DATE, BK_CODE FROM BK order by BK_FILED_DATE DESC)
WHERE ilid=LDI.INTERNAL_ID AND TRUNC(BK_FILED_DATE,'MM') <= LDI.CYCLE_DATE AND (TRUNC(BK_EXIT_DATE,'MM') >= LDI.CYCLE_DATE or BK_EXIT_DATE IS NULL) and ROWNUM=1) as “Field49”,
(SELECT BK_FILED_DATE FROM (SELECT internal_id as ilid, BK_FILED_DATE, BK_EXIT_DATE, BK_CHAPTER_CODE FROM BK order by BK_FILED_DATE DESC)
WHERE ilid=LDI.INTERNAL_ID AND TRUNC(BK_FILED_DATE,'MM') <= LDI.CYCLE_DATE AND (TRUNC(BK_EXIT_DATE,'MM') >= LDI.CYCLE_DATE or BK_EXIT_DATE IS NULL) and ROWNUM=1) as “Field50”
FROM DATA_DYNAMIC LDI
WHERE LDI.REPORTED_ACTION_CODE='51' AND LDI.REP_CYCLE = $(vRepCycle)
;
JOIN (DATA_DYNAMIC)
LOAD
LDI_KEY,
Field51
;
SQL SELECT
LDI.LDI_KEY,
LDI.CYCLE_DATE,
(SELECT trunc(ATT_REFERRAL_DATE) FROM (SELECT internal_id as ilid, ATT_REFERRAL_DATE, FRCLSR_EXIT_DATE FROM FORE order by ATT_REFERRAL_DATE DESC)
WHERE ilid=ldi.INTERNAL_ID AND TRUNC(ATT_REFERRAL_DATE,'MM') <= ldi.CYCLE_DATE AND (TRUNC(FRC_EXIT_DATE,'MM') >= ldi.CYCLE_DATE or FRC_EXIT_DATE IS NULL) and ROWNUM=1) as "Field51"
FROM DATA_DYNAMIC LDI
WHERE LDI.REPORTED_ACTION_CODE in ('3','17','27') AND LDI.REP_CYCLE = $(vRepCycle)
;
Join (DATA_DYNAMIC)
LOAD LDI_KEY,
Field52
;
SQL SELECT
LDI.LDI_KEY,
LDI.CYCLE_DATE,
(SELECT trunc(FRC_SALE_DATE) FROM (SELECT internal_id as ilid, ATT_REFERRAL_DATE, FRC_SALE_DATE FROM FORE order by FRC_SALE_DATE DESC)
WHERE ilid=ldi.INTERNAL_ID AND TRUNC(FRC_SALE_DATE,'MM') <= ldi.CYCLE_DATE AND FRC_SALE_DATE is not null and ROWNUM=1) as "Field52"
FROM DATA_DYNAMIC LDI
WHERE LDI.REPORTED_ACTION_CODE='7' AND LDI.REP_CYCLE = $(vRepCycle)
;
STORE DATA_DYNAMIC into $(vQVDPath)$(vOutputFile);
DROP Table DATA_DYNAMIC;
Scenario 2
LET vQVDPath = '..\Internal\DataSources\CDB\';
LET vRepCycle = 20150701;
Let vFileName = Peek('FileName',$(i),'LOAN_INFO_QVD_LIST');
DATA_DYNAMIC:
LOAD
LDI_KEY,
RepCycle,
REPORTED_ACTION_CODE,
Field1,
Field2,
Field3,
......,
Field48
;
SQL SELECT
LDI.LDI_KEY,
LDI.REP_CYCLE,
LDI_REPORTED_ACTION_CODE,
LDI.Field1,
LDI.Field2,
LDI.Field3,
LDI. ,
LDI.Field48,
(SELECT BK_CODE FROM (SELECT internal_id as ilid, BK_FILED_DATE, BK_EXIT_DATE, BK_CODE FROM BK order by BK_FILED_DATE DESC)
WHERE LDI.REPORTED_ACTION_CODE=51 AND ilid=LDI.INTERNAL_ID AND TRUNC(BK_FILED_DATE,'MM') <= LDI.CYCLE_DATE AND (TRUNC(BK_EXIT_DATE,'MM') >= LDI.CYCLE_DATE or BK_EXIT_DATE IS NULL) and ROWNUM=1) as “Field49”,
(SELECT BK_FILED_DATE FROM (SELECT internal_id as ilid, BK_FILED_DATE, BK_EXIT_DATE, BK_CHAPTER_CODE FROM BK order by BK_FILED_DATE DESC)
WHERE LDI.REPORTED_ACTION_CODE=51 AND ilid=LDI.INTERNAL_ID AND TRUNC(BK_FILED_DATE,'MM') <= LDI.CYCLE_DATE AND (TRUNC(BK_EXIT_DATE,'MM') >= LDI.CYCLE_DATE or BK_EXIT_DATE IS NULL) and ROWNUM=1) as “Field50”,
(SELECT trunc(ATT_REFERRAL_DATE) FROM (SELECT internal_id as ilid, ATT_REFERRAL_DATE, FRC_EXIT_DATE FROM FORE order by ATT_REFERRAL_DATE DESC)
WHERE LDI.REPORTED_ACTION_CODE in ('3','17','27')AND ilid=ldi.INTERNAL_ID AND TRUNC(ATT_REFERRAL_DATE,'MM') <= ldi.CYCLE_DATE AND (TRUNC(FRCLSR_EXIT_DATE,'MM') >= ldi.CYCLE_DATE or FRC_EXIT_DATE IS NULL) and ROWNUM=1) as "Field51",
(SELECT trunc(FRC_SALE_DATE) FROM (SELECT internal_id as ilid, ATT_REFERRAL_DATE, FRC_SALE_DATE FROM FORE order by FRC_SALE_DATE DESC)
WHERE LDI.REPORTED_ACTION_CODE=7 ilid=ldi.INTERNAL_ID AND TRUNC(FRC_SALE_DATE,'MM') <= ldi.CYCLE_DATE AND FRC_SALE_DATE is not null and ROWNUM=1) as "Field52"
FROM DATA_DYNAMIC LDI
JOIN DATA_STATIC on DATA_STATIC.INTERNAL_ID = LDI.INTERNAL_ID
JOIN POOL_DATA POI on POI.POOL_NBR = LDI.POOL_NBR
JOIN SERIES_DATA SRI on SRI.SERIES_NBR = POI.SERIES_NBR
WHERE LDI.REP_CYCLE = $(vRepCycle)
;
STORE DATA_DYNAMIC into $(vQVDPath)$(vOutputFile);
DROP Table DATA_DYNAMIC;
Thank you for taking a look at it and your response
Bernie Sellers
Analytical Consultant
(p)410.884.2182
The 4.5 hour run-time is that for all the six years or just for a single month?
Single Month
Bernie Sellers
Analytical Consultant
My best advice is to run this very efficient query (for Oracle) on the Oracle server and store the result set in an Oracle table and then read this single table from QlikView and see what kind of performance you then get.
I would do these test
1) in sqlplus run the single query (1 month, If I understood the 4.5 hour query) and wait until you get all records
Why? Because (in sql developer, toad, etc, .....) you get the first rows from the db after some seconds but you should measure the time for all the rows
This is your WAY 1
2) in sqlplus run the 6-7 query (WAY 2), the join doesn't matter
Compare the time of 1) and 2) with the time you get in Qlik (4.5 hours, 20-25 minutes).
I think you get the same time or just a little better in SqlPlus vs Qlik (in 2, sqlplus doesn't join)
If you get the same time, the way to get better is ask your Oracle dba to optimize the query (it seems there are some order by you can skip).
If not, let we know.
P.S. its not so strange that when the query becomes complex (single query), the Oracle optimizer doesn't choose the best execution plan and the exec time increases
Hi Check this thread. Hope this will help you.
*** 6 Weeks in to QV Development, 30 Million Records QV Document and Help Needed!!! ****