I have run into an issue with the amount of time it takes to pull data in. I'm querying an Oracle DB server using a simple select statement:
select
-- The 10 columns I need for my query
from
table_name
Most of the columns are number fields with a couple of nvarchar(20) fields, so the rows aren't very wide, and there are only 10 columns. The problem is there are 144 million rows in this table. It takes 14.5 hours to run, and my tasks aren't allowed to take over 30 minutes on our QlikView server. This table is not a slowly changing dimension table, so I must do a full load each run; no incremental loading. One suggestion I was given was to run 18 different queries at the same time using a mod on the primary key:
select
-- The 10 columns I need for my query
from
table_name
where
mod(primarykey,18) = 0

and a second query:

select
-- The 10 columns I need for my query
from
table_name
where
mod(primarykey,18) = 1
and so on up to 17. The DB server can supposedly handle running these 18 queries against it at the same time. The problem I'm seeing with this approach is that I'd need 18 different qvw's on my QlikView server to run the queries at the same time. Is there a way, in one qvw document, to run all 18 queries at once and have it load into a qvd file when it is done? Does anyone have any other creative suggestions on how I might handle this? I know this is a strange one, and I surely don't want to go the 18-qvw route; that seems like way too much of a hack job.
Thank you for your help!
Can you figure out why it takes 14.5 hours to extract 144 million rows from an Oracle database? It could be that the cause of this excessive delay cannot be eliminated by just parallelising the extraction.
You can check the supposed improvement using 3 qvw's that extract 24 million rows.
Are you really sure that no incremental loading is possible?
- Marcus
I'm fairly certain, I'm going to double check though.
So you think there is a good chance that if I ran 2 queries at the same time, it wouldn't halve the amount of time it takes? From what the DBAs said, there is plenty of unused processor and memory on the server.
CPU and RAM are only part of the equation. Network bandwidth and disk performance matter as well. If the network is the bottleneck, it won't matter how many queries you run, your performance won't improve at all. So, it's worth doing some testing.
As for how to do it, it's not that tough. Start with one QVW and create a variable, set via an input box, for the number you're modding by. Take the script, put it in an include file, and use $(Must_Include=...) to pull it in.
Then just create 17 copies of the QVW, open each one, and change the variable. Alternatively, you could name each file something like Query_1.qvw and use the DocumentName() function with SubField() to pull the number out of the filename instead of using a variable.
The reason for using the include file for the script is to make changes easy.
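As a concrete sketch of that setup (the file names, the variable name vMod, and the mod count of 18 are assumptions, not tested script; adjust to your environment):

```
// extract_include.qvs -- shared by all copies via $(Must_Include=extract_include.qvs)
// vMod is either set in each copy's input box, or derived from a file
// name like Query_1.qvw (DocumentName() returns the document's file name):
// LET vMod = SubField(SubField(DocumentName(), '.', 1), '_', 2);

Slice:
SQL SELECT
  -- The 10 columns I need for my query
FROM table_name
WHERE mod(primarykey, 18) = $(vMod);

STORE Slice INTO [slice_$(vMod).qvd] (qvd);
```

Each of the 18 copies then differs only in its vMod value, and any script change happens once, in the include file.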
Well, if you really need to load all of those records with no incremental load, I would create 18 qvw's, each one running a part of your SQL (with that mod() function) and storing a different QVD.
Then I would schedule those 18 qvw's to run at the same time and hope your RDBMS can handle the traffic.
Please be aware that something like
mod(primarykey,18) = 1
can eliminate any index usage by the RDBMS. Wrapping the indexed column in a function like this keeps the database from using an index on it; to stay index-friendly, the bare column should sit on one side of the comparison, with the function work on the other.
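If index use matters for the split, a hedged alternative (assuming primarykey values are reasonably evenly distributed, which is an assumption you'd need to verify) is to slice by key ranges instead of mod(), so the indexed column stays bare in the predicate:

```
select
-- The 10 columns I need for my query
from
table_name
where
primarykey >= 1 and primarykey < 8000001
```

with each of the 18 extracts getting its own non-overlapping range. That said, each extract still touches roughly 8 million rows, so Oracle may well choose a full scan regardless; it's worth checking the execution plan with your DBAs before assuming either form wins.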