Self Join - QVD and Oracle DB - Qlik Community

Report Inappropriate Content · ‎2009-12-22

Hi,

I need to connect to one QVD file and one Oracle Database and have a join on that.

Both the table have a huge number of records (Oracle database is lot more then the qvd).

I need to pull data from the Oracle Database for only matching values to the one in the QVD file.

When I try to do this QlikView is getting stuck.

Is this the right approach or anyone can suggest a better approach.

I just checked one post which mentioned the use of where exists, would this be a better way ?

Creating a QVD for the entire dataset from the oracle would not make sense and only a small part of it would be required for reporting.


Table1:
Load
A,
B
From xyz.qvd; ((QVD file)

inner join

connection string
sql select
A,
B,
C,
D,
E,
F
From tablename;

//Thanks in advance to all you guys.....

Anonymous · ‎2009-12-22

What "getting stuck" means here? Maybe it is loading from Oracle, and it takes long time? Anything useful you can see in the log file?
Anyway, I'd do it a little bit differently. Maybe nothing is wrong here, but the location of the connection string bothers me. Besides, your requirement description implies that you need left join, not inner join. Also, I'd use preceding load.


connection string
Table1:
Load
A,
B
From xyz.qvd;
left join (Table1) load * ; (or join (Table1) load * where exists(A) and exists(B);)
sql select
A,
B,
C,
D,
E,
F
From tablename;

johnw · ‎2009-12-22


Michael Solomovich wrote:or join (Table1) load * where exists(A) and exists(B);

I have a theory that the exists() won't speed anything up or use less memory during the load, and could actually slow things down. My reasoning isn't very solid, mostly just a feeling that QlikView could be at least as efficient with a good implementation of a left join as it would first doing the exists and then a left join.

I just did a little bit of testing with an autogenerated table with two key fields with 2000 values each for 4 million rows. I then left joined another autogenerated table with the same two key fields with 4000 values each for 16 million rows, and including a data field. With just the left join, it took 40 seconds. With the exists for both key fields added in a preceeding load, it took 168 seconds, so about four times as long. Memory usage during the load wasn't really high enough to compare, but there DID appear to be a slightly higher blip near the end on the second approach when I was watching memory usage in task manager.

Now, my test data was very much junk data - just integers from 1 to 2000 or 4000 for keys, and a random integer between 1 and 100 for the data field. So I don't know how applicable this is to real world data sets. But it is at least suggestive that perhaps my intuition is on target here. I would at least try it with and without the exists and see which behaves better for your data set. My guess is that it will behave better without the exists.

(Edit: Adding the preceeding load could have slowed it down a lot more than any difference between exists() and not using exists(). So I moved the exists into the original load, and it ended up taking 139 seconds. Better, but still much slower than just skipping the exists() entirely.)