Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Performance issue with 'where exists' clause

Hi,

I have an issue with the load of really large table. It deals with user to customer relationships and is the preparation for the section access.

First of all, I load a smaller table (about 1000 rows) with users who shall be able to see everything.

Then I load a table that has about 1 billion rows with a "where not exists (user_id)" clause,

Table A:

LOAD

     USER_ID,

     VIEW_ALL_FLAG

FROM

...

;

Table B:

LOAD

     USER_ID,

     CUSTOMER_ID

FROM

....

Where not exists(USER_ID, USER_ID)

;

The load takes several hours. How can I redesign the load to take shorter time?

Thanks,

Jens

16 Replies
Not applicable
Author

The approach with only one exists-parameter is still running and taking about twice the time from before. Will need to analze tomorrow where this is coming from.

marcus_sommer

Then take also a look on the RAM consumption within the taskmanager then loading two fields from a qvd in an optimized load should be even by one billion records quite fast - from a local storage maybe in around 15 minutes (on really performant hardware with SSD's as storage even a lot more faster).

- Marcus

Not applicable
Author

RAM and CPU consumption look ok, at about 30 GB out of 196 on the test machine (production 384), CPU at about 4% usage.

I work on input optimization to reduce load time by preparing the data with the purpose that no "where exists" is needed anymore. What I still find strange is a long break the system seems to take. From the log:

16.02.2016 08:41:11: 0145  RESIDENT

16.02.2016 08:41:11: 0146   TERRITORY

16.02.2016 08:41:11: 0147  WHERE NOT

16.02.2016 08:41:11: 0148   EXISTS(USER_KEY_ALL,USER_KEY_TE&'_ALL')

16.02.2016 08:41:11:       5 fields found: USER_KEY_ALL, ACCESS, NTNAME, PARTY_ID, TerritoryId,                                             1.209.721.637 lines fetched

16.02.2016 12:26:27: 0152  AccessTableParty:

That is the next step. After loading the QVD files with several inner joins, the result is concatenated to an existing (smaller) table.

There are several hours without progress (at least without recognizable progress). Any ideas what this might be?

Thanks again,

Jens

tresesco
MVP
MVP

Potentially it could be a join issue. Try with limited load. Check the data model:

  • Is there any synthetic keys getting generated?
  • Does your key field get repeated after join?
Not applicable
Author

What do you mean with your second question?

tresesco
MVP
MVP

Say in your Key field(which you want the join to perform on with another table) has 100 records and after your inner join, it becomes 100+. This could happen, if both the key fields have non-distinct (repeated) values. This is a possible case of generating many unnecessary and incorrect records while joining - that can slow down your reload process.

marcus_sommer

It's not really clear what you are doing - posting the whole script would be helpful.

Beside them I suggest to consider to build the section access table outside from actually application - it avoids possible side effects and quite probably you want to use these table within other applications, too. Further I could not believe that you have more then a billion of customers which you want to give access to an application - even if there is a customers duplication for the need of a quite detailed authorization I think it's too much as it could be sensible handled and therefore I suggest to rethink your whole approach of authorization.

- Marcus