Hi,
I have an issue with loading a really large table. It contains user-to-customer relationships and is the preparation for section access.
First of all, I load a smaller table (about 1,000 rows) with users who should be able to see everything.
Then I load a table that has about 1 billion rows with a "where not exists (user_id)" clause:
Table A:
LOAD
USER_ID,
VIEW_ALL_FLAG
FROM
...
;
Table B:
LOAD
USER_ID,
CUSTOMER_ID
FROM
....
Where not exists(USER_ID, USER_ID)
;
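Not sure if it fits your data exactly, but one common redesign is to turn the negative condition into a positive one, because a QVD load with a single-parameter Where Exists() can remain optimized, while Where Not Exists() forces an unoptimized load. A sketch (all paths and the UserMaster source are assumptions, and it requires some source that lists all user IDs):

// Sketch only - file names and the UserMaster source are assumptions.
// 1) Load the view-all users under a field name that will not
//    collide with USER_ID in the later loads.
ViewAllUsers:
LOAD USER_ID AS VIEW_ALL_USER_ID
FROM UserTable.qvd (qvd);

// 2) Build the keep-list: all users except the view-all users.
//    The two-parameter Exists() is cheap here because this table is small.
KeepUsers:
LOAD USER_ID
FROM UserMaster.qvd (qvd)
Where Not Exists(VIEW_ALL_USER_ID, USER_ID);

// 3) The billion-row load now uses a single-parameter Exists()
//    against USER_ID, which keeps the QVD load optimized.
TableB:
LOAD USER_ID, CUSTOMER_ID
FROM BigTable.qvd (qvd)
Where Exists(USER_ID);

DROP TABLE ViewAllUsers;
DROP TABLE KeepUsers;

Whether step 3 actually runs optimized depends on the where clause being exactly a single-parameter Exists() on a loaded field, so it is worth checking the "(qvd optimized)" note in the progress window.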
The load takes several hours. How can I redesign the load to take less time?
Thanks,
Jens
The approach with only one exists-parameter is still running and is taking about twice as long as before. I will need to analyze tomorrow where this is coming from.
Also take a look at the RAM consumption in the Task Manager. Loading two fields from a QVD in an optimized load should be quite fast even with one billion records - from local storage maybe around 15 minutes (on really performant hardware with SSDs as storage, considerably faster).
- Marcus
RAM and CPU consumption look OK: about 30 GB out of 196 on the test machine (production has 384 GB), CPU at about 4% usage.
I am working on input optimization to reduce load time by preparing the data so that no "where exists" is needed anymore. What I still find strange is a long pause the system seems to take. From the log:
16.02.2016 08:41:11: 0145 RESIDENT
16.02.2016 08:41:11: 0146 TERRITORY
16.02.2016 08:41:11: 0147 WHERE NOT
16.02.2016 08:41:11: 0148 EXISTS(USER_KEY_ALL,USER_KEY_TE&'_ALL')
16.02.2016 08:41:11: 5 fields found: USER_KEY_ALL, ACCESS, NTNAME, PARTY_ID, TerritoryId, 1.209.721.637 lines fetched
16.02.2016 12:26:27: 0152 AccessTableParty:
That is the next step. After loading the QVD files with several inner joins, the result is concatenated to an existing (smaller) table.
There are several hours without progress (at least without recognizable progress). Any ideas what this might be?
Thanks again,
Jens
Potentially it could be a join issue. Try with a limited load and check the data model.
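For the limited test load, something like this sketch (file names are placeholders) restricts each source while debugging:

// Restrict the big source to the first N records for testing.
// If the reload is fast with a small slice, the problem likely
// scales with the joins rather than the raw fetch.
First 1000000
LOAD USER_ID, CUSTOMER_ID
FROM BigTable.qvd (qvd);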
What do you mean by your second question?
Say the key field (the one on which you want to perform the join with another table) has 100 records, and after your inner join it becomes 100+. This can happen if both key fields contain non-distinct (repeated) values. That is a possible case of generating many unnecessary and incorrect records while joining, which can slow down your reload process.
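To check for this before joining, a quick sketch (table and field names are placeholders for your own):

// Count occurrences of each key value in the table to be joined.
// Any KeyCount > 1 means the inner join can multiply rows.
KeyCheck:
LOAD KEY_FIELD,
     Count(KEY_FIELD) AS KeyCount
RESIDENT SomeTable
GROUP BY KEY_FIELD;

Comparing the row count of KeyCheck with the row count of SomeTable tells you immediately whether the key is distinct.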
It's not really clear what you are doing - posting the whole script would be helpful.
Besides that, I suggest building the section access table outside of the actual application - it avoids possible side effects, and quite probably you will want to use this table within other applications, too. Further, I can't believe that you have more than a billion customers to whom you want to give access to an application - even if customers are duplicated for the sake of a quite detailed authorization, I think it's too much to be handled sensibly, and therefore I suggest rethinking your whole approach to authorization.
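A sketch of that separation (file names and the field list are assumptions based on the fields visible in the log): the authorization app stores the finished table to a QVD, and each consuming application only loads it in its Section Access.

// In the dedicated authorization app, after building the table:
STORE AccessTableParty INTO Authorization.qvd (qvd);

// In each consuming application:
Section Access;
LOAD ACCESS, NTNAME, PARTY_ID
FROM Authorization.qvd (qvd);
Section Application;

This keeps the heavy preparation out of the dashboards and lets several applications share one maintained authorization table.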
- Marcus