Hi folks,
I have a QVD file with consignments and their products, e.g.
FROM consignments.qvd (qvd):
consignment_id | product_id | product_state |
A | 1 | pending |
A | 2 | received |
B | 1 | pending |
B | 3 | pending |
B | 4 | pending |
C | 2 | received |
Then I query my POS system for new/modified records. The way the POS API works means I have to pull a full replacement list of records for any consignment that has been added/modified.
e.g. RESIDENT consignments:
consignment_id | product_id | product_state |
B | 3 | received |
C | 4 | received |
D | 5 | pending |
The final updated table I want would look like this:
consignment_id | product_id | product_state |
A | 1 | pending |
A | 2 | received |
B | 3 | received |
C | 4 | received |
D | 5 | pending |
Note that consignment A is untouched, consignments B and C have had their product lists fully replaced (B goes from three products down to one), and consignment D is entirely new.
To get the result above I thought I could do
Concatenate Load * FROM /consignments.qvd(qvd) WHERE NOT EXISTS(consignment_id);
But doing that means I only ever pull one product per consignment from disk (because as soon as the first product is pulled from disk, the consignment_id exists and no more products are pulled for that consignment).
Then I thought I could try making a compound key, e.g.
consignment_id & product_id AS key
so I could then do
Concatenate Load * FROM /consignments.qvd(qvd) WHERE NOT EXISTS(key);
...but that merges rather than replaces based on consignment_id. In the above example, consignment B would end up with three products rather than only one.
I'm sure there is a neat way to do this. Any thoughts?
If you can identify the new and changed records by unique values, or combinations of fields, the where not exists(Key) approach should work. If not, you need different, more complex logic.
That could be querying an existing change-timestamp, or creating one by adding now() to each record so that newer records can be identified.
Another method could be to treat the states as numeric values of which only the max. value remains, maybe via an inner join load on an appropriate max-grouping. Mapping solutions, or ways to create an extra exists-table, are probably also possible. In each case it will take more effort ...
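As a rough sketch of the timestamp idea combined with the inner-join max-grouping (the file paths, the pos_consignments source and the load_time field below are purely illustrative assumptions, not names from this thread):

```
// Stamp the whole incoming batch with one timestamp, then keep
// only the rows belonging to the newest batch per consignment.
LET vLoadTime = Now();

Combined:
LOAD *, Timestamp#('$(vLoadTime)') AS load_time;
SQL SELECT consignment_id, product_id, product_state FROM pos_consignments;

Concatenate(Combined)
LOAD * FROM [consignments.qvd] (qvd);  // historic rows already carry load_time

// Determine the newest load_time per consignment ...
Latest:
LOAD consignment_id, Max(load_time) AS load_time
RESIDENT Combined
GROUP BY consignment_id;

// ... and inner join so only rows from that newest batch survive.
Inner Join(Combined)
LOAD consignment_id, load_time RESIDENT Latest;
DROP TABLE Latest;
```

The inner join acts as the "exists-table": any consignment whose rows came from an older batch loses all of its old products in one step.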
Beside this, you may check whether you could divide your data into YYYYMM slices, with an approach like: everything older than n months is loaded from the historic QVDs, and all newer stuff is loaded completely from the database.
If the latter is possible and fits within the time-frames for updating the data, I would use such simple logic instead of building the more complex logic mentioned above.
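The month-slice idea could look something like the following sketch (the slice_month field, the pos_consignments source and the wildcard file name are assumptions for illustration):

```
// History older than n months comes from the stored monthly QVD slices;
// everything newer is re-loaded in full from the POS system.
LET vFirstLiveMonth = Date(AddMonths(Today(), -3), 'YYYYMM');

Consignments:
LOAD consignment_id, product_id, product_state, slice_month
FROM [consignments_*.qvd] (qvd)
WHERE slice_month < '$(vFirstLiveMonth)';

// Recent months are always replaced wholesale, so no exists-logic is needed.
Concatenate(Consignments)
LOAD consignment_id, product_id, product_state,
     Date(Today(), 'YYYYMM') AS slice_month;
SQL SELECT consignment_id, product_id, product_state
FROM pos_consignments
WHERE modified_month >= '$(vFirstLiveMonth)';
```

The trade-off is re-loading a few recent months in full on every run, in exchange for skipping the replace-per-consignment logic entirely.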
- Marcus
Thanks Marcus,
I did a bad job of describing my scenario.
It turns out that my question is mostly answered by https://community.qlik.com/t5/Knowledge/WHERE-NOT-EXISTS-clause-returns-only-one-row-for-each-non/ta... which says
The solution in this case is to rename the original ID field (for example ID1) and use WHERE NOT Exists(ID1, ID); instead. With this implementation, ID and ID1 are different fields, so they have different symbol tables and are not affected by the on-the-run update of the symbol table as shown above.
The trick I'm now trying to work out is how to get that onto disk with a single "consignment_id" column. Right now I'm getting two separate columns: the original consignment_id plus its renamed copy.
Perhaps I need to do yet another load to merge those columns and name them.
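In case it helps a future reader, one way to apply the two-parameter Exists() pattern and still end up with a single consignment_id column is to carry the renamed id only as a temporary helper field and drop it at the end, avoiding the extra merge load. A sketch using the example data from this thread (the updated_consignment_id name is made up for illustration):

```
// Load the fresh POS records first, with an extra copy of the id
// that serves purely as the Exists() lookup field.
Consignments:
LOAD *, consignment_id AS updated_consignment_id INLINE [
consignment_id, product_id, product_state
B, 3, received
C, 4, received
D, 5, pending
];

// Two-parameter Exists: the QVD's consignment_id values are checked
// against the *separate* updated_consignment_id field, so reading the
// QVD does not pollute the symbol table being tested.
Concatenate(Consignments)
LOAD consignment_id, product_id, product_state
FROM [consignments.qvd] (qvd)
WHERE NOT Exists(updated_consignment_id, consignment_id);

// Remove the helper field: the final table keeps one consignment_id column.
DROP FIELD updated_consignment_id;
```

After the DROP FIELD, the table can be stored straight back to the QVD with a single consignment_id column.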
If I understand you right, you get a working solution by applying an exists-condition with two parameters. In general that's a good solution - but you will lose the optimized mode when loading these data from a QVD or resident table (if there are further transformations within the load anyway, this consideration can be neglected). If this load doesn't hit any essential time-frame, you could stay with it.
If instead the load-times are too long, it could become difficult to optimize further. One method may be to transfer the essential parts beforehand, by creating/including the (combined) information within the previous loads. Another approach could be to implement incremental methods or more granular layers, to apply the transformation against smaller datasets or to be able to shift the transformation into parallel time-frames.
- Marcus
Hi @crossroadsit,
Instead of the WHERE NOT EXISTS(consignment_id) you could make a variable containing all IDs, then use that variable in a WHERE NOT MATCH clause when loading the QVD.
Try this:
FinalTable:
LOAD *
, Date(Today(1)) AS POS_Date
;
//SOURCE = POS System
LOAD * INLINE [
consignment_id, product_id, product_state
B, 3, received
C, 4, received
D, 5, pending
];
ConsignmentIdList:
NOCONCATENATE
LOAD
CHR(39) & Concat(consignment_id, CHR(39) & ',' & CHR(39)) & CHR(39) AS ConsignmentIdList
RESIDENT FinalTable
;
LET vConsignmentIdList = FieldValue('ConsignmentIdList', 1);
DROP TABLE ConsignmentIdList;
CONCATENATE(FinalTable)
LOAD *
, Date(Today(1)-1) AS POS_Date
WHERE NOT MATCH(consignment_id, $(vConsignmentIdList))
;
//SOURCE = Consignments QVD
LOAD * INLINE [
consignment_id, product_id, product_state
A, 1, pending
A, 2, received
B, 1, pending
B, 3, pending
B, 4, pending
C, 2, received
];
These additional insights are much appreciated @marcus_sommer
Yes, I have it working with a NOT Exists condition now and I think the loss of optimized load is acceptable in this instance.
Thank you for sharing this @Ruhulessin
I actually tried something along these lines a while ago but must have been making an error somewhere, as I ended up with unpredictable results. I have a solution working now using WHERE NOT EXISTS, so I will probably stick with it for now.
I'm interested, however, if the approach you suggest with a variable would cope with an arbitrary number of (rather long) GUIDs?
I do not see why that should be a problem. How long are the IDs we are talking about? It depends more on the maximum number of characters a variable can handle, but according to this topic (Variable Max Limit ) a variable can handle quite a lot.
-Ruben