Hello everyone, I have an issue where multiple apps rely on each other, which becomes problematic when one of them fails to reload.
Long story short, I have an incomplete file whose rows can also contain null values. I can generate a complete file for the day the problem occurred, but I must remove from the complete file the rows that are in the incomplete file (those rows have already been exported to another platform which does not handle duplicates). The problem is that there is no distinct key/dimension to filter by, because an order contains multiple rows, and some of those rows might be missing or corrupted.
So, from my big file I want to remove all rows that are in my small file, but only if the entire row is exactly the same.
Small file:
order id | a | b | c |
100 | 1 | 2 | 3 |
100 | 2 | 3 | 4 |
100 | 3 | null | null |
200 | 1 | 1 | 1 |
200 | 2 | 2 | 2 |
Big File:
(the rows marked in red in the original post have a "perfect match" with the small file and therefore should be removed. Note that the row containing null values in the small file can simply be ignored, because that row no longer contains null values in the complete version)
order id | a | b | c |
100 | 1 | 2 | 3 |
100 | 2 | 3 | 4 |
100 | 3 | 4 | 5 |
100 | 4 | 5 | 6 |
100 | 5 | 5 | 5 |
200 | 1 | 1 | 1 |
200 | 2 | 2 | 2 |
200 | 3 | 3 | 3 |
3000 | 10 | 10 | 11 |
... | ... | ... | ... |
Resulting table:
order id | a | b | c |
100 | 3 | 4 | 5 |
100 | 4 | 5 | 6 |
100 | 5 | 5 | 5 |
200 | 3 | 3 | 3 |
3000 | 10 | 10 | 11 |
... | ... | ... | ... |
My guess would be an excluding join, but I haven't managed to make it work with "all fields must be the same". Thanks for any help or suggestions.
/David
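For clarity, the requested operation is a whole-row anti-join: a big-file row is dropped only when every field matches a small-file row exactly. A minimal sketch in Python, using the sample tables above (None stands in for null):

```python
# Sample data copied from the tables above; None stands in for null.
small = [
    ("100", "1", "2", "3"),
    ("100", "2", "3", "4"),
    ("100", "3", None, None),
    ("200", "1", "1", "1"),
    ("200", "2", "2", "2"),
]
big = [
    ("100", "1", "2", "3"),
    ("100", "2", "3", "4"),
    ("100", "3", "4", "5"),
    ("100", "4", "5", "6"),
    ("100", "5", "5", "5"),
    ("200", "1", "1", "1"),
    ("200", "2", "2", "2"),
    ("200", "3", "3", "3"),
    ("3000", "10", "10", "11"),
]

# A row is removed only when the ENTIRE row matches exactly;
# ("100", "3", "4", "5") survives because the small file only
# has the null-damaged version ("100", "3", None, None).
seen = set(small)
result = [row for row in big if row not in seen]
```

Running this leaves exactly the rows shown in the resulting table above.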
Hi @DavidJG, you could use something like this:
// Load only the composite key from the small file into a temporary
// table. (Loading the same field list in both tables would make Qlik
// auto-concatenate them, so the small-file rows stay in a separate table.)
SmallKeys:
LOAD
    orderid & ' | ' & a & ' | ' & b & ' | ' & c AS key
FROM [Small file];

// Keep only the big-file rows whose whole-row key was not seen above.
BigFile:
LOAD
    orderid,
    a,
    b,
    c
FROM [Big file]
WHERE NOT Exists(key, orderid & ' | ' & a & ' | ' & b & ' | ' & c);

// The key table is no longer needed.
DROP TABLE SmallKeys;
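The same composite-key filter can be sketched outside Qlik. This hypothetical Python snippet (toy data, not the real files) builds one key string per row, exactly like the concatenated expression above, and keeps only big-file rows whose key never appears in the small file:

```python
# Hypothetical illustration of the composite-key filter above.
def row_key(row):
    # Mirror the Qlik expression: orderid & ' | ' & a & ' | ' & b & ' | ' & c
    return " | ".join("null" if v is None else str(v) for v in row)

small_rows = [("100", "1", "2", "3"), ("100", "3", None, None)]
big_rows = [("100", "1", "2", "3"), ("100", "3", "4", "5")]

small_keys = {row_key(r) for r in small_rows}

# Only the exact whole-row match ("100", "1", "2", "3") is dropped;
# the repaired row ("100", "3", "4", "5") has a different key.
filtered = [r for r in big_rows if row_key(r) not in small_keys]
```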
Is the purpose to have one complete file with Small and Big together? You don't want duplicate rows, apart from the ones with nulls?
This could be:
LOAD DISTINCT * FROM Small.qvd (qvd)
WHERE NOT IsNull(b) AND NOT IsNull(c);

CONCATENATE LOAD * FROM Big.qvd (qvd);
The "distinct" will apply to the resulting concatenated table.
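That deduplication behavior can be illustrated with an assumed toy dataset (not the real files): drop the null rows from the small file, append the big file, and the "distinct" step then collapses any row that appears in both:

```python
# Toy data; the real inputs are the QVD files mentioned above.
small = [("100", "1", "2", "3"), ("100", "3", None, None)]
big = [("100", "1", "2", "3"), ("200", "3", "3", "3")]

# Keep only small-file rows without nulls (None).
cleaned_small = [row for row in small if None not in row]

# Concatenate, then deduplicate: the "distinct" applies to the whole
# combined table, so the overlapping row appears only once.
combined = []
for row in cleaned_small + big:
    if row not in combined:
        combined.append(row)
```

Note this yields the union of the two files minus duplicates, which (as the follow-up below explains) is not quite what was asked for.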
Hi,
The big file contains the "grand total", but because some rows have already been exported (all the ones in the small file), those rows need to be removed from the big file. It doesn't really matter whether values are null or not, but if a row (in its entirety) in the small file also exists in the big file, that row needs to be excluded from the big file.
"I have all the rows in this big file; now I need to remove all rows that already exist in this small file."
If I were to concatenate the rows from the small file with the desired/resulting table, I would get all the rows from the big file.
Hope this explanation is clearer 🙂
Hi David,
What file format is the file from which you need to remove the intersection of your data? Is it a QVD or another file format?
Thank you.
m