Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi, I have a app that traverses many (>100) excel files, and it's starting to get a app-size problem for me
The obvious solution is that I would only really need the latest version of an item (row) (unique ID column exists, date-time column exists)
any ideas on how i can accomplish that during the import?
basically i want it to add new rows - and replace existing row if imported row is newer only. AND important note across excel files being imported
First load all the data, then right join that table with a new table with only the maximum date-time values per ID
Data:
LOAD
ID,
MyDateTimeField,
...other fields...
FROM
*.xlsx (ooxml, embedded labels, table is Sheet1)
;
RIGHT JOIN (Data)
LOAD
ID, Max(MyDateTimeField) as MyDateTimeField
RESIDENT
Data
GROUP BY
ID
;
Interesting !
so will that strategy work with the 50mb app limit I have today (cloud plus account),
I mean if i have to load all the data - Iam still temporary exceeding the limit before I can reduce size.
hmm and doing inside the load statement will not work across the >100 sourcefiles...?
Thx - giving it a try !
If you're worried about the 50MB app limit you could first load only the ID and datetime from all the excel files, calculate the max datetime per ID and then load the rest of the data:
Temp1:
LOAD
ID,
MyDateTimeField
FROM
*.xlsx (ooxml, embedded labels, table is Sheet1)
;
Temp2:
LOAD
ID & '|' & Max(MyDateTimeField) as KEY
RESIDENT
Temp1
GROUP BY
ID
;
DROP TABLE Temp1;
Result:
LOAD
*
FROM
*.xlsx (ooxml, embedded labels, table is Sheet1)
WHERE
Exist(KEY, ID & '|' & MyDateTimeField
;
DROP TABLE Temp2;