Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

How to only import the latest occurrence of an row

Hi, I have a app that traverses many (>100)  excel files, and it's starting to get a app-size problem for me

The obvious solution is that I would only really need the latest version of an item (row) (unique ID column exists, date-time column exists)

any ideas on how i can accomplish that during the import?

basically i want it to add new rows - and replace existing row if imported row is newer only. AND important note across excel files being imported

5 Replies
Gysbert_Wassenaar

First load all the data, then right join that table with a new table with only the maximum date-time values per ID

Data:

LOAD

     ID,

     MyDateTimeField,

     ...other fields...

FROM

     *.xlsx (ooxml, embedded labels, table is Sheet1)

     ;

RIGHT JOIN (Data)

LOAD

     ID, Max(MyDateTimeField) as MyDateTimeField

RESIDENT

     Data

GROUP BY

     ID

     ;


talk is cheap, supply exceeds demand
Not applicable
Author

Interesting !

so will that strategy work with the 50mb app limit I have today (cloud plus account),

I mean if i have to load all the data - Iam still temporary exceeding the limit before I can reduce size.

Not applicable
Author

hmm and doing inside the load statement will not work across the >100 sourcefiles...?

Not applicable
Author

Thx  - giving it a try !

Gysbert_Wassenaar

If you're worried about the 50MB app limit you could first load only the ID and datetime from all the excel files, calculate the max datetime per ID and then load the rest of the data:

Temp1:

LOAD

     ID,

     MyDateTimeField

FROM

     *.xlsx (ooxml, embedded labels, table is Sheet1)

     ;

Temp2:

LOAD

     ID & '|' & Max(MyDateTimeField) as KEY

RESIDENT

     Temp1

GROUP BY

     ID

     ;

DROP TABLE Temp1;

Result:

LOAD

     *

FROM     

     *.xlsx (ooxml, embedded labels, table is Sheet1)

WHERE

     Exist(KEY, ID & '|' & MyDateTimeField

     ;

DROP TABLE Temp2;


talk is cheap, supply exceeds demand