datanibbler
Champion

Complex sorting scenario - haven't yet had the time to find an easy solution ...

Hi,

I have a complex sorting scenario that I haven't yet found an easy solution for. I think I know a way to do it, but a rather complex one; maybe someone here knows an easier way.

I can try to put together some sample data, but it's rather easy on the data side:

- I have a number of different item_numbers;

- For every item_number, there is a number of records, each with a unique date-time-stamp;

- What I want as output is a table that has, for every item_no, only the latest record (the one with the most up-to-date date-time-stamp).

=> So, to begin with, I would sort in a RESIDENT LOAD by

     - the item_no (asc.)

     - the date-time-stamp (desc.)

=> Then I was thinking of using a >> FIRST 1 LOAD << - but that would only fetch the one first record overall, so I'd need it once per item_no

=> To accomplish that, I was thinking of using a >> LOAD DISTINCT [item_no] << to create an auxiliary table, then parsing that table line by line and, in every iteration, setting a variable and doing a >> FIRST 1 LOAD << from the other table with a WHERE clause filtering for that one item_no

==> That way I'd get the most up-to-date record for every item_no

<=> The drawback is that this is a rather complex solution. So that's my question - is there an easier way?
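For reference, the loop approach described above might look roughly like this in load script (a sketch; the table name Source and the field names item_no and datetime are assumptions):

Source_Sorted:
LOAD * Resident Source
Order By item_no asc, datetime desc;       // newest record per item comes first

ItemList:
LOAD Distinct item_no Resident Source_Sorted;

For i = 0 to NoOfRows('ItemList') - 1
    // grab the next item number from the auxiliary table
    Let vItem = Peek('item_no', $(i), 'ItemList');

    Latest:                                // repeated loads auto-concatenate
    First 1 LOAD * Resident Source_Sorted
    Where item_no = '$(vItem)';
Next i

DROP Tables ItemList, Source_Sorted;

This works, but as discussed below it scans the whole source table once per item, which is exactly why the simpler join/aggregation approaches in the replies are preferable.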

Thanks a lot!

Best regards,

DataNibbler

14 Replies
datanibbler
Champion
Author

Hi all,

that code now works fine.

The issue is, it takes much too long. The part of the code that takes ages is the loop - no wonder: in every iteration, two variables (the plan_number and an ID) are sought in the original table, meaning the code has to search approx. half a million records in every one of approx. 4,000 iterations ...

It is also conceivable that renaming every single field takes so long - I introduced that to make sure that the new table - being built bit by bit in the loop - is not instantly linked to the old one.

I will attach the code to this post so you can have a look.

I have an idea that it might be possible to just INNER JOIN the aux_table (Temp2) to the original table - or to use KEEP - to throw out all records from the original table which do not match (that commented section).

I just haven't yet had the time to try it out - because of the LOAD_scripts running, I can only test that during approx. 15min every hour ...

Would that work?

Thanks a lot!

Kushal_Chawda

Yes, an INNER JOIN will also work; it lets you skip one load to get the result:

Data:
LOAD IDOC,
     Date
FROM Table;

Inner Join (Data)
LOAD IDOC,
     Date(Max(Date)) as Date
Resident Data
Group By IDOC;


Update: use the Max() function instead.

Gysbert_Wassenaar

Why not use a right join (or keep) with a table that calculates the max timestamp per whatever combination of fields you need:

TableX:
LOAD
     plan_number,
     ID,
     datetime,
     ...other fields
FROM
     ...source...
     ;

RIGHT JOIN (TableX)
LOAD
     plan_number,
     ID,
     max(datetime) as datetime
RESIDENT     // *
     TableX
GROUP BY
     plan_number,
     ID
     ;

//* If you're using a qvd as source then reading that may be faster than a resident load. Or perhaps you can push the calculation of the max(datetime) to the database server in a select statement.
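The footnote's QVD variant could look like this (a sketch; the file name source.qvd is an assumption, the field names follow the snippet above):

RIGHT JOIN (TableX)
LOAD
     plan_number,
     ID,
     max(datetime) as datetime
FROM source.qvd (qvd)      // reads the file directly instead of a resident load
GROUP BY
     plan_number,
     ID;

Note that a load with a GROUP BY prefix is not an optimized QVD load, but it still avoids duplicating the table in memory the way a resident load does.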


talk is cheap, supply exceeds demand
datanibbler
Champion
Author

Hi Gysbert,

you're right, reading the QVD might be faster - but then I cannot use the >> ORDER BY <<, can I? I will try with GROUP BY.

@kushal

I will try with an INNER JOIN and that FirstSortedValue() function. I have used it before, let's see ...
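A FirstSortedValue() version might look like this (a sketch; Source, item_no, datetime and some_field are assumed names). The negative sort weight makes it pick the value from the record with the latest datetime per item:

Latest:
LOAD item_no,
     FirstSortedValue(datetime, -datetime)   as datetime,
     FirstSortedValue(some_field, -datetime) as some_field   // repeat per field to keep
Resident Source
Group By item_no;

One caveat: FirstSortedValue() returns NULL when there is a tie on the sort weight (two records with the identical timestamp), unless the DISTINCT qualifier is used.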

Thanks a lot!

Gysbert_Wassenaar

I don't see why you need an order by.

What I want as output is a table that has, for every item_nr, only the latest record (with the most up-to-date date-time-stamp)

You can do that with the approach I posted above.


talk is cheap, supply exceeds demand