Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have a complex sorting scenario that i couldn't yet come up with an easy solution for - I think I know a way to do it, but a rather complex one, maybe someone here knows an easier way.
I can try to put together some sample data, but it's rather easy on the data_side:
- I have a number of different item_numbers;
- For every item_number, there is a number of records, each with a unique date-time-stamp;
- What I want as output is a table that has, for every item_nr, only the latest record (with the most up-to-date date-time-stamp)
=> So, to begin with, I would sort in a RESIDENT LOAD by
- the item_no (asc.)
- the date-time-stamp (desc)
=> Then I was thinking of using a >> FIRST 1 LOAD << - but that would only fetch the one first record, so I'd need that once per
item_no
=> To accomplish that, I was thinking of using a >> LOAD DISTINCT [item_no] << to create an auxiliary table and
to parse this table line by line and in every iteration, make a variable and then >> FIRST 1 LOAD << from the other
table with a WHERE clause filtering for that one item_no
==> That way I'd get the most up-to-date record for every item_no
<=> The drawback is, that is a rather complex solution. So that's my question - is there an easier way?
Thanks a lot!
Best regards,
DataNibbler
Hi DataNibbler,
I could think on two possiblities. The first is to use Peek() or Previous() ? within a sorted load and another is to use an aggregation with firstsortdevale().
- Marcus
Hi Marcus,
>> FirstsortedValue << looks good.
With > Previous << I guess I could load RESIDENT the first record of every group but the very first - and I could load that by using a >> FIRST 1 LOAD <<, so that should work, too.
I'll try that and report back here.
Thanks!
Hi DataNibbler,
within the script I haven't ever used the firstsortedvalue-aggregation - I use it only within charts - but you should simply try it. But I think that with larger datasets the peek/previous method will be faster even if you really need 3 loadings for it:
1. data-load to get the data into qv (order by is only by resident loads possible)
2. peek/previous logic to flag (or cumulate) your data
3. filter your data with a where-clause appropriate to your flag
and forget First 1 then it will only load one record in total not for each item.
- Marcus
Hi Marcus,
it seems to work.
The only thing I'm not really sure about, even if it looks like it worked, is the very first record loaded - that one has no previous record, so I'm not sure the PREVIOUS() function works correctly on that one?
I have done as you said and loaded one field (after sorting by 1) plan_no (asc) and 2) timestamp (desc.)
>> IF(PREVIOUS(ID) <> ID, 1, 0) as First_record_in_group <<
As I said, it looks good - we have checked in the database and it seems the logic has captured the first plan_number with the most up-to-date timestamp.
Hi DataNibbler,
AFAIK previous/peek will return NULL within the first record and are therefore very probably different to your check within an if-loop so that you get your specified false-value. Otherwise you could catch the case of the first record with: if(rowno() = 1, this, that).
- Marcus
Hi Marcus,
that trick with PREVIOUS() does exactly what it should - unfortunately it turned out I understood the requirement in a way different from my colleague, so what the code does is not what we want as of now. I think I know now what I misunderstood and I will try to explain again:
- We have a number of delivery_plans
- For every delivery_plan, we get a so-called IDOC every so often
- Every IDOC contains a number of records, all of which we need
- Every IDOC has a unique date-time-stamp (all of the records belonging to that IDOC have the same) of when it was transferred to our IT system.
=> For every delivery_plan, I want to extract only the latest IDOC (all of the records which have the most up-to-date
date-time-stamp)
=> That is the part where the code does what I understood on Friday that it should do, but not what we want: Due to the query using PREVIOUS(), the code grabs only one single record, the very first in a group.
Now, again, I can think of a solution, but I'm not sure if it might be possible in an easier way:
=> We'd have to extract that very first record like we do now and store it in an auxiliary_table, parse it, make two
variables out of a) the delivery_plan_number and b) the date-time-stamp
and extract from the original table all the records with an identical date-time-stamp.
What do you think of this?
Thanks a lot!
Best regards,
DataNibbler
Hi DataNibbler,
it might be possible to extend the previous-logic to flag not only the last record of a delivery plan else all records to their IDOC. But easier seems to me to split these table like you mentioned it maybe in a main-table with all delivery plans and their timestamp and a flagging on the last and a second table with the IDOC's and a key to the main-table.
- Marcus
Can you try this
Data:
LOAD IDOC,
Date
FROM Table;
left join(Data)
LOAD IDOC,
FirstSortedValue(Date,-Date) as Date,
1 as LatestDateFlag
Resident Data
group by IDOC;
Final:
noconcatenate
LOAD *
resident Data
where LatestDateFlag=1;
drip table Data;
How about something like this:
LOAD TransID,
TransDate,
...
Resident ...
Where Not(Exists(transID))
Order By TransID, transDate DESC;