Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
datanibbler
Champion
Champion

Problem aggregating a table and joining


Hi,

I have the following scenario:

- From our database I get a list of bookings, each with an exact date-time-stamp and the signon of the user. I will attach a small sample file to illustrate what I have

- I want the last (actually the second-to-last) booking of each individual box - the others are not relevant.

- To this end, I aggregate and I use the max() function to get the last entry, like this:

Noconcatenate

Transaction_aux:
LOAD
   
TRACKING_NUMBER as Track_AOS,
   
max(TRAN_DATE) as Letzte_Buchung
RESIDENT Transaction_Details_pre
GROUP BY TRACKING_NUMBER;

(I  cannot include the user here, otherwise I'd only get the last posting grouped by user, but I want the absolute last booking of that box)

The next step is to join the user from the original table via the TRACKING_NUMBER and the TRAN_DATE (which, together, form a unique key to every booking)

JOIN (Transaction_aux)
LOAD
   
TRACKING_NUMBER as Track_AOS,
   
TRAN_DATE as Letzte_Buchung,
   
USER_SIGNO as Letzter_Bucher
RESIDENT Transaction_Details_pre;
DROP TABLE Transaction_Details_pre;

<=> For some reason, I still have multiple bookings - and thus multiple users - in the resulting table though the date-time-stamp is not the same.

      Needless to say, I cannot use that - I need exactly one booking per TRACKING_NUMBER so I can then join with the names of the employees who were on duty
      at that time.

Can anyone help me here?

Thanks a lot!

Best regards,

DataNibbler

1 Solution

Accepted Solutions
Gysbert_Wassenaar

Ok, both the timestamp and the user.

Last entry per tracking number:

LOAD

     TRACKING_NUMBER,

     max(TRAN_DATE) as LAST_TRAN_DATE

     FirstSortedValue(USER,-TRAN_DATE) as USER_OF_LAST_TRAN_DATE

FROM .....

GROUP BY TRACKING_NUMBER;

Or if you actually want the second last entry per tracking number:

LOAD

     TRACKING_NUMBER,

     max(TRAN_DATE,2) as LAST_TRAN_DATE

     FirstSortedValue(USER,-TRAN_DATE,2) as USER_OF_LAST_TRAN_DATE

FROM .....

GROUP BY TRACKING_NUMBER;

If you don't care about the tracking number, but want the absolute (second to) last entry then remove tracking number from the load statement and remove the group by too.


talk is cheap, supply exceeds demand

View solution in original post

13 Replies
rubenmarin

Hi DataNibbler,

I upload a sample you can use to join with Transaction_aux only by TRACKING_NUMBER

Hope this helps!

Gysbert_Wassenaar

I'm not quite sure what you're after. Maybe this helps:

Data:

LOAD TRACKING_NUMBER,

     TRAN_DATE,

     USER

FROM [comm143662.xlsx] (ooxml, embedded labels, table is Sheet1);

join(Data)

LOAD TRACKING_NUMBER, max(TRAN_DATE,2) as TRAN_DATE, 1 as SecondToLastFlag

Resident Data

GROUP BY TRACKING_NUMBER;

If you want the user name instead you could use this:

Users:

LOAD TRACKING_NUMBER, FirstSortedValue(USER,-TRAN_DATE,2) as USER_OF_INTEREST

FROM [comm143662.xlsx] (ooxml, embedded labels, table is Sheet1)

GROUP BY TRACKING_NUMBER;


talk is cheap, supply exceeds demand
datanibbler
Champion
Champion
Author

Hi Ruben,

that seems worth a try. I hadn't thought of using this function. The effect should be basically the same as by aggregating with max().

Let's see.

Thanks a lot!

datanibbler
Champion
Champion
Author

Hi Gysbert,

well, what I actually want is two things:

- The (date-time-stamp of the) very last booking out of that list of bookings I get from our database

- The user who effected it

Any way, I have to aggregate the table - whether I go for the last user first and then join the date-time-stamp or the other way round doesn't really matter.

In the output table I want only one date-time-stamp per box_ID and one user to match it.

That's it basically.

I'll try to find out first whether my problem is in the aggregation or in the JOIN and then I'll try Rubben's suggestion.

datanibbler
Champion
Champion
Author

Hi all,

well, I have just tested and positively confirmed that the aggregation works as expected - right after that step, I have only one date-time-stamp (in numeric format) per box.

=> The problem must be in the join. I'm not sure what can be wrong about that.

Unfortunately, it takes a few minutes each time I run that script. I'll try again. Maybe I have to explicitly turn both date-time-stamps into numeric format?

Gysbert_Wassenaar

Ok, both the timestamp and the user.

Last entry per tracking number:

LOAD

     TRACKING_NUMBER,

     max(TRAN_DATE) as LAST_TRAN_DATE

     FirstSortedValue(USER,-TRAN_DATE) as USER_OF_LAST_TRAN_DATE

FROM .....

GROUP BY TRACKING_NUMBER;

Or if you actually want the second last entry per tracking number:

LOAD

     TRACKING_NUMBER,

     max(TRAN_DATE,2) as LAST_TRAN_DATE

     FirstSortedValue(USER,-TRAN_DATE,2) as USER_OF_LAST_TRAN_DATE

FROM .....

GROUP BY TRACKING_NUMBER;

If you don't care about the tracking number, but want the absolute (second to) last entry then remove tracking number from the load statement and remove the group by too.


talk is cheap, supply exceeds demand
datanibbler
Champion
Champion
Author

Great!

I would still like to know why the JOIN doesn't work, but your suggestion with FIRSTSORTEDVALUE() looks very good.

I need all three informations: The date-time-stamp, the TRACKING_NUMBER (box_ID) and the user.

Thanks a lot!

Best regards,

DataNibbler

rubenmarin

I made another sample joining and simplier, but it's the same Gysbert has posted, anyway i upload it as qvw.

datanibbler
Champion
Champion
Author

Now I have another strange problem.

With this, I get only one booking with the date-time-stamp, that is ok.

In some cases, I also get exactly one name, the user who did the booking.

<=> In other cases, no name is available. That is very strange for I don't think it is possible to log on to our database at all to do any booking there without the logon being registered.

I have to check that.