2 Replies Latest reply: May 31, 2012 6:25 PM by Rob Wunderlich RSS

problem with using previous and order by resident table

Jack Zheng

I just discovered something interesting.  Please see attached file for reference.

 

When I tried to do a order by in the same table as I am doing the previous, it breaks (new_table1).  I had to create a new sort table, sort the data first, and then de-dupe the data (new_table4).

 

new_table1 shows that it doesn't work while new_table4 does work after I do a sort_table first.

 

Why is that?

 

Also, since I have to do 2 table loads using the previous function, which will be faster?  Using distinct or use the 2 tables and the previous function?

  • problem with using previous and order by resident table
    Oleg Troyansky

    Looks like a lot of trouble just to ensure that the rows are DISTINCT.

     

    1. You can add the keyword DISTINCT after the LOAD to get to the same result. Keep in mind, however, that the DISTINCT applies to all the fields in the row - i.e. if some of the non-key fields are different, the rows will be technically considered DISTINCT even though they key is the same.

     

    2. To ensure that the keys are unique, I'd rather load the same info using GROUP BY and min() or firstsortedvalue() for all non-key fields:

     

    load

    Key1, Key2, Key3,

    min(Attr1) as Attr1,

    MinString(Attr2) as Attr2,

    ...

    resident

    Table1

    group by

    Key1, Key2, Key3

    ;