Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I just discovered something interesting. Please see attached file for reference.
When I tried to do a order by in the same table as I am doing the previous, it breaks (new_table1). I had to create a new sort table, sort the data first, and then de-dupe the data (new_table4).
new_table1 shows that it doesn't work while new_table4 does work after I do a sort_table first.
Why is that?
Also, since I have to do 2 table loads using the previous function, which will be faster? Using distinct or use the 2 tables and the previous function?
Looks like a lot of trouble just to ensure that the rows are DISTINCT.
1. You can add the keyword DISTINCT after the LOAD to get to the same result. Keep in mind, however, that the DISTINCT applies to all the fields in the row - i.e. if some of the non-key fields are different, the rows will be technically considered DISTINCT even though they key is the same.
2. To ensure that the keys are unique, I'd rather load the same info using GROUP BY and min() or firstsortedvalue() for all non-key fields:
load
Key1, Key2, Key3,
min(Attr1) as Attr1,
MinString(Attr2) as Attr2,
...
resident
Table1
group by
Key1, Key2, Key3
;
A simple way to retain only the first row of a key field(s) is like this:
Data:
LOAD *
WHERE AutoNumber(recno(), guid) = 1
;
Load * Inline [
guid, create_date, close_date
1, 1/1/2012, 1/2/2012
2, 1/2/2012, 2/3/2012
1, 1/1/2012, 1/2/2012
1, 1/1/2012, 4/2/2012
]
;
-Rob