Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a table that looks like this
Where the periods are simply filler values. The only columns I want to really work with are id and value.
LOAD * INLINE [ id, value, x, y ,z 1, A, a, a . 1, B, b, . . 1, B, c, . .
1, A, d, . . 2, D, . . . 2, D, . . . 3, D, . . . ];
I want to remove duplicates within my table but only if they're sequential within an `id`. I would prefer to keep the first of each duplicate so the result would look something like this:
id | value | x | y |
1 | A | a | a |
1 | B | b | . |
1 | A | d | . |
2 | D | . | . |
3 | D | . | . |
Again, ignore the filler columns (x-z). I only include them because often times solutions don't take into account other columns if they're not explicitly declared present
Edit: made question more specific
sound like you just want the first occurrence of the 1st 2 fields:
data:
LOAD rowno() as seq,* INLINE [
id, value, x, y ,z
1, A, a, a, .
1, B, b, ., .
1, B, c, ., .
2, D, ., ., .
2, D, ., ., .
3, D, ., ., .
];
first:
load id, value, min(seq) as seq Resident data group by id, value;
inner join (data) load * Resident first;
drop table first;
Hi Edwin, in some instances there are duplicates that are not sequential (i forgot to show that in my table) but I would like to keep any duplicates that arent sequential, so I dont think the returning only the first instance would suffice
I also edited the inline script and result table to better show this
If I understand your question, you can read (resident) the data and compare id with previous id and value with previous value
using the previous or peek function.
I added some rows for test at the end of the data table.
data:
LOAD rowno() as seq,
* INLINE [
id, value, x, y ,z
1, A, a, a, .
1, B, b, ., .
1, B, c, ., .
2, D, ., ., .
2, D, ., ., .
3, D, ., ., .
1, A, a1, a1, .
1, A, a2, a2, .
1, B, c, ., .
];
data2:
NOCONCATENATE LOAD
*
RESIDENT
data
WHERE
id <> Previous(id) OR value <> Previous(value)
;
DROP Table data;