Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
slacayo
Contributor III
Contributor III

Remove Sequential Duplicates within a group-by?

I  have a table that looks like this

Where the periods are simply filler values. The only columns I want to really work with are id and value. 

LOAD * INLINE [
     id, value, x, y ,z
     1,    A,   a, a  .
     1,    B,   b, .  .
     1,    B,   c, .  .
1, A, d, . .      2, D, . . .      2, D, . . .      3, D, . . . ];

I want to  remove duplicates within my table but only if they're sequential within an `id`. I would prefer to keep the first of each duplicate so the result would look something like this:

id value x y
1 A a a
1 B b .
1 A d .
2 D . .
3 D . .

 

Again, ignore the filler columns (x-z). I only include them because often times solutions don't take into account other columns if they're not explicitly declared present

 

Edit: made question more specific

Labels (4)
4 Replies
edwin
Master II
Master II

sound like you just want the first occurrence of the 1st 2 fields:

data:
LOAD rowno() as seq,* INLINE [
     id, value, x, y ,z
     1,    A,   a, a,  .
     1,    B,   b, .,  .
     1,    B,   c, ., .
     2,    D,   .,  .,  .
     2,    D,   .,  .,  .
     3,    D,   .,  .,  .
];

first:
load id, value, min(seq) as seq Resident data group by id, value;

inner join (data) load * Resident first;

drop table first;
slacayo
Contributor III
Contributor III
Author

Hi Edwin, in some instances there are duplicates that are not sequential (i forgot to show that in my table) but I would like to keep any duplicates that arent sequential, so I  dont think the returning only the first instance would suffice 

slacayo
Contributor III
Contributor III
Author

I also edited the inline script and result table to better show this

maxgro
MVP
MVP

If I understand your question, you can read (resident) the data and compare id with previous id and value with previous value

using the previous or peek function.

I added some rows for test at the end of the data table.

 

 

data:
LOAD rowno() as seq,
* INLINE [
id, value, x, y ,z
1, A, a, a, .
1, B, b, ., .
1, B, c, ., .
2, D, ., ., .
2, D, ., ., .
3, D, ., ., .
1, A, a1, a1, .
1, A, a2, a2, .
1, B, c, ., .
];


data2:
NOCONCATENATE LOAD
*
RESIDENT
data
WHERE
id <> Previous(id) OR value <> Previous(value)
;

DROP Table data;