Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

How to identify the duplicate data from a list

Hi Experts,

I have a set of data like below:

StudentID

ItemID

1

1001

1

1002

1

1003

1

1004

2

1004

2

1003

2

1002

2

1001

3

1001

3

1002

3

1003

3

1005

4

1001

4

1002

4

1005

4

1003

5

1001

5

1005

5

1003

5

1004

6

1007

6

1005

6

1001

6

1004

As you can see,

StudentID 1's item list: 1001, 1002, 1003, 1004

StudentID 2's item list: 1004, 1003, 1002, 1001

They actually have the same list of items.

I need to change the above data like below:

ResultID

Item1

Item2

Item3

Item4

# students

1

1001

1002

1003

1004

2

2

1001

1002

1003

1005

2

3

1001

1005

1003

1004

1

4

1007

1005

1001

1004

1

We can filter and count the duplicate lists and have a list of items for each unique combination in code like Java, but I am wondering if we can do it in qlik sense data load script.

Any help will be appreciated.

Thanks,

Jing

1 Solution

Accepted Solutions
Gysbert_Wassenaar

Create a concatenated list of the values per student:

Results:

LOAD

     StudentID,

     ItemList

FROM

     MyTable

     ;

StudentItemList

LOAD

     StudentID,

     concat(ItemID, ', ', ItemID) as ItemList

     Autonumber(concat(ItemID, ', ', ItemID)) as Result

RESIDENT

     MyTable

GROUP BY

     StudentID

     ;    

Then you can use Result as dimension and count(distinct StudentID) as expression in a chart.


talk is cheap, supply exceeds demand

View solution in original post

5 Replies
Gysbert_Wassenaar

Create a concatenated list of the values per student:

Results:

LOAD

     StudentID,

     ItemList

FROM

     MyTable

     ;

StudentItemList

LOAD

     StudentID,

     concat(ItemID, ', ', ItemID) as ItemList

     Autonumber(concat(ItemID, ', ', ItemID)) as Result

RESIDENT

     MyTable

GROUP BY

     StudentID

     ;    

Then you can use Result as dimension and count(distinct StudentID) as expression in a chart.


talk is cheap, supply exceeds demand
reddy-s
Master II
Master II

Hi Jing,

Make use of the table and list boxes to figure out issues in the data.

Thanks,

Sangram.

Not applicable
Author

Thanks so much Gysbert! It is working perfectly. I didn't realize the concat actually first sorts the item list.

Not applicable
Author

Sangram,

You mean at presentation layer? The real data is more complicated, so I have to organize well in data model. Thanks anyway. Gysbert's way is working perfectly in my case.

Thanks,

Jing

reddy-s
Master II
Master II

Yes Jing, on the presentation layer.