Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
shekhar_analyti
Specialist
Specialist

How to create "logical duplicates" free QVD from given Data , even when it appears no duplicate ?

Hi All ,

How to create "logical duplicates" free QVD from given Data , even when it appears no duplicate ?

Example : In Input data , Row 1 and 2 are logically same data , except for fact that names of same person coming in different field . Also data elements in C1,C2,C3,C4 are same but in different columns .

Requirement is to load only one of such data , which means we can load ROW 1  skipping  ROW 2 

INPUT 

dups1.png

OUTPUT QVD : free from data  which are strike out 

dups2.png

 

LOAD * INLINE [

SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];

 

Thanks & Regards

Shekar

6 Replies
marcus_malinow
Partner - Specialist III
Partner - Specialist III

Interesting little challenge. Here's my solution, though interested to see if there are more elegant ways of achieving the same result...

 

LOAD
SAMPLE, Habitat,
subfield(P, ',', 1) as P1,
subfield(P, ',', 2) as P2,
subfield(P, ',', 3) as P3,
subfield(P, ',', 4) as P4,
subfield(P, ',', 5) as P5,
subfield(C, ',', 1) as C1,
subfield(C, ',', 2) as C2,
subfield(C, ',', 3) as C3,
subfield(C, ',', 4) as C4,
subfield(C, ',', 5) as C5
;
LOAD
SAMPLE, Habitat, concat(DISTINCT P, ',') as P , concat(DISTINCT C, ',') as C
GROUP BY SAMPLE, Habitat
;
LOAD
SAMPLE, Habitat, subfield(P, ',') as P, subfield(C, ',') as C
;
LOAD
SAMPLE, Habitat,
P1 & if(len(P2) > 0, ',' & P2 & if(len(P3) > 0, ',' & P3 & if(len(P4) > 0, ',' & P4 & if(len(P5) > 0, ',' & P5)))) as P,
C1 & if(len(C2) > 0, ',' & C2 & if(len(C3) > 0, ',' & C3 & if(len(C4) > 0, ',' & C4 & if(len(C5) > 0, ',' & C5)))) as C
;
LOAD * INLINE [

SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
]
;

marcus_sommer

I think I would go with the trial to load the data with a crosstable-statement, then aggregating the values with concat() which will sort the various values and a distinct-statement will exclude the duplicates.

In the case that the field-groups should be independent to eachother you might need several crosstables which are joined together. Further are there fields with different values which should not be considered from this logic - the distinct approach wouldn't work and you need to apply it in a separate logic and linking it back to your origin table.

- Marcus

shekhar_analyti
Specialist
Specialist
Author

Thank You Marcus

I was able to do it , But not sure if below is best way .

DATA_WITH_DUP:

LOAD * ,
AutoNumberHash256(SAMPLE,Habitat) as K1 ,
AutoNumberHash256(SAMPLE,Habitat,P1, P2, P3, P4, P5, C1, C2, C3, C4, C5) as K2 ;


LOAD * INLINE [

SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];

CT:
CrossTable(EMPNBR,EMPNAME,1)
LOAD
K1,
P1,
P2,
P3,
P4,
P5
Resident DATA_WITH_DUP ;
DROP Field EMPNBR ;
// DROP Table DATA_WITH_DUP ;

CT1:
CrossTable(CLENTNBR,CLIENT,1)
LOAD
K1,
C1, C2, C3, C4, C5
Resident DATA_WITH_DUP ;
DROP Field CLENTNBR ;


Left Join(DATA_WITH_DUP)
LOAD
K1,
Concat(DISTINCT EMPNAME,',') AS NAMES
Resident CT Group By K1 ;

Left Join(DATA_WITH_DUP)
LOAD
K1,
Concat(DISTINCT CLIENT,',') AS CLIENT_F
Resident CT1 Group By K1 ;

FINAL_QVD:
LOAD * ,
If(Left(CLIENT_F_O,1)=',',Right(CLIENT_F_O,Len(CLIENT_F_O)-1),CLIENT_F_O) as CLIENT_FINAL_LIST ,
If(Left(NAMES_O,1)=',',Right(NAMES_O,Len(NAMES_O)-1),NAMES_O) as NAMES_FINAL_LIST ;
LOAD
SAMPLE ,
//SAMPLE as SAMPLE_O,
Habitat AS Habitat_O, NAMES as NAMES_O , CLIENT_F as CLIENT_F_O
Resident DATA_WITH_DUP ;
marcus_sommer

I assume that regardless how this task will be solved it will be remain a rather ugly job ...

An alternatively to the concat() and distinct approach on the (joined) crosstables could be to use interrecord-functions like Peek-or-Previous within a sorted resident load to check the values from the other records and to flag them appropriate. And maybe you keep then all records and used later the flag within a listbox or in set analysis to exclude them.

- Marcus

shekhar_analyti
Specialist
Specialist
Author

I think it did not work the moment i added another row 

LOAD * INLINE [

SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S1, H1, A, , , , , XY, , AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];

@sunny_talwar  .... Hi Sunny Bhai , please suggest any alternate solution  

 

 

shekhar_analyti
Specialist
Specialist
Author

Did not work with

LOAD * INLINE [

SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S1, H1, A, , , , , XY, , AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];