How to create "logical duplicates" free QVD from g... - Qlik Community

shekhar_analyti · ‎2019-03-04

Hi All ,

How to create "logical duplicates" free QVD from given Data , even when it appears no duplicate ?

Example : In Input data , Row 1 and 2 are logically same data , except for fact that names of same person coming in different field . Also data elements in C1,C2,C3,C4 are same but in different columns .

Requirement is to load only one of such data , which means we can load ROW 1 skipping ROW 2

INPUT

OUTPUT QVD : free from data which are strike out

LOAD * INLINE [

SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];

Thanks & Regards

Shekar

marcus_malinow · ‎2019-03-04

Interesting little challenge. Here's my solution, though interested to see if there are more elegant ways of achieving the same result...

LOAD
SAMPLE, Habitat,
subfield(P, ',', 1) as P1,
subfield(P, ',', 2) as P2,
subfield(P, ',', 3) as P3,
subfield(P, ',', 4) as P4,
subfield(P, ',', 5) as P5,
subfield(C, ',', 1) as C1,
subfield(C, ',', 2) as C2,
subfield(C, ',', 3) as C3,
subfield(C, ',', 4) as C4,
subfield(C, ',', 5) as C5
;
LOAD
SAMPLE, Habitat, concat(DISTINCT P, ',') as P , concat(DISTINCT C, ',') as C
GROUP BY SAMPLE, Habitat
;
LOAD
SAMPLE, Habitat, subfield(P, ',') as P, subfield(C, ',') as C
;
LOAD
SAMPLE, Habitat,
P1 & if(len(P2) > 0, ',' & P2 & if(len(P3) > 0, ',' & P3 & if(len(P4) > 0, ',' & P4 & if(len(P5) > 0, ',' & P5)))) as P,
C1 & if(len(C2) > 0, ',' & C2 & if(len(C3) > 0, ',' & C3 & if(len(C4) > 0, ',' & C4 & if(len(C5) > 0, ',' & C5)))) as C
;
LOAD * INLINE [

SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];

marcus_sommer · ‎2019-03-05

I think I would go with the trial to load the data with a crosstable-statement, then aggregating the values with concat() which will sort the various values and a distinct-statement will exclude the duplicates.

In the case that the field-groups should be independent to eachother you might need several crosstables which are joined together. Further are there fields with different values which should not be considered from this logic - the distinct approach wouldn't work and you need to apply it in a separate logic and linking it back to your origin table.

- Marcus

shekhar_analyti · ‎2019-03-05

Thank You Marcus

I was able to do it , But not sure if below is best way .

DATA_WITH_DUP:

LOAD * ,
AutoNumberHash256(SAMPLE,Habitat) as K1 ,
AutoNumberHash256(SAMPLE,Habitat,P1, P2, P3, P4, P5, C1, C2, C3, C4, C5) as K2 ;

LOAD * INLINE [

SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];

CT:
CrossTable(EMPNBR,EMPNAME,1)
LOAD
K1,
P1,
P2,
P3,
P4,
P5
Resident DATA_WITH_DUP ;
DROP Field EMPNBR ;
// DROP Table DATA_WITH_DUP ;

CT1:
CrossTable(CLENTNBR,CLIENT,1)
LOAD
K1,
C1, C2, C3, C4, C5
Resident DATA_WITH_DUP ;
DROP Field CLENTNBR ;

Left Join(DATA_WITH_DUP)
LOAD
K1,
Concat(DISTINCT EMPNAME,',') AS NAMES
Resident CT Group By K1 ;

Left Join(DATA_WITH_DUP)
LOAD
K1,
Concat(DISTINCT CLIENT,',') AS CLIENT_F
Resident CT1 Group By K1 ;

FINAL_QVD:
LOAD * ,
If(Left(CLIENT_F_O,1)=',',Right(CLIENT_F_O,Len(CLIENT_F_O)-1),CLIENT_F_O) as CLIENT_FINAL_LIST ,
If(Left(NAMES_O,1)=',',Right(NAMES_O,Len(NAMES_O)-1),NAMES_O) as NAMES_FINAL_LIST ;
LOAD
SAMPLE ,
//SAMPLE as SAMPLE_O,
Habitat AS Habitat_O, NAMES as NAMES_O , CLIENT_F as CLIENT_F_O
Resident DATA_WITH_DUP ;

marcus_sommer · ‎2019-03-05

I assume that regardless how this task will be solved it will be remain a rather ugly job ...

An alternatively to the concat() and distinct approach on the (joined) crosstables could be to use interrecord-functions like Peek-or-Previous within a sorted resident load to check the values from the other records and to flag them appropriate. And maybe you keep then all records and used later the flag within a listbox or in set analysis to exclude them.

- Marcus

shekhar_analyti · ‎2019-03-05

I think it did not work the moment i added another row

LOAD * INLINE [

SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S1, H1, A, , , , , XY, , AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];

@sunny_talwar .... Hi Sunny Bhai , please suggest any alternate solution

shekhar_analyti · ‎2019-03-05

Did not work with

LOAD * INLINE [

SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S1, H1, A, , , , , XY, , AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];