Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All ,
How to create "logical duplicates" free QVD from given Data , even when it appears no duplicate ?
Example : In Input data , Row 1 and 2 are logically same data , except for fact that names of same person coming in different field . Also data elements in C1,C2,C3,C4 are same but in different columns .
Requirement is to load only one of such data , which means we can load ROW 1 skipping ROW 2
INPUT
OUTPUT QVD : free from data which are strike out
LOAD * INLINE [
SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];
Thanks & Regards
Shekar
Interesting little challenge. Here's my solution, though interested to see if there are more elegant ways of achieving the same result...
LOAD
SAMPLE, Habitat,
subfield(P, ',', 1) as P1,
subfield(P, ',', 2) as P2,
subfield(P, ',', 3) as P3,
subfield(P, ',', 4) as P4,
subfield(P, ',', 5) as P5,
subfield(C, ',', 1) as C1,
subfield(C, ',', 2) as C2,
subfield(C, ',', 3) as C3,
subfield(C, ',', 4) as C4,
subfield(C, ',', 5) as C5
;
LOAD
SAMPLE, Habitat, concat(DISTINCT P, ',') as P , concat(DISTINCT C, ',') as C
GROUP BY SAMPLE, Habitat
;
LOAD
SAMPLE, Habitat, subfield(P, ',') as P, subfield(C, ',') as C
;
LOAD
SAMPLE, Habitat,
P1 & if(len(P2) > 0, ',' & P2 & if(len(P3) > 0, ',' & P3 & if(len(P4) > 0, ',' & P4 & if(len(P5) > 0, ',' & P5)))) as P,
C1 & if(len(C2) > 0, ',' & C2 & if(len(C3) > 0, ',' & C3 & if(len(C4) > 0, ',' & C4 & if(len(C5) > 0, ',' & C5)))) as C
;
LOAD * INLINE [
SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];
I think I would go with the trial to load the data with a crosstable-statement, then aggregating the values with concat() which will sort the various values and a distinct-statement will exclude the duplicates.
In the case that the field-groups should be independent to eachother you might need several crosstables which are joined together. Further are there fields with different values which should not be considered from this logic - the distinct approach wouldn't work and you need to apply it in a separate logic and linking it back to your origin table.
- Marcus
I assume that regardless how this task will be solved it will be remain a rather ugly job ...
An alternatively to the concat() and distinct approach on the (joined) crosstables could be to use interrecord-functions like Peek-or-Previous within a sorted resident load to check the values from the other records and to flag them appropriate. And maybe you keep then all records and used later the flag within a listbox or in set analysis to exclude them.
- Marcus
I think it did not work the moment i added another row
LOAD * INLINE [
SAMPLE, Habitat, P1, P2, P3, P4, P5, C1, C2, C3, C4, C5
S1, H1, A, B, C, , , AX, XY, ZA, ,
S1, H1, A, C, B, , , XY, ZA, AX, ,
S1, H1, A, , , , , XY, , AX, ,
S2, H2, A, C, B, , , XY, ZA, , ,
S3, H3, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, P, W, Q, C, R, BB, BC, BF, BL, LO
S4, H4, C, R, P, W, Q, BB, BC, BF, BL, LO
S4, H5, C, R, P, W, Q, BB, BC, BF, BL, LO
S5, H1, A, B, C, , , AX, XY, ZA, ,
S6, H1, A, B, C, , , AX, XY, ZA, ,
];
@sunny_talwar .... Hi Sunny Bhai , please suggest any alternate solution