Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello everyone,
I need help in avoiding some duplicate rows in the data,while loading them.
I have to make sure that no two row should have same data for all the columns against one field.
Let me explain,
suppose there are 10 fields and lets say number is one of those and we have some around 50000 rows.
Now I have to make sure that for number there should not be any row which has all 9 fields same.
i don't understand how to do it.
pleaaase help..\
Thanks..!!
You would need to find someway to combine them together into one otherwise checking the repetitions might not be easy.
Try something like this:
Table:
LOAD Number,
Field2,
Field3,
Field4,
Field5,
Field6,
Field7,
Field8,
Field9,
Field10,
AutoNumber(Field2&'|'&Field3&'|'&Field4&'|'&Field5&'|'&Field6&'|'&Field7&'|'&Field8&'|'&Field9&'|'&Field10) as Check
FROM Source
Right Join (Table)
LOAD Check,
FirstValue(Number) as FirstNumber
Resident Table
Group By Table;
The above will save the first instance of number in any repeating data.
and if there are >= 2 rows with the same n fields?
you want to keep one of them? exclude all?
example with 3 fields, adapt to 9
orange = result
keep the first row
X:
load *, f1 & '-' & f2 & '-' & f3 as fcheck inline [
number, f1, f2, f3
1, z, z, z
2, a, a, b
2, a, a, c
3, a, b, c
3, a, b, c
3, d, e, f
];
Y:
NoConcatenate load * Resident X
where number<>Peek('number') or number=Peek('number') and fcheck <> Peek('fcheck')
order by number, fcheck;
DROP Table X;
discard all duplicated rows
X:
load *, f1 & '-' & f2 & '-' & f3 as fcheck inline [
number, f1, f2, f3
1, z, z, z
2, a, a, b
2, a, a, c
3, a, b, c
3, a, b, c
3, d, e, f
];
Right Keep (X)
load * where cnt1 = 1;
load number, f3, count(f3) as cnt1
Resident X
group by number, f3;
Hello @anita sain
Please give an example.
Regards,
exclude one of those.
That's a really good way to keep a check on the fields but what if I have a large number of fields say 100,
??
Hi, use the keyword "DISTINCT" and RowNo() function.
I.E.
TABLE1:
LOAD DISTINCT
FIELD_01,
FIELD_02,
FEILD_03,
...
FIELD_09
;
with this keyword all rows with the same values on all fields are reduced to one row and the table are load with no duplicate rows.
TABLE2:
LOAD
RowNo() AS NUMBER,
*
RESIDENT TABLE1;
DROP TABLE1;
with RowNo() you can enumerate the rows in a table.
You would need to find someway to combine them together into one otherwise checking the repetitions might not be easy.
agree with sunny, to automate the field concatenation you can use
let vFieldCount = NoOfFields('TblName');
FOR i=1 to $(vFieldCount)
IF i=1 then
LET vFields = FieldName(i,'TblName');
ELSE
LET vFields = '$(vFields)' &' | '& FieldName(i,'TblName');
ENDIF
NEXT;