Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

reasons for duplicate rows

Hi, why are there duplicate rows? I concatenated tables from different sources and 'itemnum' is primary key. The itemnum appears in all the tables and has data for all columns but still duplicates, for example, it has data for all the columns from one table but is null for the other columns in other tables. So there is duplicate like below:

itemnumsitenamebalanceclasscodeGroup
1009AD-B--
1009--45-FGH

How can I combine everything to one row (distinct itemnum)?

itemnumsitenamebalanceclasscodeGroup
1009AD45BFGH

Thanks.

24 Replies
Not applicable
Author

What's the script to join on primary key? cos all I did was outer join(tablename)

Not applicable
Author

ItemClassification:

LOAD *, text(ITEMNUM) as itemnum;

LOAD ITEMNUM, .

.

.

.(other fields)

.

.

.

FROM

<filename>

(biff, embedded labels, table is Sheet1$);

outer Join(ItemClassification)

LOAD *, ltrim(text(ITEMNUM)) as itemnum;

LOAD ITEMNUM, .

.

.

.

.

.

FROM

<filename>

(biff, embedded labels, table is Sheet1$);

outer Join(ItemClassification)

LOAD *, ltrim(text(ITEMNUM)) as itemnum;

LOAD ITEMNUM,.

.

.

.

.

.

FROM

<filename>

(ooxml, embedded labels, table is (filename));

manojkulkarni
Partner - Specialist II
Partner - Specialist II

Outer join (Table), shouldn't be giving u duplicate rows.

Can u paste sample data

Anonymous
Not applicable
Author

Just a guess

Try ltrim on first table as well and let me know if that causes the issue

Not applicable
Author

Is there another way to match the itemnum?

swuehl
MVP
MVP

Sure,

if you can't fix your original join, you can use a resident load with a group by:

LOAD

     itemnum,

     only(site),

     only(name),

     only(balance),

     only(class),

     only(code),

     only(Group)

RESIDENT YourTable

GROUP BY itemnum;

edit: I think your issue is that primary keys itemnum don't match properly, check the values returned on length, existing white spaces etc. Your data should return numbers for the primary keys, right?

Not applicable
Author

but how do I use this for multiple tables? the fields come from different tables.

manojkulkarni
Partner - Specialist II
Partner - Specialist II

check below script

T1:

LOAD * INLINE [

    No, site, name, class

    1, A, XYZ, D

    2, B, ABC, A

    3, A, PQR, A

];

Outer join (T1)

LOAD * INLINE [

    No, Code, Grp

    1, 1a, AA

    2, 1b, ABC, DD

    3, 2a, PQR, AB

];

Not applicable
Author

I convert to text because some itemnum have letters

Not applicable
Author

I can't, there are 10,000 over records...