lfalmoguera
Creator

Store duplicated rows into another table

Hi all, 

I have a quick question: can I store the duplicated rows in a separate table, so I can deal with them afterwards?

Imagine I have this TABLE_EXAMPLE (my real table is much bigger, with over 200 fields):

field1  field2  field3  field4
1       1       1       1
1       1       1       1
2       2       2       2
3       3       3       3

 

Then I script

LOAD DISTINCT * Resident TABLE_EXAMPLE;

and I will get:

field1  field2  field3  field4
1       1       1       1
2       2       2       2
3       3       3       3

 

Is there a quick way I can get the "erased" row into another table, to get:

TABLE_DUPLICATED_ROW:

field1  field2  field3  field4
1       1       1       1

 

Something like this, maybe?

LOAD * "duplicated" Resident TABLE_EXAMPLE;

I want to avoid building a key out of the 200 fields.

Kind regards.


8 Replies
DavidM
Partner - Creator II

I think you need to add a count field, using Count(duplicatedfield) as Count with a Group By on some ID.

Then you can do a resident load of the table with the condition Where Count > 1, as in the sketch below.
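
A minimal sketch of this idea on the example table (the Counts label and RowCount field are my names; a Count over any one field per group gives the number of occurrences):

TABLE_EXAMPLE:
LOAD * INLINE [
    field1, field2, field3, field4
    1, 1, 1, 1
    1, 1, 1, 1
    2, 2, 2, 2
    3, 3, 3, 3
];

// Count how often each identical row occurs
Counts:
LOAD field1, field2, field3, field4,
     Count(field1) as RowCount
Resident TABLE_EXAMPLE
Group By field1, field2, field3, field4;

// Keep only the rows that occur more than once; NoConcatenate stops
// Qlik from auto-appending this to TABLE_EXAMPLE (identical field set)
TABLE_DUPLICATED_ROW:
NoConcatenate
LOAD DISTINCT field1, field2, field3, field4
Resident Counts
Where RowCount > 1;

DROP Table Counts;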

sunny_talwar

Without a key it might not be possible, but you can dynamically create the key field like this:

TABLE_EXAMPLE:
LOAD * INLINE [
    field1, field2, field3, field4
    1, 1, 1, 1
    1, 1, 1, 1
    2, 2, 2, 2
    3, 3, 3, 3
];

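// Transpose the table with CrossTable so every field name lands in the Field column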
TempTable:
CrossTable (Field, Value)
LOAD 1 as RowNum,
	 *
Resident TABLE_EXAMPLE;

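// Build two reusable lists of the field names:
// ConcatField1 = [field1]&[field2]&...  (to concatenate whole-row values)
// ConcatField2 = [field1],[field2],...  (for LOAD and Group By clauses)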
TempTable2:
LOAD Concat(DISTINCT '[' & Field & ']', '&') as ConcatField1,
	 Concat(DISTINCT '[' & Field & ']', ',') as ConcatField2
Resident TempTable;

LET vConcatField1 = Peek('ConcatField1');
TRACE $(vConcatField1);
LET vConcatField2 = Peek('ConcatField2');
TRACE $(vConcatField2);

DROP Tables TempTable, TempTable2;

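// Attach the number of occurrences of each identical row to the data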
Left Join (TABLE_EXAMPLE)
LOAD $(vConcatField2),
	 Count($(vConcatField)) as Count
Resident TABLE_EXAMPLE
Group By $(vConcatField2);

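// All distinct rows (the deduplicated table)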
Table:
LOAD DISTINCT $(vConcatField2)
Resident TABLE_EXAMPLE;

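// Rows that occur more than once, flagged as duplicates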
Duplicate:
LOAD DISTINCT $(vConcatField2),
	 'Duplicate' as DuplicateFlag
Resident TABLE_EXAMPLE
Where Count > 1;

DROP Table TABLE_EXAMPLE;
lfalmoguera
Creator
Author

Yeah, that's the approach.

But I wanted to know if there was an "out of the box" solution, like a "reverse distinct" 🙂

Seems there isn't.

Thanks!

lfalmoguera
Creator
Author

Thanks a lot.

Just a slight amendment, in case someone copies and pastes your code.

There's a 1 missing in line 3 (vConcatField should be vConcatField1):

Left Join (TABLE_EXAMPLE)
LOAD $(vConcatField2),
	 Count($(vConcatField1)) as Count
Resident TABLE_EXAMPLE
Group By $(vConcatField2);

 

The code works fantastically and it's automated, but with 200+ fields per row and 10M+ rows, it makes the server go nuts :)

Anyhow, so much appreciated!

Anyone with a less process-consuming idea? :'(

sunny_talwar

Yup, my bad for the typo.

I guess the number of fields might not be causing the issue, but the Group By is... we might be able to optimize this using Where Exists(). Let me play around with it.
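
A minimal sketch of that direction, with the example's four fields written out by hand (SortedExample and RowKey are my names; it assumes TABLE_EXAMPLE is loaded as in the original post, and the generic any-field-list version follows in the next reply):

SortedExample:
LOAD *,
     Hash128(field1, field2, field3, field4) as RowKey
Resident TABLE_EXAMPLE
Order By field1, field2, field3, field4;

// With the rows sorted, Previous() sees the previous input record, so
// every 2nd and later occurrence of an identical row is flagged
// without any Group By
TABLE_DUPLICATED_ROW:
NoConcatenate
LOAD field1, field2, field3, field4
Resident SortedExample
Where RowKey = Previous(RowKey);

DROP Table SortedExample;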

sunny_talwar (Accepted Solution)

Give this a shot

TABLE_EXAMPLE:
LOAD * INLINE [
    field1, field2, field3, field4
    1, 1, 1, 1
    1, 1, 1, 1
    2, 2, 2, 2
    3, 3, 3, 3
];

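// CrossTable a single record (RecNo() = 1) just to harvest the field names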
TempTable:
CrossTable (Field, Value)
LOAD 1 as RowNum,
	 *
Resident TABLE_EXAMPLE
Where RecNo() = 1;

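// Build two reusable lists of the field names:
// ConcatField1 = [field1]&[field2]&...  (to hash whole rows)
// ConcatField2 = [field1],[field2],...  (for LOAD and Order By clauses)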
TempTable2:
LOAD Concat(DISTINCT '[' & Field & ']', '&') as ConcatField1,
	 Concat(DISTINCT '[' & Field & ']', ',') as ConcatField2
Resident TempTable;

LET vConcatField1 = Peek('ConcatField1');
TRACE $(vConcatField1);
LET vConcatField2 = Peek('ConcatField2');
TRACE $(vConcatField2);

DROP Tables TempTable, TempTable2;

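// Preceding load: the bottom LOAD hashes every full row (Key) and sorts;
// the top LOAD numbers each repeat of an identical row (KeyTemp) and
// hashes row + occurrence number into Key2. No Group By needed.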
TABLE_EXAMPLE2:
LOAD *,
	 If(Key = Previous(Key), RangeSum(Peek('KeyTemp'), 1), 1) as KeyTemp,
	 Hash128($(vConcatField1)&AutoNumber(If(Key = Previous(Key), RangeSum(Peek('KeyTemp'), 1), 1), $(vConcatField1))) as Key2;
LOAD *,
	 Hash128($(vConcatField1)) as Key	 
Resident TABLE_EXAMPLE
Order By $(vConcatField2);

DROP Table TABLE_EXAMPLE;
RENAME Table TABLE_EXAMPLE2 to TABLE_EXAMPLE;

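// Key3 rebuilds what Key2 looks like for a first occurrence (counter = 1)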
Table:
LOAD DISTINCT $(vConcatField2),
	 Hash128($(vConcatField1)&1) as Key3
Resident TABLE_EXAMPLE;

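// Any row whose Key2 is not a first-occurrence key is a 2nd or later copy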
Duplicate:
LOAD DISTINCT $(vConcatField2),
	 'Duplicate' as DuplicateFlag
Resident TABLE_EXAMPLE
Where not Exists(Key3, Key2);

DROP Table TABLE_EXAMPLE;
DROP Field Key3;
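// Note: no Group By anywhere - the sort plus Previous()/Peek() does the
// occurrence counting, which should be far lighter on the server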
Brett_Bleess
Former Employee

Luis, did Sunny's last post and example get you a good working solution?  If so, please be sure to come back and use the Accept as Solution button on that post to let others know that it worked and to give Sunny credit for the help.   If you have further questions, leave an update.

Regards,
Brett

lfalmoguera
Creator
Author

Sorry Brett, my bad.

Solution accepted!