Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
dhanu_today
Creator
Creator

Identifying Duplicate values and result

Hi Everyone,

I have data like below and output need show as below.

I have to show the EDIT as 1 for the ID’s which has more than 1 resolved.

e.g. A101 has 2 resolved so at 22/12/2020 (i.e. Max resolved) need to show Edit as 1 and QA as 0 and Info as 0

A103 has 2 resolved so at 29/12/2020 (i.e. Max resolved) need to show Edit as 1 and QA as 0 and Info as 0

Data:

load * Inline
[
ID, Opendate, Resolved, QA, Info
A101, 01/12/2020, 03/12/2020, 1, 2
A102, 01/12/2020, 04/12/2020, 1, 3
A101, 01/12/2020, 22/12/2020, 0, 0
A103, 23/12/2020, 25/12/2020, 1, 5
A103, 23/12/2020, 29/12/2020, 0, 0
A104, 24/12/2020, 25/12/2020, 2 4

];

Output:

ID

Opendate

Resolved

QA

Info

Edit

A101

01/12/2020

03/12/2020

1

2

0

A102

01/12/2020

04/12/2020

1

3

0

A101

01/12/2020

22/12/2020

0

0

1

A103

23/12/2020

25/12/2020

1

5

0

A103

23/12/2020

29/12/2020

0

0

1

A104

24/12/2020

25/12/2020

2

4

0

 

Thanks,

Dhanu

1 Solution

Accepted Solutions
vhayward
Partner - Contributor III
Partner - Contributor III

Hej,

I have created this script which guides you through all the steps to finally get the output as desired in your requirements. In the same time you can easily adjust the scripts when the requirements are changing.

So the emphasis is not do the solution in the least amount of lines possible, but providing a solution that can be read and understood easily by any developer from unexperienced to very experienced.

// Starting Data Set
TMP_DataSet:
LOAD * INLINE [
ID, Opendate, Resolved, QA, Info
A101, 01/12/2020, 03/12/2020, 1, 2
A102, 01/12/2020, 04/12/2020, 1, 3
A101, 01/12/2020, 22/12/2020, 1, 2
A103, 23/12/2020, 25/12/2020, 1, 5
A103, 23/12/2020, 29/12/2020, 1, 5
A104, 24/12/2020, 25/12/2020, 2 ,4
];

// Determine the max. resolved per unique ID.
MAP_MaxResolved:
MAPPING LOAD DISTINCT
[ID] AS [ID]
, MAXSTRING([Resolved]) AS [Max Resolved]
RESIDENT TMP_DataSet
GROUP BY [ID];

// Determine the amount of instances per unique ID
MAP_CountResolved:
MAPPING LOAD DISTINCT
[ID] AS [ID]
, COUNT([Resolved]) AS [Count Resolved]
RESIDENT TMP_DataSet
GROUP BY [ID];

// Add the max resolved and the count resolved to the data set
ENRICHED_DataSet:
NOCONCATENATE LOAD
[ID] AS [ID]
, [Opendate] AS [Opendate]
, [Resolved] AS [Resolved]
, [QA] AS [QA]
, [Info] AS [Info]
, APPLYMAP('MAP_MaxResolved', [ID], NULL()) AS [Max Resolved]
, APPLYMAP('MAP_CountResolved', [ID], 1) AS [Count Resolved]
RESIDENT TMP_DataSet;
DROP TABLE TMP_DataSet;

// Set the requirements to get the proper values at the QA, Info and Edit fields.
FINAL_DataSet:
NOCONCATENATE LOAD
[ID] AS [ID]
, [Opendate] AS [Opendate]
, [Resolved] AS [Resolved]
, IF([Count Resolved] > 1 AND [Resolved] = [Max Resolved], 0, [QA]) AS [QA]
, IF([Count Resolved] > 1 AND [Resolved] = [Max Resolved], 0, [Info]) AS [Info]
, IF([Count Resolved] > 1 AND [Resolved] = [Max Resolved], 1, 0) AS [Edit]
RESIDENT ENRICHED_DataSet;
DROP TABLE ENRICHED_DataSet;

 

Output:

vhayward_0-1611171387435.png

Best regards,
Vincent Hayward

View solution in original post

7 Replies
dhanu_today
Creator
Creator
Author

Updated:

Hi Everyone,

I have data like below and output need show as below.

I have to show the EDIT as 1 for the ID’s which has more than 1 resolved.

e.g. A101 has 2 resolved so at 22/12/2020 (i.e. Max resolved) need to show Edit as 1 and QA as 0 and Info as 0

A103 has 2 resolved so at 29/12/2020 (i.e. Max resolved) need to show Edit as 1 and QA as 0 and Info as 0

Data:

load * Inline
[
ID, Opendate, Resolved, QA, Info
A101, 01/12/2020, 03/12/2020, 1, 2
A102, 01/12/2020, 04/12/2020, 1, 3
A101, 01/12/2020, 22/12/2020, 1, 2
A103, 23/12/2020, 25/12/2020, 1, 5
A103, 23/12/2020, 29/12/2020, 1, 5
A104, 24/12/2020, 25/12/2020, 2 ,4

];

Output:

ID

Opendate

Resolved

QA

Info

Edit

A101

01/12/2020

03/12/2020

1

2

0

A102

01/12/2020

04/12/2020

1

3

0

A101

01/12/2020

22/12/2020

0

0

1

A103

23/12/2020

25/12/2020

1

5

0

A103

23/12/2020

29/12/2020

0

0

1

A104

24/12/2020

25/12/2020

2

4

0

 

Thanks,

Dhanu

Ksrinivasan
Specialist
Specialist

Hi, 

In edit coloumn use count if function. Like count if I'd>1, then return 1 or 0.

Then you have to recreate QA and Info coloum with if condition. That is if edit coloumn is =1, return 0 or return exist value.

Regards,

Ksrinivasan

dhanu_today
Creator
Creator
Author

Hi,

can someone provide some clue on this for solution.

 

TIA

Dhanu

Ksrinivasan
Specialist
Specialist

hi,

find the script it will fulfill your need,

Data:
load
*,
if(AutoNumber(Rowno(),ID)='2','1','0') as EDIT;
load
* Inline
[
ID, Opendate, Resolved, QA, Info,
A101, 01/12/2020, 03/12/2020, 1, 2
A102, 01/12/2020, 04/12/2020, 1, 3
A101, 01/12/2020, 22/12/2020, 1, 2
A103, 23/12/2020, 25/12/2020, 1, 5
A103, 23/12/2020, 29/12/2020, 1, 5
A104, 24/12/2020, 25/12/2020, 2 ,4

];

result:

Ksrinivasan_0-1611160353165.png

 

ksrinivasan

 

vhayward
Partner - Contributor III
Partner - Contributor III

Hej,

I have created this script which guides you through all the steps to finally get the output as desired in your requirements. In the same time you can easily adjust the scripts when the requirements are changing.

So the emphasis is not do the solution in the least amount of lines possible, but providing a solution that can be read and understood easily by any developer from unexperienced to very experienced.

// Starting Data Set
TMP_DataSet:
LOAD * INLINE [
ID, Opendate, Resolved, QA, Info
A101, 01/12/2020, 03/12/2020, 1, 2
A102, 01/12/2020, 04/12/2020, 1, 3
A101, 01/12/2020, 22/12/2020, 1, 2
A103, 23/12/2020, 25/12/2020, 1, 5
A103, 23/12/2020, 29/12/2020, 1, 5
A104, 24/12/2020, 25/12/2020, 2 ,4
];

// Determine the max. resolved per unique ID.
MAP_MaxResolved:
MAPPING LOAD DISTINCT
[ID] AS [ID]
, MAXSTRING([Resolved]) AS [Max Resolved]
RESIDENT TMP_DataSet
GROUP BY [ID];

// Determine the amount of instances per unique ID
MAP_CountResolved:
MAPPING LOAD DISTINCT
[ID] AS [ID]
, COUNT([Resolved]) AS [Count Resolved]
RESIDENT TMP_DataSet
GROUP BY [ID];

// Add the max resolved and the count resolved to the data set
ENRICHED_DataSet:
NOCONCATENATE LOAD
[ID] AS [ID]
, [Opendate] AS [Opendate]
, [Resolved] AS [Resolved]
, [QA] AS [QA]
, [Info] AS [Info]
, APPLYMAP('MAP_MaxResolved', [ID], NULL()) AS [Max Resolved]
, APPLYMAP('MAP_CountResolved', [ID], 1) AS [Count Resolved]
RESIDENT TMP_DataSet;
DROP TABLE TMP_DataSet;

// Set the requirements to get the proper values at the QA, Info and Edit fields.
FINAL_DataSet:
NOCONCATENATE LOAD
[ID] AS [ID]
, [Opendate] AS [Opendate]
, [Resolved] AS [Resolved]
, IF([Count Resolved] > 1 AND [Resolved] = [Max Resolved], 0, [QA]) AS [QA]
, IF([Count Resolved] > 1 AND [Resolved] = [Max Resolved], 0, [Info]) AS [Info]
, IF([Count Resolved] > 1 AND [Resolved] = [Max Resolved], 1, 0) AS [Edit]
RESIDENT ENRICHED_DataSet;
DROP TABLE ENRICHED_DataSet;

 

Output:

vhayward_0-1611171387435.png

Best regards,
Vincent Hayward

Saravanan_Desingh

One solution is.

tab1:
LOAD * INLINE [
ID, Opendate, Resolved, QA, Info
A101, 01/12/2020, 03/12/2020, 1, 2
A102, 01/12/2020, 04/12/2020, 1, 3
A101, 01/12/2020, 22/12/2020, 1, 2
A103, 23/12/2020, 25/12/2020, 1, 5
A103, 23/12/2020, 29/12/2020, 1, 5
A104, 24/12/2020, 25/12/2020, 2 ,4
];

Left Join(tab1)
LOAD ID, Date(Max(Date#(Resolved,'DD/MM/YYYY'))) As MaxDt,
	Count(ID) As Cnt
Resident tab1
Group By ID;

tab2:
LOAD ID, Opendate, Resolved, If(Cnt>1 And Date#(Resolved,'DD/MM/YYYY')=MaxDt,0,QA) As QA,
	If(Cnt>1 And Date#(Resolved,'DD/MM/YYYY')=MaxDt,0,Info) As Info,
	If(Cnt>1 And Date#(Resolved,'DD/MM/YYYY')=MaxDt,1,0) As Edit
Resident tab1;

Drop Table tab1;
Saravanan_Desingh

Output:

commqv82.PNG