Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
jiwaniakbiar
Contributor II
Contributor II

Identify duplicates and mark them during load script

This is the table I am working with (Correct Record Flag) is not included in the table. I want to add this during load script.

I want to implement logic to mark Correct Record Flag when there are more than one rows with same passcode

1. If there is only one row with the passcode, mark this flag as "Y" (example: zz2832s, tu28cqc)

2. If there are more than one rows then following are the scenarios

- If there are two rows -One with Status 'Partial' and other 'Complete' then mark row with 'Complete' as "Y" and "Partial" as "N" (example: 55fez86)

- If there are two rows both with Same status i.e. both Partial or both Complete, then mark one with lastest based on Date Submitted  (Example: 9928cos)

- if there are 3 or more rows with combination of status, then pick the latest "Complete" and markt it as "Y" (Example: 1zr5mjc)

Response IDTime StartedDate SubmittedStatusPasscodeCorrect Record Flag
120/02/2019 11:0420/02/2019 21:34Partial1zr5mjcN
220/02/2019 11:5120/02/2019 22:22Complete1zr5mjcY
320/02/2019 21:0021/02/2019 7:33Partial1zr5mjcN
421/02/2019 0:5621/02/2019 11:27Partial55fez86N
521/02/2019 0:5521/02/2019 11:34Complete55fez86N
621/02/2019 3:0421/02/2019 13:35Complete55fez86Y
720/02/2019 11:5320/02/2019 22:24Partial9928cosN
820/02/2019 11:5320/02/2019 23:24Partial9928cosY
920/02/2019 11:5321/02/2019 21:24Partialtu28cqcY
1020/02/2019 11:5322/02/2019 23:34Completezz2832sY

 

Labels (2)
1 Reply
felipedl
Partner - Specialist III
Partner - Specialist III

Hi,

 

It's been a while since i've posted something here, so i'm a little rusty.

Attached the sample app, spreadsheet used and code that is used to generate your outcome

 

tmp:
load
Passcode,
Status,
[Date Submitted]
Resident [Planilha1];



tmp_count_pass:
Load
Passcode,
count(Passcode) as Count
Resident tmp
group by
Passcode;

tmp_count_status:
Load
Passcode,
Status,
count(Status) as Count_Status
Resident tmp
group by
Passcode,
Status;

tmp_x:
load
Distinct
Passcode
Resident
tmp;

left join (tmp_x)
load
Passcode,
Count_Status as 'Partial'
Resident
tmp_count_status
where
Status = 'Partial';

left join (tmp_x)
load
Passcode,
Count_Status as 'Complete'
Resident
tmp_count_status
where
Status = 'Complete';

left join (tmp_x)
load
Passcode,
Count as [Row Count]
Resident tmp_count_pass;

left join (tmp_x)
Load
Passcode,
max([Date Submitted]) as [Max Status]
Resident
[Planilha1]
group by
Passcode;

left join (tmp_x)
Load
Passcode,
max([Date Submitted]) as [Max Complete]
Resident
[Planilha1]
where
Status = 'Complete'
group by
Passcode;



tmp_x_2:
Load
Passcode,
If
(
[Row Count] = 1,
alt([Max Status],[Max Complete]),
if
(
Test = 'Mark Latest Status',
[Max Status],
if
(
Test = 'Mark Latest Complete',
[Max Complete]
)
)
) as [Date For Join];
load
*,
If
(
[Row Count] = 1,
'Mark Row',
if
(
Partial = 2 and isnull(Complete)
or
Complete = 2 and isnull(Partial),
'Mark Latest Status',
'Mark Latest Complete'
)
) as [Test]

Resident
tmp_x;

left join ([Planilha1])
load
Passcode,
[Date For Join] as [Date Submitted],
'Y' as [Correct Record Flag]
Resident tmp_x_2;

drop table tmp_x_2;

drop tables tmp_x, tmp_count_pass,tmp_count_status,tmp;

NoConcatenate
Final_Data:
Load
[Response ID],
[Time Started],
[Date Submitted],
Status,
Passcode,
if (IsNull([Correct Record Flag]),'N',[Correct Record Flag]) as [Correct Record Flag]
Resident [Planilha1];

drop table [Planilha1];

 

Basically what I did were some checks to count how many lines are per passcode (field [Row Count]) and after that the count per passcode and status (to get the correct 'Partial' and 'Complete' status).

Last, i classify if it should be getting the last date per status or (when the lines are doubles and same status) or if should get the latest complete date.

Hope it helps,

 

Felipe.

 

Result looks like yours:

 

sample.png