Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Dear all,
I'm fairly new to Qlikview so apologies if this seems a simple question:
Below is some Data (Just 2 rows for an example) , the last column being my current result from the code below
PolicyNumber | ReportDate | PolicyVersionNumber | DuplicateKey | Result Currently for RemoveDuplicate |
AB123 | 01/03/2013 | 3 | AB123 01/03/2013 | 1 |
AB123 | 01/03/2013 | 4 | AB123 01/03/2013 |
This code identifies the minimum PolicyVersionNumber by the DuplicateKey field and then creates a table containing
a RemoveDuplicate flag with value of 1. In reality There are thousands of rows of data and the small data set only contains about
60 rows of data.
However I have been trying to modify the code so the small table contains a "0" to indicate it is not a duplicate. Meaning there would 100% match between the subset table and the main policy table when matching on the PolicyKey
Note: Policykey is just the PolicyNumber and the PolicyVersion Number.
Code so far:
DUPLICATES:
LOAD
DuplicateKey,
PolicyNumber & ' ' & MIN(PolicyVersionNumber) as PolicyKey,
1 AS RemoveDuplicate
RESIDENT POLICIES
GROUP BY DuplicateKey, PolicyNumber;
/* Count the number of duplicates by the DuplicateKey */
INNER JOIN (DUPLICATES)
LOAD *
WHERE DuplicateCount>1;
LOAD
DuplicateKey,
count(DuplicateKey) as DuplicateCount
RESIDENT POLICIES
WHERE MATCH(PolicyVersionStatusId,3,5,6)
GROUP BY DuplicateKey;
//*** Drop Temporary fields
DROP FIELDS DuplicateKey, DuplicateCount;
Required Result:
PolicyNumber | ReportDate | PolicyVersionNumber | DuplicateKey | Result Required for RemoveDuplicate |
AB123 | 01/03/2013 | 3 | AB123 01/03/2013 | 1 |
AB123 | 01/03/2013 | 4 | AB123 01/03/2013 | 0 |
What about something like the following (both things that I've added/changed are commented with ***NS):
DUPLICATES:
LOAD
DuplicateKey,
PolicyNumber & ' ' & MIN(PolicyVersionNumber) as PolicyKey,
1 AS RemoveDuplicateTemp //***NS - renamed this to RemoveDuplicateTemp
RESIDENT POLICIES
GROUP BY DuplicateKey, PolicyNumber;
/* Count the number of duplicates by the DuplicateKey */
INNER JOIN (DUPLICATES)
LOAD *
WHERE DuplicateCount>1;
LOAD
DuplicateKey,
count(DuplicateKey) as DuplicateCount
RESIDENT POLICIES
WHERE MATCH(PolicyVersionStatusId,3,5,6)
GROUP BY DuplicateKey;
//*** Drop Temporary fields
DROP FIELDS DuplicateKey, DuplicateCount;
///***NS - Add this to replace anything that isn't a one with a zero, then get rid of original field
LEFT JOIN (DUPLICATES)
LOAD *,
if(RemoveDuplicateTemp=1, 1, 0) as RemoveDuplicate
RESIDENT DUPLICATES;
DROP FIELD RemoveDuplicateTemp;
This doesn't work as the duplicates table you are creating the resident from only contains the 60 rows;
I have manged to do it by creating another step , but it is not as slick as I think it could be!
I create a mapping table then -map this on , and where there is no match assign a 0. It works but having to create a another table and drop my original policies table seems like over processing/ bigger refresh times.
DTEMP:
LOAD
DuplicateKey,
PolicyNumber & ' ' & MIN(PolicyVersionNumber) as PolicyKey,
1 AS RemoveDuplicate
RESIDENT POLICIES
GROUP BY DuplicateKey, PolicyNumber;
/* Count the number of duplicates by the DuplicateKey */
INNER JOIN (DTEMP)
LOAD *
WHERE DuplicateCount>1;
LOAD
DuplicateKey,
count(DuplicateKey) as DuplicateCount
RESIDENT POLICIES
WHERE MATCH(PolicyVersionStatusId,3,5,6)
GROUP BY DuplicateKey;
DUPLICATES:
MAPPING LOAD
PolicyKey,
RemoveDuplicate
RESIDENT DTEMP;
//*** Drop Temporary fields
DROP FIELDS DuplicateKey, DuplicateCount;
DROP TABLE DTEMP;
FACTS:
LOAD
*,
APPLYMAP('DUPLICATES',PolicyKey,0) AS RemoveDuplicate
RESIDENT POLICIES;
DROP TABLE POLICIES;