Hi there,
I am new to Qlik Sense and so just getting familiar with how to perform basic functions.
I am having trouble applying the IntervalMatch function where I would typically have used XLOOKUP functions in Excel. I am trying to do lookups on Canadian postal codes (letter-number-letter number-letter-number format, e.g. A0A 1A0).
I currently have the two tables:
// This is the range table: if a postal code falls within the Postal Code From / Postal Code To interval, an additional charge for extended regions applies, with Charge being the fee.
[Extended$]:
LOAD
[Postal Code From],
[Postal Code To],
[Charge]
// These are all the Canadian postal codes
[PostalCodeDatabase]:
LOAD
TRIM(REPLACE([PostalCode],' ','')) as [PostalCode],
[City],
[Province],
[AreaCode],
[Latitude],
[Longitude],
[CityMixedCase],
[RecordType],
GeoMakePoint([Latitude], [Longitude]) AS [Longitude_Latitude]
Note that not all PostalCodes in the PostalCodeDatabase will fall within a range on the Extended table. In that case the output would be $0 or "Not Found". Where a PostalCode fits within a range I'd like to map the Postal Code From (lower limit), Postal Code To (upper limit) and Charge to the PostalCodeDatabase or a bridge table. If there wasn't a match it can be "Not Found", "Not Found", 0.
I can do this pretty easily with XLOOKUPs and IF statements in Excel, but can't quite figure out how to replicate that lookup in Qlik.
Any help anyone can provide would be super appreciated!!
Thanks,
Julie
You did a lot of typing while I was trying to resolve your issue by integrating with R; I wrote the R function but had difficulties integrating the formula with the Qlik script, so I am glad you worked it out in the script. I changed the formula a little in two areas: first, to avoid typing those long column names, and second, to fine-tune the formula.
I aliased three new columns within the script:
The original columns remain part of the solution, while PCf, PCt and PC are used in the formula, with a lot less typing; I used a preceding load to introduce these columns.
The script for the PC_RangeRate looks like this:
PC_RangeRate:
Load
"Postal Code From",
"Postal Code To",
Charge,
Num(PC_From_Hex) As PC_From_Hex_Old,
Num(PC_To_Hex) As PC_To_Hex_Old,
Num(Text( Ord(Mid(PCf,1,1))*1000000 + Ord(Mid(PCf,2,1))*1000 + Ord(Mid(PCf,3,1)) ) &'.'&
Text(100000000 + Ord(Mid(PCf,4,1))*1000000 + Ord(Mid(PCf,5,1))*1000 + Ord(Mid(PCf,6,1)))) as PC_From_Hex,
Num(Text( Ord(Mid(PCt,1,1))*1000000 + Ord(Mid(PCt,2,1))*1000 + Ord(Mid(PCt,3,1)) ) &'.'&
Text(100000000 + Ord(Mid(PCt,4,1))*1000000 + Ord(Mid(PCt,5,1))*1000 + Ord(Mid(PCt,6,1)))) as PC_To_Hex
;
Load *,
"Postal Code From" As PCf,
"Postal Code To" As PCt
Resident PC_RangeRate_Temp;
The data was loaded into a temporary table, PC_RangeRate_Temp, where the columns PCf and PCt were introduced. The formula looks a little different, but it still uses the Ord() function. The Right() function is not the correct one to use here, so I replaced it with the Mid() function, because we take the ordinal from a particular position in the post code (actually, as I write my reply I realize it should also work with the Right() function, but it is too late on my side, I already replaced it).
Num(Text( Ord(Mid(PCf,1,1))*1000000 + Ord(Mid(PCf,2,1))*1000 + Ord(Mid(PCf,3,1)) ) &'.'&
Text(100000000 + Ord(Mid(PCf,4,1))*1000000 + Ord(Mid(PCf,5,1))*1000 + Ord(Mid(PCf,6,1)))) as PC_From_Hex
You will notice a few things here:
There are two Text() functions concatenated with a dot (.) between them; that dot is a decimal point.
The first 3 characters of the post code go into the first Text() function, the remaining 3 into the second Text().
The Ord() value of the first character of the post code is multiplied by 1000000 (one million), so the letter 'A' becomes 65000000; the Ord() value of the second character is multiplied by 1000 (one thousand), so the digit '0' becomes 48000; the last character is taken as it is, so the letter 'A' is once again 65. Adding these three values, 65000000 + 48000 + 65, gives 65048065; thanks to the thousand factor the ordinals never overlap when added (the ordinals of some characters are over one hundred, which would otherwise cause overlapping additions). The same logic is applied to the last 3 characters of the post code, with the difference of adding an initial 100000000 (one hundred million); the reason is once again to prevent unexpected results if a post code contains lower-case characters, as their Ord() values are usually over 100.
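To make the arithmetic concrete, here is a minimal sketch (the Example_Transform inline table and the PC_Numeric alias are only illustrative, not part of the actual solution) that applies the same expression to the sample post code A0A1A0:
Example_Transform:
Load
    PC,
    // First 3 characters: Ord('A')*1000000 + Ord('0')*1000 + Ord('A') = 65000000 + 48000 + 65 = 65048065
    // Last 3 characters: 100000000 + Ord('1')*1000000 + Ord('A')*1000 + Ord('0') = 149065048
    // Concatenated with the decimal point, the result reads 65048065.149065048
    Num(Text( Ord(Mid(PC,1,1))*1000000 + Ord(Mid(PC,2,1))*1000 + Ord(Mid(PC,3,1)) ) &'.'&
        Text(100000000 + Ord(Mid(PC,4,1))*1000000 + Ord(Mid(PC,5,1))*1000 + Ord(Mid(PC,6,1)))) as PC_Numeric
Inline [
PC
A0A1A0
];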
With the PC_DataBase the column PC was introduced for the same reason, making it easier to code the long expression; the remaining part of the script is here:
NoConcatenate
Temp_Table:
Load
PostCode_Real,
City,
Province,
"Area Code",
PostCode_Old,
Num(Text( Ord(Mid(PC,1,1))*1000000 + Ord(Mid(PC,2,1))*1000 + Ord(Mid(PC,3,1)) ) &'.'&
Text(100000000 + Ord(Mid(PC,4,1))*1000000 + Ord(Mid(PC,5,1))*1000 + Ord(Mid(PC,6,1)))) as PostCode
;
LOAD
"PostCode" as PostCode_Real,
"PostCode" as PC,
City,
Province,
"Area Code",
Num(PC_Hex) As PostCode_Old
FROM [lib://PostalCodes/PC_DataBase.txt]
(txt, codepage is 28592, embedded labels, delimiter is ',', msq);
Left Join
IntervalMatch (PostCode) Load "PC_From_Hex", "PC_To_Hex" Resident PC_RangeRate;
PC_DataBase:
Load PostCode,
City,
Province,
"Area Code",
ApplyMap('Map_PC_RangeRate', "PC_From_Hex" & '|' & "PC_To_Hex" ) as RangeCharge,
"PC_From_Hex" as PC_From,
"PC_To_Hex" as PC_To,
ApplyMap('Map_PC_From', "PC_From_Hex" & '|' & "PC_To_Hex" ) as PC_From_Real,
ApplyMap('Map_PC_To', "PC_From_Hex" & '|' & "PC_To_Hex" ) as PC_To_Real,
PostCode_Real
Resident Temp_Table;
Drop Table Temp_Table;
Drop Table PC_RangeRate_Temp;
Your formula is lovely, I am glad you figured it out!
Best regards,
Hi Julie,
This is a very interesting question; indeed, we have to implement IntervalMatch as well as ApplyMap in our load script to answer it. We mocked up dummy Extended$ and PostalCodeDatabase tables to work out a solution! Below is my load script:
PC_RangeRate:
LOAD
"Postal Code From",
"Postal Code To",
Charge
FROM [lib://PostalCodes/PC_RangeRate.txt]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
Load
Null() As "Postal Code From",
Null() As "Postal Code To",
0 As Charge
AutoGenerate(1);
Map_PC_RangeRate:
Mapping Load
"Postal Code From" & '|' & "Postal Code To",
Charge
Resident PC_RangeRate;
NoConcatenate
Temp_Table:
LOAD
"PostCode",
City,
Province,
"Area Code"
FROM [lib://PostalCodes/PC_DataBase.txt]
(txt, codepage is 28592, embedded labels, delimiter is ',', msq);
Left Join
IntervalMatch (PostCode) Load "Postal Code From", "Postal Code To" Resident PC_RangeRate;
PC_DataBase:
Load PostCode,
City,
Province,
"Area Code",
ApplyMap('Map_PC_RangeRate', "Postal Code From" & '|' & "Postal Code To" ) as RangeCharge,
"Postal Code From" as PC_From,
"Postal Code To" as PC_To
Resident Temp_Table;
Drop Table Temp_Table;
First we load the Extended$ table, which I named PC_RangeRate in the script, concatenating an extra record with Null values for "Postal Code From" and "Postal Code To" and a zero Charge. We need this record when handling post codes that are missing from the Extended$ table.
Next, we create a mapping table, Map_PC_RangeRate in the script; we will reference it later in the script.
Then we load the PostalCodeDatabase as Temp_Table; here we do a Left Join with the IntervalMatch statement on the PostCode column against the Extended$ (PC_RangeRate) resident table. Temp_Table now features the "Postal Code From" and "Postal Code To" columns but not the Charge column; that is the reason for the Map_PC_RangeRate mapping table.
Next we create PC_DataBase (your PostalCodeDatabase) from the resident Temp_Table, implementing the ApplyMap function as coded in the script; the resulting column is named RangeCharge to avoid clashing with the existing Charge column elsewhere. We also rename "Postal Code From" and "Postal Code To" to PC_From and PC_To to avoid synthetic keys with the Extended$ table (PC_RangeRate) in the script.
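By the way, if you want the literal "Not Found" / 0 output you described for post codes that fall outside every range, a possible variation of that final load (just a sketch, not the script above) is to pass ApplyMap's optional third argument as the default value and wrap the interval limits in an If(IsNull()) test:
PC_DataBase:
Load PostCode,
     City,
     Province,
     "Area Code",
     // The third argument of ApplyMap is returned whenever the composite key is not in the mapping table
     ApplyMap('Map_PC_RangeRate', "Postal Code From" & '|' & "Postal Code To", 0) as RangeCharge,
     // Label post codes without a matching interval explicitly, as you did in your Excel workflow
     If(IsNull("Postal Code From"), 'Not Found', "Postal Code From") as PC_From,
     If(IsNull("Postal Code To"), 'Not Found', "Postal Code To") as PC_To
Resident Temp_Table;
With the default in place, ApplyMap returns 0 even for key combinations that never made it into the mapping table.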
Note: You may very well exclude "Postal Code From" and "Postal Code To" from the PostalCodeDatabase (PC_DataBase), but I believe keeping them could add value to your solution.
Finally the script drops the Temp_Table table; the screenshot below shows the results with my mocked data.
Regards,
Arnaldo Sandoval
No worries @InsightlyStudio; once you are done reviewing the solution, please reply with some feedback.
Regards,
Arnaldo Sandoval
I tried to implement your solution pretty closely since I am not quite there on understanding the Mapping tables and the reason for the left join 🙂 [Hopefully I'll get there]
My results didn't work out: it doesn't seem to find any matches, and all postal codes came back with a "Charge" of 0. Is this perhaps because my postal codes are alphanumeric rather than integers or numeric values?
Or perhaps I missed something in my load script (I've copied it below).
PC_RangeRate:
LOAD
"Postal Code From",
"Postal Code To",
Charge
FROM [lib://DataFiles/UPS extended zone exercise .xls]
(biff, embedded labels, table is Extended$);
Load
Null() as "Postal Code From",
Null() as "Postal Code To",
0 As Charge
AutoGenerate(1);
Map_PC_RangeRate:
Mapping Load
"Postal Code From" & '|' & "Postal Code To",
Charge
Resident PC_RangeRate;
NoConcatenate
Temp_Table:
Load
TRIM(REPLACE("PostalCode",' ','')) as "PostalCode",
"City",
"Province",
"AreaCode",
"Latitude",
"Longitude",
"CityMixedCase",
"RecordType",
GeoMakePoint("Latitude", "Longitude") AS "Longitude_Latitude"
FROM [lib://DataFiles/PostalCodeDatabase.xlsx]
(ooxml, embedded labels, table is PostalCodeDatabase);
Left Join
IntervalMatch (PostalCode)
Load "Postal Code From","Postal Code To"
Resident PC_RangeRate;
PostalCodeDatabase:
Load
PostalCode,
City,
Province,
AreaCode,
Latitude,
Longitude,
CityMixedCase,
RecordType,
GeoMakePoint(Latitude, Longitude) AS Longitude_Latitude,
ApplyMap('Map_PC_RangeRate',"Postal Code From" & '|' & "Postal Code To") as RangeCharge,
"Postal Code From" as PC_From,
"Postal Code To" as PC_To
Resident Temp_Table;
Drop Table Temp_Table;
Actually, I did not find anything wrong with your script; it works. The issue is with the IntervalMatch function/statement 😮; it is clearly stated on its help page, shown below!
It works with "discrete numeric values", while Canadian post codes are alpha-numeric; I tried with some of your Canadian post codes and replicated the same results you got 🙂. Nevertheless, I got it working with some mathematical trickery, which I will share with you in my next reply, while this reply focuses on some actions you should take with your Qlik vendor, and perhaps somebody within the community could take my advice to Qlik directly.
Actions:
Future:
Immediate:
I will add another reply with my workaround sometime soon, explaining the mathematical trickery and another Qlik scripting language limitation that makes the trickery even trickier; it is like doing some witchcraft, you will see once I post my reply.
Best regards,
Hope this helps,
Arnaldo Sandoval
This is the workaround to implement IntervalMatch for Canadian post codes.
The problem: IntervalMatch only works with discrete numeric values, while Canadian post codes are alpha-numeric.
The Challenge: transforming the alpha-numeric post codes into numbers that preserve their ordering, so IntervalMatch can compare them.
The Idea: building a number from the character codes at each position of the post code, as this Excel formula does:
=TEXT(CODE(MID(A2,1,1)), "000") & TEXT(CODE(MID(A2,2,1)), "000") & TEXT(CODE(MID(A2,3,1)), "000") & TEXT(CODE(MID(A2,4,1)), "000") & TEXT(CODE(MID(A2,5,1)), "000") & TEXT(CODE(MID(A2,6,1)), "000")
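For example, assuming the post code in cell A2 is A0A1A0, that formula returns 065048065049065048, since CODE("A") = 65, CODE("0") = 48 and CODE("1") = 49, each padded to three digits.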
Now, this is the new script:
NoConcatenate
PC_RangeRate:
LOAD
"Postal Code From",
"Postal Code To",
Charge,
Num(PC_From_Hex) As PC_From_Hex,
Num(PC_To_Hex) As PC_To_Hex
FROM [lib://PostalCodes/PC_RangeRate.txt]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
Load
Null() As "Postal Code From",
Null() As "Postal Code To",
0 As Charge,
Null() As PC_From_Hex,
Null() As PC_To_Hex
AutoGenerate(1);
Map_PC_RangeRate:
Mapping Load
"PC_From_Hex" & '|' & "PC_To_Hex",
Charge
Resident PC_RangeRate;
Map_PC_From:
Mapping Load
"PC_From_Hex" & '|' & "PC_To_Hex",
"Postal Code From"
Resident PC_RangeRate;
Map_PC_To:
Mapping Load
"PC_From_Hex" & '|' & "PC_To_Hex",
"Postal Code To"
Resident PC_RangeRate;
NoConcatenate
Temp_Table:
LOAD
"PostCode" as PostCode_Real,
City,
Province,
"Area Code",
Num(PC_Hex) As PostCode
FROM [lib://PostalCodes/PC_DataBase.txt]
(txt, codepage is 28592, embedded labels, delimiter is ',', msq);
Left Join
IntervalMatch (PostCode) Load "PC_From_Hex", "PC_To_Hex" Resident PC_RangeRate;
PC_DataBase:
Load PostCode,
City,
Province,
"Area Code",
ApplyMap('Map_PC_RangeRate', "PC_From_Hex" & '|' & "PC_To_Hex" ) as RangeCharge,
"PC_From_Hex" as PC_From,
"PC_To_Hex" as PC_To,
ApplyMap('Map_PC_From', "PC_From_Hex" & '|' & "PC_To_Hex" ) as PC_From_Real,
ApplyMap('Map_PC_To', "PC_From_Hex" & '|' & "PC_To_Hex" ) as PC_To_Real,
PostCode_Real
Resident Temp_Table;
Drop Table Temp_Table;
Now these are the new screenshots:
The new Extended$ table with the new transformed PCs; you should not show them to the end users, they are here to debug and illustrate this solution.
This is the new Post Codes database, also showing the transformed post codes.
Hope these crazy ideas help,
Regards,
Arnaldo Sandoval
I am so appreciative of the time you took to investigate and come up with a potential solution @ArnadoSandoval!
I'll give your suggestion a test with the larger/full data set of 800k+ postal codes and see what it comes back with.
On the suggestion side: do you find Qlik responsive to Product Insight & Ideas? Is it a good forum to raise those issues, or are the ideas not broadly accepted?
Testing my proposed idea with 800k+ postcodes will be fantastic; please share the outcome of your test once you complete it.
Actually, @John_Teichman wrote this post, Qlik Sense Ideas. My personal experience is positive: about two years ago, Qlik Sense was randomly breaking when loading large XML files; some members of this site were really helpful escalating that issue, and the November 2018 release deployed the fix. My experience with Qlik vendors is also really great.
Regards,
Arnaldo Sandoval
Hey, I had some time to test this implementation with the larger set. On the number transformation, did you do that within Excel? Is there any way to do it within Qlik (since Excel is taking quite a while to process)?