Hi there,
I am new to Qlik Sense and so just getting familiar with how to perform basic functions.
I am having trouble applying the IntervalMatch function where I would typically have used XLOOKUP functions in Excel. I am trying to do lookups on Canadian postal codes (letter-number-letter number-letter-number format, e.g. A0A 1A0).
I currently have the two tables:
// This is the range table: if a postal code falls within the Postal Code From / Postal Code To interval, an additional charge for extended regions applies, with Charge being the fee.
[Extended$]:
LOAD
[Postal Code From],
[Postal Code To],
[Charge]
// These are all the Canadian postal codes
[PostalCodeDatabase]:
LOAD
TRIM(REPLACE([PostalCode],' ','')) as [PostalCode],
[City],
[Province],
[AreaCode],
[Latitude],
[Longitude],
[CityMixedCase],
[RecordType],
GeoMakePoint([Latitude], [Longitude]) AS [Longitude_Latitude]
Note that not all PostalCodes in the PostalCodeDatabase will fall within a range on the Extended table. In that case the output would be $0 or "Not Found". Where a PostalCode fits within a range I'd like to map the Postal Code From (lower limit), Postal Code To (upper limit) and Charge to the PostalCodeDatabase or a bridge table. If there wasn't a match it can be "Not Found", "Not Found", 0.
I can do this pretty easily with XLOOKUPs and IF statements in Excel, but can't quite figure out how to replicate that lookup in Qlik.
Any help anyone can provide would be super appreciated!!
Thanks,
Julie
You did a lot of typing while I was trying to resolve your issue by integrating with R; I wrote the R function but had difficulties integrating the formula with the Qlik script, so I am glad you worked it out in the script. I changed the formula a little in two areas: first, to avoid typing those long column names, and second, to fine-tune the formula.
I aliased three new columns within the script:
The original columns remain part of the solution, while PCf, PCt and PC are used in the formula, with a lot less typing; I used a preceding load to introduce these columns.
The script for the PC_RangeRate looks like this:
PC_RangeRate:
Load
"Postal Code From",
"Postal Code To",
Charge,
Num(PC_From_Hex) As PC_From_Hex_Old,
Num(PC_To_Hex) As PC_To_Hex_Old,
Num(Text( Ord(Mid(PCf,1,1))*1000000 + Ord(Mid(PCf,2,1))*1000 + Ord(Mid(PCf,3,1)) ) &'.'&
Text(100000000 + Ord(Mid(PCf,4,1))*1000000 + Ord(Mid(PCf,5,1))*1000 + Ord(Mid(PCf,6,1)))) as PC_From_Hex,
Num(Text( Ord(Mid(PCt,1,1))*1000000 + Ord(Mid(PCt,2,1))*1000 + Ord(Mid(PCt,3,1)) ) &'.'&
Text(100000000 + Ord(Mid(PCt,4,1))*1000000 + Ord(Mid(PCt,5,1))*1000 + Ord(Mid(PCt,6,1)))) as PC_To_Hex
;
Load *,
"Postal Code From" As PCf,
"Postal Code To" As PCt
Resident PC_RangeRate_Temp;
The data was loaded into a temporary table, PC_RangeRate_Temp, where the columns PCf and PCt were introduced. The formula looks a little different, but it still uses the Ord() function. The Right() function is not the correct one to use here, so I replaced it with the Mid() function, because we take the ordinal from a particular position in the post code (actually, as I write my reply I realize it should also work with the Right() function, but it is too late on my side, I already replaced it).
Num(Text( Ord(Mid(PCf,1,1))*1000000 + Ord(Mid(PCf,2,1))*1000 + Ord(Mid(PCf,3,1)) ) &'.'&
Text(100000000 + Ord(Mid(PCf,4,1))*1000000 + Ord(Mid(PCf,5,1))*1000 + Ord(Mid(PCf,6,1)))) as PC_From_Hex
You will notice a few things here:
There are two Text() functions concatenated with a dot (.) between them; that dot is a decimal point.
The first 3 characters of the post code go into the first Text() function, the remaining 3 into the second Text().
The Ord() value of the first character of the post code is multiplied by 1000000 (one million), so the letter 'A' becomes 65000000; the Ord() value of the second character is multiplied by 1000 (one thousand), so the digit '0' becomes 48000; the last character is taken as it is, so the letter 'A' is once again 65. Adding these three values, 65000000 + 48000 + 65, gives 65048065; thanks to the thousand factor the ordinals never overlap when added (the ordinals of some characters are over one hundred, which would otherwise cause overlapping additions). The same logic is applied to the last 3 characters of the post code, with the difference of adding an initial 100000000 (one hundred million); the reason is once again to prevent unexpected results if a post code contains lower-case characters, as their Ord() values are usually over 100.
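To make the arithmetic concrete, here is a minimal sketch (the Example_Transform inline table and the PC_Numeric alias are only illustrative, not part of the actual solution) that applies the same expression to the sample post code A0A1A0:
Example_Transform:
Load
    PC,
    // First 3 characters: Ord('A')*1000000 + Ord('0')*1000 + Ord('A') = 65000000 + 48000 + 65 = 65048065
    // Last 3 characters: 100000000 + Ord('1')*1000000 + Ord('A')*1000 + Ord('0') = 149065048
    // Concatenated with the decimal point, the result reads 65048065.149065048
    Num(Text( Ord(Mid(PC,1,1))*1000000 + Ord(Mid(PC,2,1))*1000 + Ord(Mid(PC,3,1)) ) &'.'&
        Text(100000000 + Ord(Mid(PC,4,1))*1000000 + Ord(Mid(PC,5,1))*1000 + Ord(Mid(PC,6,1)))) as PC_Numeric
Inline [
PC
A0A1A0
];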
With the PC_DataBase the column PC was introduced for the same reason, making it easier to code the long expression; the remaining part of the script is here:
NoConcatenate
Temp_Table:
Load
PostCode_Real,
City,
Province,
"Area Code",
PostCode_Old,
Num(Text( Ord(Mid(PC,1,1))*1000000 + Ord(Mid(PC,2,1))*1000 + Ord(Mid(PC,3,1)) ) &'.'&
Text(100000000 + Ord(Mid(PC,4,1))*1000000 + Ord(Mid(PC,5,1))*1000 + Ord(Mid(PC,6,1)))) as PostCode
;
LOAD
"PostCode" as PostCode_Real,
"PostCode" as PC,
City,
Province,
"Area Code",
Num(PC_Hex) As PostCode_Old
FROM [lib://PostalCodes/PC_DataBase.txt]
(txt, codepage is 28592, embedded labels, delimiter is ',', msq);
Left Join
IntervalMatch (PostCode) Load "PC_From_Hex", "PC_To_Hex" Resident PC_RangeRate;
PC_DataBase:
Load PostCode,
City,
Province,
"Area Code",
ApplyMap('Map_PC_RangeRate', "PC_From_Hex" & '|' & "PC_To_Hex" ) as RangeCharge,
"PC_From_Hex" as PC_From,
"PC_To_Hex" as PC_To,
ApplyMap('Map_PC_From', "PC_From_Hex" & '|' & "PC_To_Hex" ) as PC_From_Real,
ApplyMap('Map_PC_To', "PC_From_Hex" & '|' & "PC_To_Hex" ) as PC_To_Real,
PostCode_Real
Resident Temp_Table;
Drop Table Temp_Table;
Drop Table PC_RangeRate_Temp;
Your formula is lovely, I am glad you figured it out!
Best regards,
Hi Julie,
This is a very interesting question; indeed, we have to implement IntervalMatch as well as ApplyMap in our load script to answer it. We mocked up dummy Extended$ and PostalCodeDatabase tables to work out a solution! Below is my load script:
PC_RangeRate:
LOAD
"Postal Code From",
"Postal Code To",
Charge
FROM [lib://PostalCodes/PC_RangeRate.txt]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
Load
Null() As "Postal Code From",
Null() As "Postal Code To",
0 As Charge
AutoGenerate(1);
Map_PC_RangeRate:
Mapping Load
"Postal Code From" & '|' & "Postal Code To",
Charge
Resident PC_RangeRate;
NoConcatenate
Temp_Table:
LOAD
"PostCode",
City,
Province,
"Area Code"
FROM [lib://PostalCodes/PC_DataBase.txt]
(txt, codepage is 28592, embedded labels, delimiter is ',', msq);
Left Join
IntervalMatch (PostCode) Load "Postal Code From", "Postal Code To" Resident PC_RangeRate;
PC_DataBase:
Load PostCode,
City,
Province,
"Area Code",
ApplyMap('Map_PC_RangeRate', "Postal Code From" & '|' & "Postal Code To" ) as RangeCharge,
"Postal Code From" as PC_From,
"Postal Code To" as PC_To
Resident Temp_Table;
Drop Table Temp_Table;
First we load the Extended$ table, which I named PC_RangeRate in the script, concatenating an extra record with Null values for "Postal Code From" and "Postal Code To" and a zero Charge. We need this record when handling post codes that are missing from the Extended$ table.
Next, we create a mapping table, Map_PC_RangeRate in the script; we will reference it later in the script.
Then we load the PostalCodeDatabase as Temp_Table; here we do a Left Join with the IntervalMatch statement on the PostCode column against the Extended$ (PC_RangeRate) resident table. Temp_Table now features the "Postal Code From" and "Postal Code To" columns but not the Charge column; that is the reason for the Map_PC_RangeRate mapping table.
Next we create PC_DataBase (your PostalCodeDatabase) from the resident Temp_Table, implementing the ApplyMap function as coded in the script; the resulting column is named RangeCharge to avoid clashing with the existing Charge column elsewhere. We also rename "Postal Code From" and "Postal Code To" to PC_From and PC_To to avoid synthetic keys with the Extended$ table (PC_RangeRate) in the script.
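By the way, if you want the literal "Not Found" / 0 output you described for post codes that fall outside every range, a possible variation of that final load (just a sketch, not the script above) is to pass ApplyMap's optional third argument as the default value and wrap the interval limits in an If(IsNull()) test:
PC_DataBase:
Load PostCode,
     City,
     Province,
     "Area Code",
     // The third argument of ApplyMap is returned whenever the composite key is not in the mapping table
     ApplyMap('Map_PC_RangeRate', "Postal Code From" & '|' & "Postal Code To", 0) as RangeCharge,
     // Label post codes without a matching interval explicitly, as you did in your Excel workflow
     If(IsNull("Postal Code From"), 'Not Found', "Postal Code From") as PC_From,
     If(IsNull("Postal Code To"), 'Not Found', "Postal Code To") as PC_To
Resident Temp_Table;
With the default in place, ApplyMap returns 0 even for key combinations that never made it into the mapping table.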
Note: You may very well exclude "Postal Code From" and "Postal Code To" from the PostalCodeDatabase (PC_DataBase), but I believe keeping them could add value to your solution.
Finally the script drops the Temp_Table table; the screenshot below shows the results with my mocked data.
Regards,
Arnaldo Sandoval
No worries @InsightlyStudio; once you are done reviewing the solution, please reply with some feedback.
Regards,
Arnaldo Sandoval
I tried to implement your solution pretty closely since I am not quite there on understanding the Mapping tables and the reason for the left join 🙂 [Hopefully I'll get there]
My results didn't work out: it doesn't seem to find any matches, and all postal codes came back with a "Charge" of 0. Is this perhaps because my postal codes are alphanumeric rather than integers or numeric values?
Or perhaps I missed something in my load script (I've copied it below).
PC_RangeRate:
LOAD
"Postal Code From",
"Postal Code To",
Charge
FROM [lib://DataFiles/UPS extended zone exercise .xls]
(biff, embedded labels, table is Extended$);
Load
Null() as "Postal Code From",
Null() as "Postal Code To",
0 As Charge
AutoGenerate(1);
Map_PC_RangeRate:
Mapping Load
"Postal Code From" & '|' & "Postal Code To",
Charge
Resident PC_RangeRate;
NoConcatenate
Temp_Table:
Load
TRIM(REPLACE("PostalCode",' ','')) as "PostalCode",
"City",
"Province",
"AreaCode",
"Latitude",
"Longitude",
"CityMixedCase",
"RecordType",
GeoMakePoint("Latitude", "Longitude") AS "Longitude_Latitude"
FROM [lib://DataFiles/PostalCodeDatabase.xlsx]
(ooxml, embedded labels, table is PostalCodeDatabase);
Left Join
IntervalMatch (PostalCode)
Load "Postal Code From","Postal Code To"
Resident PC_RangeRate;
PostalCodeDatabase:
Load
PostalCode,
City,
Province,
AreaCode,
Latitude,
Longitude,
CityMixedCase,
RecordType,
GeoMakePoint(Latitude, Longitude) AS Longitude_Latitude,
ApplyMap('Map_PC_RangeRate',"Postal Code From" & '|' & "Postal Code To") as RangeCharge,
"Postal Code From" as PC_From,
"Postal Code To" as PC_To
Resident Temp_Table;
Drop Table Temp_Table;
Actually, I did not find anything wrong with your script; it works. The issue is with the IntervalMatch function/statement 😮; it is clearly stated on its help page, shown below!
It works with "discrete numeric values", while Canadian post codes are alpha-numeric; I tried with some of your Canadian post codes and replicated the same results you got 🙂. Nevertheless, I got it working with some mathematical trickery, which I will share with you in my next reply, while this reply focuses on some actions you should take with your Qlik vendor, and perhaps somebody within the community could take my advice to Qlik directly.
Actions:
Future:
Immediate:
I will add another reply with my workaround sometime soon, explaining the mathematical trickery and another Qlik scripting language limitation that makes the trickery even trickier; it is like doing some witchcraft, you will see once I post my reply.
Best regards,
Hope this helps,
Arnaldo Sandoval
This is the workaround to implement IntervalMatch for Canadian post codes.
The problem: IntervalMatch only works with discrete numeric values, while Canadian post codes are alpha-numeric.
The Challenge: transforming the alpha-numeric post codes into numbers that preserve their ordering, so IntervalMatch can compare them.
The Idea: building a number from the character codes at each position of the post code, as this Excel formula does:
=TEXT(CODE(MID(A2,1,1)), "000") & TEXT(CODE(MID(A2,2,1)), "000") & TEXT(CODE(MID(A2,3,1)), "000") & TEXT(CODE(MID(A2,4,1)), "000") & TEXT(CODE(MID(A2,5,1)), "000") & TEXT(CODE(MID(A2,6,1)), "000")
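For example, assuming the post code in cell A2 is A0A1A0, that formula returns 065048065049065048, since CODE("A") = 65, CODE("0") = 48 and CODE("1") = 49, each padded to three digits.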
Now, this is the new script:
NoConcatenate
PC_RangeRate:
LOAD
"Postal Code From",
"Postal Code To",
Charge,
Num(PC_From_Hex) As PC_From_Hex,
Num(PC_To_Hex) As PC_To_Hex
FROM [lib://PostalCodes/PC_RangeRate.txt]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
Load
Null() As "Postal Code From",
Null() As "Postal Code To",
0 As Charge,
Null() As PC_From_Hex,
Null() As PC_To_Hex
AutoGenerate(1);
Map_PC_RangeRate:
Mapping Load
"PC_From_Hex" & '|' & "PC_To_Hex",
Charge
Resident PC_RangeRate;
Map_PC_From:
Mapping Load
"PC_From_Hex" & '|' & "PC_To_Hex",
"Postal Code From"
Resident PC_RangeRate;
Map_PC_To:
Mapping Load
"PC_From_Hex" & '|' & "PC_To_Hex",
"Postal Code To"
Resident PC_RangeRate;
NoConcatenate
Temp_Table:
LOAD
"PostCode" as PostCode_Real,
City,
Province,
"Area Code",
Num(PC_Hex) As PostCode
FROM [lib://PostalCodes/PC_DataBase.txt]
(txt, codepage is 28592, embedded labels, delimiter is ',', msq);
Left Join
IntervalMatch (PostCode) Load "PC_From_Hex", "PC_To_Hex" Resident PC_RangeRate;
PC_DataBase:
Load PostCode,
City,
Province,
"Area Code",
ApplyMap('Map_PC_RangeRate', "PC_From_Hex" & '|' & "PC_To_Hex" ) as RangeCharge,
"PC_From_Hex" as PC_From,
"PC_To_Hex" as PC_To,
ApplyMap('Map_PC_From', "PC_From_Hex" & '|' & "PC_To_Hex" ) as PC_From_Real,
ApplyMap('Map_PC_To', "PC_From_Hex" & '|' & "PC_To_Hex" ) as PC_To_Real,
PostCode_Real
Resident Temp_Table;
Drop Table Temp_Table;
Now these are the new screenshots:
The new Extended$ table with the new transformed PCs; you should not show them to the end users, they are here to debug and illustrate this solution.
This is the new Post Codes database, also showing the transformed post codes.
Hope these crazy ideas help,
Regards,
Arnaldo Sandoval
I am so appreciative of the time you took to investigate and come up with a potential solution @ArnadoSandoval!
I'll give your suggestion a test with the larger/full data set of 800k+ postal codes and see what it comes back with.
On the suggestion side: do you find Qlik responsive to Product Insight & Ideas? Is it a good forum to raise those issues, or are the ideas not broadly accepted?
Testing my proposed idea with 800k+ postcodes will be fantastic; please share the outcome of your test once you complete it.
Actually, @John_Teichman wrote this post, Qlik Sense Ideas. My personal experience is positive: about two years ago, Qlik Sense was randomly breaking when loading large XML files; some members of this site were really helpful escalating that issue, and the November 2018 release deployed the fix. My experience with Qlik vendors is also really great.
Regards,
Arnaldo Sandoval
Hey, I had some time to test this implementation with the larger set. On the number transformation, did you do that within Excel? Is there any way to do it within Qlik (since Excel is taking quite a while to process)?