Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Associating tables on variable length field WITH wildcards

Hooboy.. when you import stuff into Canada or the US, you use a "Harmonised Service Code" (HSCODE). A full HS Code is 8 digits, but they are often subgrouped by chapter and heading into a format such as "16.02.3456", where "16" is the chapter and "02" is the heading.

I have an 11+ million record file (CBR) where the HSCODE is ALWAYS 8 full digits. Then I have a spreadsheet where the HSCode could be  in forms such as "1.02" or "16.94" or "220470" or even "89076504" (call these "SSCodes" for spreadsheet codes). I need to read in the spreadsheet, and then match the SSCodes with the CBR.HSCODE where, e.g. SSCode "1.02" would match any CBR.HSCode "0102????".

I dealt with the periods in the original format in my LOAD statement:

Replace([HS Match],'.','') as HSM,
Len(Replace([HS Match],'.','')) as HSL,

but this ends up giving me a mix of HSM's that are 4,6,8 digits long. I split them into three separate tables, using a WHERE clause based on HSM length. How can I create an association between the tables? The only thing I can think of is to create three tables where I take all the 4-digits HSM's and JOIN to CBR on HST4.HSM=LEFT(CBR.HSCODE,4),, another where I join on HST6.HSM=LEFT(CBR.HSCODE,6), etc. However, this seems clumsy.

The final wrinkle is I'm supposed to display all the HSM's, regardless of whether there's a match in the CBR file or not.

Is there another, smarter, more elegant way to associate the tables?

thanks,

Kevin

1 Solution

Accepted Solutions
Not applicable
Author

Well, the three table approach worked, because my user was happy to have the final tables segregated by length of code, so I created associative fields by taking the full length HS code from CBR ("Records" in the diagram), and taking LEFT 4, 6, 8 respectively to get:

SSCode.PNG

Then, I created three pivot tables. Since another requirement was to show all the values from the HSTx tables, regardless of whether there was a match in RECORDS, I used the fields "SSDesc" from the HSTx tables as my first dimension and selected "Show All Values" for it, then select the "HSTx.HSMatch" fields as the second dimension, and selected "Suppress If Null". for it.

View solution in original post

2 Replies
Clever_Anjos
Employee
Employee

One possible approach is load codes like 1.02 and generate FROM and TO fields 1020000 to 10299999 and use intervalmatch to match those records

Further reading : IntervalMatch

Not applicable
Author

Well, the three table approach worked, because my user was happy to have the final tables segregated by length of code, so I created associative fields by taking the full length HS code from CBR ("Records" in the diagram), and taking LEFT 4, 6, 8 respectively to get:

SSCode.PNG

Then, I created three pivot tables. Since another requirement was to show all the values from the HSTx tables, regardless of whether there was a match in RECORDS, I used the fields "SSDesc" from the HSTx tables as my first dimension and selected "Show All Values" for it, then select the "HSTx.HSMatch" fields as the second dimension, and selected "Suppress If Null". for it.