Mapping Springfields or how to improve the lookup accuracy
The on-the-fly location lookup is one of my favorite features of Qlik GeoAnalytics and the native map in Qlik Sense. Just add a field and the map automagically places the assets using the field content. This article explains how on-the-fly lookup works and how you as a user can improve the hit rate.
The on-the-fly lookup in Qlik is powered by a vast database of location entries. It currently holds approximately 7 million features of different types: countries, regions, municipalities, populated places, airport codes, zip codes, etc.
Point density of the Qlik Location Database
In most cases each place has several aliases for the same location, either in the local language or just a different spelling. When the database is queried, it responds with a geometry. Most entries are points, but the database also contains many area geometries for well-known regions. The database also keeps a hierarchy of the entries (a place belongs to a country and a region) in order to distinguish places from each other.
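To make the idea of aliases plus hierarchy concrete, here is a minimal Python sketch of such a lookup. It is a toy model, not how the Qlik location database is implemented: the `Place` class, the `lookup` function and the `rank` score are hypothetical names invented for this illustration, and the coordinates are approximate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    name: str      # main name of the entry
    aliases: tuple # other spellings of the same place
    country: str   # hierarchy: country code
    region: str    # hierarchy: region or state
    rank: int      # hypothetical importance score, higher wins
    point: tuple   # approximate (lon, lat)


# Tiny stand-in for the real database (which holds millions of entries).
PLACES = [
    Place("Springfield", (), "US", "Illinois", 3, (-89.65, 39.80)),
    Place("Springfield", (), "US", "Missouri", 2, (-93.29, 37.21)),
    Place("Springfield", (), "US", "Massachusetts", 1, (-72.59, 42.10)),
    Place("Köln", ("Cologne", "Koeln"), "DE", "Nordrhein-Westfalen", 1, (6.96, 50.94)),
]


def lookup(name: str, country: Optional[str] = None,
           region: Optional[str] = None) -> Optional[Place]:
    """Return the best match for a name, narrowed by optional hierarchy context."""
    wanted = name.casefold()
    hits = [p for p in PLACES
            if wanted in {p.name.casefold(), *(a.casefold() for a in p.aliases)}]
    if country is not None:
        hits = [p for p in hits if p.country == country]
    if region is not None:
        hits = [p for p in hits if p.region == region]
    # Without enough context, fall back to the highest-ranked candidate.
    return max(hits, key=lambda p: p.rank, default=None)


print(lookup("Springfield").region)                      # Illinois
print(lookup("Springfield", region="Missouri").point)    # (-93.29, 37.21)
print(lookup("Cologne").name)                            # Köln
```

The name alone returns a single best guess, while each extra piece of hierarchy removes candidates until the intended place is the only one left.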
How to improve
I will use 'Springfield' as an example to show how to get better matches. Springfield is one of the most common place names in the US. I pulled a list of “Springfields” from USGS, an organization that keeps records of all populated places in the US. The list included city name, state and county information. I loaded the list into Qlik Sense as an inline table; at the bottom of the page you can find a link to my test app.
Just Springfield = 1 hit
My first attempt is just to make a point layer and add the city field. With that approach I get one hit, for a Springfield in Illinois, US.
That's not so surprising: the lookup service made a best effort given the information it had and picked the largest city with that name. Also, the city field has only one distinct value, and since the dimension controls the number of objects on the map, only one point can be drawn.
Add id dimension, country and location type
As a first improvement I switch to an id dimension that holds a unique id for each city, to get a fair chance of placing more cities. The dimension is also used for selections, so I prefer an id that is easy to read. For this app the id is a string made up of the city, county, state and country.
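The effect of switching dimension can be sketched in a few lines of Python. This is only an illustration of the idea, not the load script from the test app: `ROWS` is a three-row sample and `make_id` is a hypothetical helper.

```python
# Three sample rows; the real list has many more Springfields.
ROWS = [
    {"city": "Springfield", "county": "Sangamon", "state": "Illinois", "country": "US"},
    {"city": "Springfield", "county": "Greene", "state": "Missouri", "country": "US"},
    {"city": "Springfield", "county": "Hampden", "state": "Massachusetts", "country": "US"},
]


def make_id(row: dict) -> str:
    """Build a readable id from city, county, state and country."""
    return ", ".join(row[f] for f in ("city", "county", "state", "country"))


city_values = {row["city"] for row in ROWS}
id_values = {make_id(row) for row in ROWS}

# The dimension's distinct values cap the number of objects on the map.
print(len(city_values))  # 1
print(len(id_values))    # 3
print(sorted(id_values)[0])  # Springfield, Greene, Missouri, US
```

With the city field as dimension there is only one value to place, whereas the id gives every row its own object while staying readable in selections.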
Looking at the location tab for the point layer, I can see that I can fill in more context for the location. I start by adding my field ‘country’ and setting the location type to ‘City, place’.
That helped. Now I get more hits: one in the US and one in the Virgin Islands, because the Virgin Islands has its own country code.
Adding state information
City and country are not enough in this case to pinpoint the locations, so I continue by providing the field ‘state’ in the location tab. Apparently it is common to have several cities named Springfield in the same state.
The hit rate increases: with city and state I get 30 hits. There are 29 states in the US with one or more cities called Springfield.
Adding county information
To get even more hits I need to provide more context. Luckily for me, the list of cities also contains county information, the 2nd level of administrative boundaries in the US. I add that field in the location tab as well. After that more places were found: 66 of the 67 cities are now properly located.
In edit mode, the Sense map shows error messages from the location service. In this case only one city was not located; apparently Clayton County in Georgia has two places called Springfield.
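The pattern in the hit counts can be reproduced offline by counting how many distinct keys each level of context yields. The Python sketch below uses a five-row sample instead of the full USGS list, and `distinct_keys` is a hypothetical helper. It only models why rows collapse into one hit, not the location service itself.

```python
from collections import Counter

FIELDS = ("city", "county", "state", "country")

# Five sample rows, including the two Springfields in Clayton County, Georgia.
ROWS = [
    ("Springfield", "Sangamon", "Illinois", "US"),
    ("Springfield", "Greene", "Missouri", "US"),
    ("Springfield", "Effingham", "Georgia", "US"),
    ("Springfield", "Clayton", "Georgia", "US"),
    ("Springfield", "Clayton", "Georgia", "US"),
]


def distinct_keys(rows, fields):
    """Count rows per key when only the given fields are sent as context."""
    idx = [FIELDS.index(f) for f in fields]
    return Counter(tuple(row[i] for i in idx) for row in rows)


LEVELS = {
    "city": ("city",),
    "city + country": ("city", "country"),
    "city + state": ("city", "state", "country"),
    "city + county": FIELDS,
}

for label, fields in LEVELS.items():
    # Rows sharing a key cannot be told apart, so they give at most one hit.
    print(label, len(distinct_keys(ROWS, fields)))

# Keys that still cover more than one row stay ambiguous at full context.
ambiguous = [key for key, n in distinct_keys(ROWS, FIELDS).items() if n > 1]
print(ambiguous)  # [('Springfield', 'Clayton', 'Georgia', 'US')]
```

For this sample the counts grow from 1 to 1, 3 and 4 as context is added, and the only remaining duplicate is the Clayton County pair, which no amount of hierarchy can separate.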
Advanced usage, using a location string
As an alternative, the user can put everything (location name and type information) in the location string instead of using the drop-down menus:
This produces the same result but gives more control and might be more convenient for the advanced user. The ':P*' is the shorthand code for 'City, Place'. Read more about location types in the "Location Service Description" document in the QGA documentation.
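As a rough illustration of how such a string could be assembled, here is a small Python sketch. Only the ':P*' suffix comes from this article; the comma separator, the order of the parent fields and the `location_string` helper are assumptions made for the example, so check the "Location Service Description" document for the exact syntax.

```python
def location_string(name: str, *parents: str, type_code: str = "P*") -> str:
    """Join a place name, its parent areas and a type code into one string.

    Assumed layout: name,parent,...,country:type (not verified against QGA docs).
    """
    parts = [name] + [p for p in parents if p]  # skip empty context fields
    return ",".join(parts) + ":" + type_code


print(location_string("Springfield", "Sangamon", "Illinois", "US"))
# Springfield,Sangamon,Illinois,US:P*
print(location_string("Springfield", "", "Illinois", "US"))
# Springfield,Illinois,US:P*
```

Building the string in one place makes it easy to drop missing fields or to swap the type code for another location type.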
Testing city names at load time
Qlik GeoAnalytics provides an operation, "NamedPointLookup", that can check how good the matching will be. This is especially convenient for larger address databases. In this example city names were processed: we got 1 hit on city level, 2 hits on country level, 30 hits on state level and 66 on county level. Check the load script for details on how to perform the NamedPointLookup.
Knowing which name to use
Most entries in the Qlik location database have several aliases: same place, different spelling. The best way to find out the main name is to use the Qlik GeoAnalytics connector and the "Load" operation. Here's an example to find out all the correct county names in the US:
Basically, the lookup becomes better with more context, such as location type, location spelling and location hierarchy. City names are often ambiguous; to resolve a common city name like "Springfield", information about the type, country, state and county is needed.