I was working with a new data set recently, among other fields the data set contains addresses, something like this:
Abbey Road, London
Since the data set doesn’t have a dedicated city field I decided to apply Subfield() to extract the City as a separate field. Easy task, right? I use the comma as a separator to split the string.
Subfield (address, ',', 2) as City,
Shortly after we decided to expand the data universe of the app to include other countries in our analysis, we ended up having multiple sources with slightly different address formats:
Abbey Road, London
Paseo del Prado, s/n , Madrid
91, Rue de Rivoli, 75001, Paris
Not great news, right? Due to the variable number of commas in the address field I can no longer use subfield to manipulate the string, or maybe I still can use it.
Searching from the end of the string
Subfield function not only takes positive values for the third parameter, it also accepts negatives values.A negative value searches from the end of the string. This will help us to extract the City while keeping my script highly readable. The new function will be something like:
Subfield (address, ',', -1)
The function above will return the first portion of the string separated by the delimiter comma (',') searching from the end of the string, right to left. The newly created "City" field contains:
Right to left search in String functions
The use of negative values, or right to left search (remember the read will still be performed from the start of the string) also apply to other string functions:
Subfield('Abbey Road, London', 'o',-2)
Returns 'nd' (the second occurrence of 'o' starting from the end)
Index('Abbey Road, London', 'o',-2)
Returns 14 (the second occurrence of 'o' starting from the end)
FindOneOf('Abbey Road, London', 'oes',-3)
Returns ‘8’ because the search is for any of the characters: o, e or s, and "o" is the third occurrence starting from the right, and is in position 8 starting from the left.