Address Standardization, possibly using tExtractRegexFields

Hi, I have an enterprise version of Talend for Data Services.
I am trying to standardize address data for a large data set, but the Google components don't work for this: (1) they are extremely slow, and (2) I run out of available queries, because the data set has over 40,000 records.
The addresses can arrive in one of two ways.
Way 1, with 5 separate columns:
Address Line, City, State, Zip, Country
Example: 100 Main St | New York | New York | 90909 | US
Way 2, 1 column:
Address
Example: 100 Main St, New York, New York, 90909, US
I need to have the data separated like this:
Address Number, Street Name, City, State, Zip Code, Country

I am having trouble getting the regex right, as I am new to Java and to the way Talend does things. Is there a better way to do this? Or can anyone offer input on how to set up the regex?
The job process is currently:
FTPget --> tFileInputDelimited --> tMap (modifying columns) --> tSplitRow (used to pivot certain items) --> tHashOutput
Somewhere within there I need to separate the address fields. Any help with this would be great! Thank you.
1 Reply

Hi mw629,
Have you tried tExtractDelimitedFields (documented in the Talend Help Center)? It generates multiple columns from a single delimited column, which should cover the one-column address format.
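
Splitting on the delimiter still leaves the address number and street name together in the first field, so a small regex is needed for that part. Below is a minimal sketch in plain Java, not a definitive implementation. It assumes the single-column format always has exactly five comma-separated parts, as in your example, and that the house number (when present) leads the address line. The class name AddressSplitter and its methods are hypothetical; they are not Talend components, but logic like this could be placed in a user routine and called from a tMap expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, written for illustration only.
class AddressSplitter {

    // Group 1: leading house number (digits plus an optional letter, e.g. 12B).
    // Group 2: the rest of the line, taken as the street name.
    private static final Pattern STREET =
            Pattern.compile("^\\s*(\\d+[A-Za-z]?)\\s+(.+?)\\s*$");

    /** Splits an address line such as "100 Main St" into {number, streetName}. */
    static String[] splitStreet(String addressLine) {
        Matcher m = STREET.matcher(addressLine);
        if (m.matches()) {
            return new String[] { m.group(1), m.group(2) };
        }
        // No leading number found: leave the number empty and keep the text.
        return new String[] { "", addressLine.trim() };
    }

    /**
     * Splits the one-column format "line, city, state, zip, country" into
     * {number, streetName, city, state, zip, country}.
     * Assumption: exactly five parts, with no commas inside the address line.
     */
    static String[] parse(String address) {
        String[] parts = address.trim().split("\\s*,\\s*", -1);
        if (parts.length != 5) {
            throw new IllegalArgumentException(
                    "Expected 5 comma-separated parts, got " + parts.length);
        }
        String[] street = splitStreet(parts[0]);
        return new String[] {
            street[0], street[1], parts[1], parts[2], parts[3], parts[4]
        };
    }

    public static void main(String[] args) {
        String[] fields = parse("100 Main St, New York, New York, 90909, US");
        // Prints: 100 | Main St | New York | New York | 90909 | US
        System.out.println(String.join(" | ", fields));
    }
}
```

For the five-column format, only splitStreet would be needed on the Address Line column. Address lines that themselves contain commas (for example an apartment number after the street) break the five-part assumption, so it is probably worth routing rows that fail the check to a reject flow rather than guessing.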

Best regards
Sabrina