Skip to main content
Announcements
Jan 15, Trends 2025! Get expert guidance to thrive post-AI with After AI: REGISTER NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
MauriceBruns2019
Contributor II
Contributor II

Country Code search in text string

Hello everyone,

I am trying to figure out a way to identify a country code within a string. The string that is loaded is a variable length string and within that string it 'could' contain a country code.

The goal is to check if a specific country code exists in a text; then i would compare this with a specific country code field.

For example:

Project NameText CountryImpacted CountryOutcome
Project A: Simon says jump - NLNLNLTRUE
Project B: Simon says sing - UKUKNLFALSE


Some of the problems that i have:

  • The Project Name text is not consistent in terms of structure
  • The text might include a dash, then the code:  'Simon says jump - NL'; and sometimes not 'Simon says jump NL'
  • Often the text does not contain a country code, and
  • Perhaps the search might pick up the 'ro' from the text 'Project', and present RO [Romania] as my Text Country result.  

Any ideas anyone?

Maurice

Labels (1)
3 Replies
dplr-rn
Partner - Master III
Partner - Master III

does the country code always come at the end? also is the length always 2?

if so why not get the last 2 characters from your string and compare using that

MauriceBruns2019
Contributor II
Contributor II
Author

Unfortunately this would be an extra problem. The country code is not always at the end.
For example:
'Project C: Simon says dance in NL (if you want)'

cwolf
Creator III
Creator III

The easiest you can do is a JScript macro, where you can use a regular expression (VB doesn't known regular expressions).

In the macro editor it looks like this:

var regExp;

function InitRegExp(codes){
	regExp=new RegExp("(\\s|-)("+codes+")(\\s|-|\$)");
	return 0;
}

function GetCountryCode(text){
	var res=regExp.exec(text);
	return res ? res[2] : "";
}

And the script for example:

Countries:
LOAD * INLINE [
Code
NL
RO
US
EN
];

t:
LOAD
InitRegExp(Concat(DISTINCT Code,'|')) as InitRegExp
Resident Countries;

drop table t;

Projects:
LOAD *,
GetCountryCode(Project) as Country 
INLINE [
Project
Project A: Simon says jump - NL
Project A: Simon says jump -RO
Project A: Simon says jump -Ro NL
Project A: Simon says jump US
US Project A: Simon says jump -EN- 
Project A: Simon NL says jump
Project A: SimonNL says jump
Project A: Simon NLsays jump
Project A: Simon -US- says jump
Project A: Simon -EN says jump   
];

In the example the regular expression is build from "(\\s|-)(NL|US|EN|RO)(\\s|-|\$)".

This means it consists of three groups:
1. "(\\s|-)" the start of search string, can be a whitespace or "-" (it cannot be at the beginning of the text that is being searched for matches)
2. followed by "(NL|US|EN|RO)" one of the country code
3. followed by "(\\s|-|\$)" the end of searchs tring, can be a whitespace, a "-" or it can be at the end of the text.

Because of the three groups, the result of exec() is an array with 4 elements. The first is the whole match string, the others are the results for each group. So the function return only the result for the second group.

The regulare expression is case sensitive! It only searches for country codes in capital letters.