Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

How do I use this text string to make a true/false dimension

I have a data set called "dataset" like this:

{"sport":["Football","Skiing","Running"],"EmploymentStatus":"Self-employed"}

{"sport":["Skiing"],"EmploymentStatus":"Student"}

How do I best make the dimensions sport_Fotball=True/False and the same for EmploymentStatus?

6 Replies
Anonymous
Not applicable
Author

any sample data?

Not applicable
Author

Those are two sample records:

{"sport":["Football","Skiing","Running"],"EmploymentStatus":"Self-employed"}

{"sport":["Skiing"],"EmploymentStatus":"Student"}

Want to make dimensions for SportFootball, SportSkiing, SportRunning and EmploymentStatus.

Ralf-Narfeldt
Employee
Employee

You could use this expression, if the field name is sport:

SubStringCount(sport, 'Football')

It will return 0 which equates False if Football is not present. If it is present, it returns 1 (or possibly more) which is interpreted as True.

Ralf-Narfeldt
Employee
Employee

Is all this stored in one field, or in two fields when you load the data?

The format seems a bit awkward to parse into two fields: sport and EmploymentStatus

I think you need to share some more about your data set than this.

Not applicable
Author

Sorry, been away on business.

Yes it's all in one field.

The data has been anonymized, but this is an example of the structure of the data.


4 records of the datafield:

{"sport":["Football","Skiing","Running"],"EmploymentStatus":"Self-employed"}

{"sport":["Skiing"],"EmploymentStatus":"Student"}

{"sport":["Skiing","Running"],"EmploymentStatus":"Worker"}

{"sport":["Football","Running"],"EmploymentStatus":"Self-employed"}


I would like to structure these data into:

Sport_Football= True/False

Sport_Skiing= True/False

Sport_Running= True/False

EmploymentStatus= “Self-employed" or “Student” or “Worker”

The 1st record would be:

Sport_Football= True

Sport_Skiing= True

Sport_Running= True

EmploymentStatus= “Self-employed"

2nd record:

Sport_Football= False

Sport_Skiing= True

Sport_Running= False

EmploymentStatus= “Student"

Ralf-Narfeldt
Employee
Employee

Provided you read this as a field called data, you can use these expressions:


Sport_Football = SubStringCount(data, 'Football')

Sport_Skiing = SubStringCount(data, 'Skiing')

Sport_Running = SubStringCount(data, 'Running')

EmploymentStatus = PurgeChar(SubField(data, ',"EmploymentStatus":', 2), '"}')