Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a data set called "dataset" like this:
{"sport":["Football","Skiing","Running"],"EmploymentStatus":"Self-employed"}
{"sport":["Skiing"],"EmploymentStatus":"Student"}
How do I best make the dimensions sport_Fotball=True/False and the same for EmploymentStatus?
any sample data?
Those are two sample records:
{"sport":["Football","Skiing","Running"],"EmploymentStatus":"Self-employed"}
{"sport":["Skiing"],"EmploymentStatus":"Student"}
Want to make dimensions for SportFootball, SportSkiing, SportRunning and EmploymentStatus.
You could use this expression, if the field name is sport:
SubStringCount(sport, 'Football')
It will return 0 which equates False if Football is not present. If it is present, it returns 1 (or possibly more) which is interpreted as True.
Is all this stored in one field, or in two fields when you load the data?
The format seems a bit awkward to parse into two fields: sport and EmploymentStatus
I think you need to share some more about your data set than this.
Sorry, been away on business.
Yes it's all in one field.
The data has been anonymized, but this is an example of the structure of the data.
4 records of the datafield:
{"sport":["Football","Skiing","Running"],"EmploymentStatus":"Self-employed"}
{"sport":["Skiing"],"EmploymentStatus":"Student"}
{"sport":["Skiing","Running"],"EmploymentStatus":"Worker"}
{"sport":["Football","Running"],"EmploymentStatus":"Self-employed"}
I would like to structure these data into:
Sport_Football= True/False
Sport_Skiing= True/False
Sport_Running= True/False
EmploymentStatus= “Self-employed" or “Student” or “Worker”
The 1st record would be:
Sport_Football= True
Sport_Skiing= True
Sport_Running= True
EmploymentStatus= “Self-employed"
2nd record:
Sport_Football= False
Sport_Skiing= True
Sport_Running= False
EmploymentStatus= “Student"
Provided you read this as a field called data, you can use these expressions:
Sport_Football = SubStringCount(data, 'Football')
Sport_Skiing = SubStringCount(data, 'Skiing')
Sport_Running = SubStringCount(data, 'Running')
EmploymentStatus = PurgeChar(SubField(data, ',"EmploymentStatus":', 2), '"}')