Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a dataset as shown:
The crew column has a json array
My code below allows me to parse only the first element of the json array, but not the rest. Any adjustments I could make to get all of the json array into a table
MainTable: // Give a table label
LOAD
movie_id,
title,
crew
FROM [lib://DataFiles/tmdb_5000_credits_smol.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
Attributes:
Generic
LOAD
RecId,
TextBetween(Attribute, '"', '"'),
TextBetween(Value, '"', '"')
;
LOAD
RecId,
SubField(pair, ':', 1) as Attribute,
SubField(pair, ':', 2) as Value
;
LOAD
RecId,
SubField(raw, ',') as pair
;
LOAD
RecNo() as RecId, // Id used to stitch rows back together
TextBetween(crew, '{', '}') as raw
Resident MainTable;
Movie_id | Title | Crew |
19995 | Avatar | [{""credit_id"": ""52fe48009251416c750aca23"", ""department"": ""Editing"", ""gender"": 0, ""id"": 1721, ""job"": ""Editor"", ""name"": ""Stephen E. Rivkin""}, {""credit_id"": ""539c47ecc3a36810e3001f87"", ""department"": ""Art"", ""gender"": 2, ""id"": 496, ""job"": ""Production Design"", ""name"": ""Rick Carter""}, {""credit_id"": ""54491c89c3a3680fb4001cf7"", ""department"": ""Sound"", ""gender"": 0, ""id"": 900, ""job"": ""Sound Designer"", ""name"": ""Christopher Boyes""}, {""credit_id"": ""54491cb70e0a267480001bd0"", ""department"": ""Sound"", ""gender"": 0, ""id"": 900, ""job"": ""Supervising Sound Editor"", ""name"": ""Christopher Boyes""}] |
This is my version:
[RawTable]:
Load Movie_id,
Title,
SubField(Crew, ', {') as Crew;
LOAD * INLINE
[
Movie_id;Title;Crew
19995;Avatar;[{""credit_id"": ""52fe48009251416c750aca23"", ""department"": ""Editing"", ""gender"": 0, ""id"": 1721, ""job"": ""Editor"", ""name"": ""Stephen E. Rivkin""}, {""credit_id"": ""539c47ecc3a36810e3001f87"", ""department"": ""Art"", ""gender"": 2, ""id"": 496, ""job"": ""Production Design"", ""name"": ""Rick Carter""}, {""credit_id"": ""54491c89c3a3680fb4001cf7"", ""department"": ""Sound"", ""gender"": 0, ""id"": 900, ""job"": ""Sound Designer"", ""name"": ""Christopher Boyes""}, {""credit_id"": ""54491cb70e0a267480001bd0"", ""department"": ""Sound"", ""gender"": 0, ""id"": 900, ""job"": ""Supervising Sound Editor"", ""name"": ""Christopher Boyes""}]]
](delimiter is ';');
Attributes:
Generic
LOAD
Movie_id,
//RecId,
TextBetween(Attribute, '""', '""'),
If(IsNum(Value), Value, TextBetween(Value, '""', '""')) as Value
;
LOAD
//RecId,
Movie_id,
SubField(pair, ':', 1) as Attribute,
SubField(pair, ':', 2) as Value
;
LOAD
// RecId,
Movie_id,
SubField(Crew, ',') as pair
Resident RawTable;
Thanks Lisa,
This works when I load the data inline as you have done, but when I use it on my dataset it doesn't work. I think this is because it only reads the first json array of data, but not the rest. My dataset has 10 different rows as shown above. My code is as shown:
RawTable:
LOAD
Movie_id,
Title,
SubField(Crew, ', {') as Crew
FROM [lib://DataFiles/tmdb_5000_credits_smol.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
Attributes:
Generic
LOAD
Movie_id,
//RecId,
TextBetween(Attribute, '""', '""'),
If(IsNum(Value), Value, TextBetween(Value, '""', '""')) as Value
;
LOAD
//RecId,
Movie_id,
SubField(pair, ':', 1) as Attribute,
SubField(pair, ':', 2) as Value
;
LOAD
// RecId,
Movie_id,
SubField(crew, ',') as pair
Resident RawTable;