Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
maaaaa
Contributor II
Contributor II

Parsing Json array from csv

I have a dataset as shown:

The crew column has a json array

My code below allows me to parse only the first element of the json array, but not the rest. Any adjustments I could make to get all of the json array into a table

MainTable: // Give a table label
LOAD
movie_id,
title,
crew
FROM [lib://DataFiles/tmdb_5000_credits_smol.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);

Attributes:
Generic
LOAD
RecId,
TextBetween(Attribute, '"', '"'),
TextBetween(Value, '"', '"')
;
LOAD
RecId,
SubField(pair, ':', 1) as Attribute,
SubField(pair, ':', 2) as Value
;
LOAD
RecId,
SubField(raw, ',') as pair
;
LOAD
RecNo() as RecId, // Id used to stitch rows back together
TextBetween(crew, '{', '}') as raw
Resident MainTable;

 

Movie_idTitleCrew
19995Avatar[{""credit_id"": ""52fe48009251416c750aca23"", ""department"": ""Editing"", ""gender"": 0, ""id"": 1721, ""job"": ""Editor"", ""name"": ""Stephen E. Rivkin""}, {""credit_id"": ""539c47ecc3a36810e3001f87"", ""department"": ""Art"", ""gender"": 2, ""id"": 496, ""job"": ""Production Design"", ""name"": ""Rick Carter""}, {""credit_id"": ""54491c89c3a3680fb4001cf7"", ""department"": ""Sound"", ""gender"": 0, ""id"": 900, ""job"": ""Sound Designer"", ""name"": ""Christopher Boyes""}, {""credit_id"": ""54491cb70e0a267480001bd0"", ""department"": ""Sound"", ""gender"": 0, ""id"": 900, ""job"": ""Supervising Sound Editor"", ""name"": ""Christopher Boyes""}]
Labels (1)
2 Replies
Lisa_P
Employee
Employee

This is my version:

[RawTable]:

Load Movie_id,
Title,
SubField(Crew, ', {') as Crew;
LOAD * INLINE
[
Movie_id;Title;Crew
19995;Avatar;[{""credit_id"": ""52fe48009251416c750aca23"", ""department"": ""Editing"", ""gender"": 0, ""id"": 1721, ""job"": ""Editor"", ""name"": ""Stephen E. Rivkin""}, {""credit_id"": ""539c47ecc3a36810e3001f87"", ""department"": ""Art"", ""gender"": 2, ""id"": 496, ""job"": ""Production Design"", ""name"": ""Rick Carter""}, {""credit_id"": ""54491c89c3a3680fb4001cf7"", ""department"": ""Sound"", ""gender"": 0, ""id"": 900, ""job"": ""Sound Designer"", ""name"": ""Christopher Boyes""}, {""credit_id"": ""54491cb70e0a267480001bd0"", ""department"": ""Sound"", ""gender"": 0, ""id"": 900, ""job"": ""Supervising Sound Editor"", ""name"": ""Christopher Boyes""}]]
](delimiter is ';');

Attributes:
Generic
LOAD
Movie_id,
//RecId,
TextBetween(Attribute, '""', '""'),
If(IsNum(Value), Value, TextBetween(Value, '""', '""')) as Value
;
LOAD
//RecId,
Movie_id,
SubField(pair, ':', 1) as Attribute,
SubField(pair, ':', 2) as Value
;
LOAD
// RecId,
Movie_id,
SubField(Crew, ',') as pair
Resident RawTable;

maaaaa
Contributor II
Contributor II
Author

Thanks Lisa,

 

This works when I  load the data inline as you have done, but when I use it on my dataset it doesn't work. I think this is because it only reads the first json array of data, but not the rest. My dataset has 10 different rows as shown above. My code is as shown:

RawTable:
LOAD
Movie_id,
Title,
SubField(Crew, ', {') as Crew
FROM [lib://DataFiles/tmdb_5000_credits_smol.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);

Attributes:
Generic
LOAD
Movie_id,
//RecId,
TextBetween(Attribute, '""', '""'),
If(IsNum(Value), Value, TextBetween(Value, '""', '""')) as Value
;
LOAD
//RecId,
Movie_id,
SubField(pair, ':', 1) as Attribute,
SubField(pair, ':', 2) as Value
;
LOAD
// RecId,
Movie_id,
SubField(crew, ',') as pair
Resident RawTable;