Skip to main content
Announcements
Qlik Introduces a New Era of Visualization! READ ALL ABOUT IT
cancel
Showing results for 
Search instead for 
Did you mean: 
GeorgePhilips23
Partner - Contributor III
Partner - Contributor III

Breaking columns, error - field "cast" not found in Qlik Sense

I have fields Show name, Cast, Director, Country released, Genre.

Where  most of the rows are similar to.

Show cast director

country

Sankofa Haile Gerima Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra Duah, Nick Medley, Mutabaruka, Afemo Omilami, Reggie Carter, Mzuri United States, Ghana, Burkina Faso, United Kingdom, Germany, Ethiopia
The Great British Baking Show Andy Devonshire Mel Giedroyc, Sue Perkins, Mary Berry, Paul Hollywood United Kingdom

 When I use, [SubField(cast,',') as Actor, SubField(director,',') as Director, SubField(country,',') as Country]

Some memory error comes up,

Please someone help me out with the situation!

Labels (2)
2 Solutions

Accepted Solutions
rubenmarin

If it says "field cast not found" is because the resident/from is not loading to the correct data source.

OriginalData:
LOAD
show_id, "type", title, listed_in, date_added,release_year,rating,duration,description,
'Netflix' as Network
FROM [lib://DataFiles/netflix_titles.csv]
(txt, utf8, embedded labels, delimiter is ',', msq);

Concatenate(OriginalData)
... Other platforms

Shows:
NoConcatenate LOAD
show_id, "type", title, listed_in, date_added,release_year,rating,duration,description,Network
Resident OriginalData;

Cast:
NoConcatenate LOAD
show_id, Subfield(cast,',') as cast
Rrsident OriginalData;

Director:
NoConcatenate LOAD
show_id, Subfield(director,',') as director
Rrsident OriginalData;

Country:
NoConcatenate LOAD
show_id, Subfield(country,',') as country
Rrsident OriginalData;

DROP Table OriginalData;

 

View solution in original post

rubenmarin

Both seem problems caused by the source of the data, it would be better to correct them there so the file has a correct structure.

For point 1 you can check for any character that only has one of the fields, and use this character to reading, something like:

 

If(Index(duration,':')>0, duration, rating) as duration,
If(Index(duration,':')>0, rating, duration) as rating

 

I don't know about the 2nd.

View solution in original post

9 Replies
rubenmarin

Hi, doing subfield that way you are creating a cartesian table with every value of a row related to every other, and actors, directors and countries would make a lot of records.

Maybe you can try with a star schema:

- Central table loads just show, and the fields related 1-1 to the show, if there is any, if not you can omit this table.

- Another table for cast with "LOAD Show, SubField(cast,',') as Actor From/Resident..." 

- Another table for directors with "LOAD Show, SubField(director,',') as Director From/Resident..." 

- Another table for the country with "LOAD Show, SubField(country,',') as Country From/Resident..." 

So Show is the link between all tables and you're reducing a lot the number of final rows.

In a table with 10 actors, 2 directors, and 8 countries, a cartesian table will have 10*2*8=160 rows of data. Split in tables it would be 10+2+8= 20 rows.

GeorgePhilips23
Partner - Contributor III
Partner - Contributor III
Author

Thanks for your guidance! But its kinda not working, when I dont take cast in central table instead only in resident, it says field cast not found!

Actually I have 4 different platforms/tables with all similar field names and similar problems as I mentioned before.

Netflix:
LOAD
show_id, "type", title, cast, director, listed_in, country,date_added,release_year,rating,duration,description,
'Netflix' as Network
FROM [lib://DataFiles/netflix_titles.csv]
(txt, utf8, embedded labels, delimiter is ',', msq);

Similarly for Amazon,Disney and HULU.

I hope I will get a useful solution.

 

rubenmarin

If it says "field cast not found" is because the resident/from is not loading to the correct data source.

OriginalData:
LOAD
show_id, "type", title, listed_in, date_added,release_year,rating,duration,description,
'Netflix' as Network
FROM [lib://DataFiles/netflix_titles.csv]
(txt, utf8, embedded labels, delimiter is ',', msq);

Concatenate(OriginalData)
... Other platforms

Shows:
NoConcatenate LOAD
show_id, "type", title, listed_in, date_added,release_year,rating,duration,description,Network
Resident OriginalData;

Cast:
NoConcatenate LOAD
show_id, Subfield(cast,',') as cast
Rrsident OriginalData;

Director:
NoConcatenate LOAD
show_id, Subfield(director,',') as director
Rrsident OriginalData;

Country:
NoConcatenate LOAD
show_id, Subfield(country,',') as country
Rrsident OriginalData;

DROP Table OriginalData;

 

GeorgePhilips23
Partner - Contributor III
Partner - Contributor III
Author

Really great one Brother!

I also have 1. some rows of duration and rating interchanged

                       2.some rows of title contains title as well as rest of the field values separated by ','

, any good way to resolve that?

rubenmarin

Both seem problems caused by the source of the data, it would be better to correct them there so the file has a correct structure.

For point 1 you can check for any character that only has one of the fields, and use this character to reading, something like:

 

If(Index(duration,':')>0, duration, rating) as duration,
If(Index(duration,':')>0, rating, duration) as rating

 

I don't know about the 2nd.

GeorgePhilips23
Partner - Contributor III
Partner - Contributor III
Author

Okay with the 2nd one, I shall correct it in the source itself as I have only 2 fields with such a problem.

But with the 1st one, I got no specific character in those fields.

Duration consists of 'specific' min and 'specific' Season/s

whereas

Rating is filled with variety like R, TV-MA, TV-G, NR,13+,etc.

Thank you!

rubenmarin

Can you post some samples of the duration values?

Or if you have Ratings defined you can use an inline table and use applymap to check if it's any of the rating values:

 

mapRatings:
Mapping LOAD *, 1 as israting Inline[
R
TV-MA
TV-G
..
];

LOAD ...
  If(applymap('mapRatings',rating,0)=1, rating, duration) as rating,
  If(applymap('mapRatings',rating,0)=1, duration, rating) as duration

 

 

GeorgePhilips23
Partner - Contributor III
Partner - Contributor III
Author

Thank you for the help, everything worked well !

GeorgePhilips23
Partner - Contributor III
Partner - Contributor III
Author

How I solved Durations and Ratings column misplaced values issue;

Durations:
NoConcatenate Load show_id,
If(Index(duration,'min')>0, duration, WildMatch(rating,'*min*')) as RunTime,
If(Index(duration,'Seasons')>0, duration, '1 Season') as Seasons

Resident Data;

Ratings:
NoConcatenate Load show_id,
If(WildMatch(rating,'*TV*','*ALL*','R','N*','AGE*','13+','16','16+'
,'17','18+','7+','G'),

rating,


WildMatch(duration,'*TV*','*ALL*','R','N*','AGE','13+','16','16+'
,'17','18+','7+','G')) as rating
Resident Data;