myicc
Contributor

Dataframe, Extract and Regex...

Hi folks, hope you can help me out with this. I have a dataframe with a movie title column. The values may or may not contain the year of the movie. For the ones that have a year, I would like to extract it. The issue is that, although most rows have the year at the end of the title, in other rows the year could be anywhere in the value. I understand simple regex, but this is over my head. What would be a proper way to extract the years and add them to the dataframe as a new column named 'movie_year' (NaN if there is no year)? Thanks in advance.

Some sample rows (there are 65k rows in the dataframe)

Title
0 The Naked Truth (1957) (Your Past Is Showing)

1 Millions Game, The (Das Millionenspiel)

2 Body/Cialo

3 Death Note: Desu nôto (2006–2007)

4 My Own Man

5 Primal Fear ((1996))

6 Gladiator (2000)

7 Spy(ies) (Espion(s)) (2009)

8 The Naked Truth (1957) (Your Past Is Showing)

1 Reply
Shicong_Hong
Employee

What year value do you expect the regex to extract for rows like this one?

3 Death Note: Desu nôto (2006–2007)
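
Until that is settled, here is one possible sketch in plain Python. It rests on two assumptions that are not confirmed in this thread: a year is always four digits (1800 to 2099) directly after an opening parenthesis, as in every sample row that has one, and for a range such as (2006–2007) the first year is the one wanted. The names YEAR_RE and extract_year are made up for illustration.

```python
import re
from typing import Optional

# Assumption: a year is four digits (1800-2099) written directly after an
# opening parenthesis. This skips digits in the title itself and also
# handles doubled parentheses such as "((1996))".
YEAR_RE = re.compile(r"\((1[89]\d{2}|20\d{2})(?!\d)")


def extract_year(title: str) -> Optional[int]:
    """Return the first parenthesised year in the title, or None if absent."""
    match = YEAR_RE.search(title)
    return int(match.group(1)) if match else None


# The sample rows from the question.
titles = [
    "The Naked Truth (1957) (Your Past Is Showing)",
    "Millions Game, The (Das Millionenspiel)",
    "Body/Cialo",
    "Death Note: Desu nôto (2006–2007)",
    "My Own Man",
    "Primal Fear ((1996))",
    "Gladiator (2000)",
    "Spy(ies) (Espion(s)) (2009)",
]

# One entry per title; None marks a title without a year.
years = [extract_year(t) for t in titles]
```

If the titles are in a pandas column named Title, the same pattern should work with the vectorised string method, along the lines of `df["movie_year"] = df["Title"].str.extract(YEAR_RE.pattern, expand=False).astype(float)`, which leaves NaN where nothing matches. If the last year of a range is preferred instead, the pattern would need to change.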