Hi folks, hope you can help me out with this. I have a dataframe with a movie title column. The values may or may not contain the year of the movie. For the ones that do, I would like to extract the year. The issue is that, although most rows have the year at the end of the title, in other rows the year could be anywhere in the value. I understand simple regex, but this is over my head. What would be a proper way to extract the years and add them to the dataframe as a new column named 'movie_year' (NaN if there is no year)? Thanks in advance.
Some sample rows (there are 65k rows in the dataframe):
Title
0 The Naked Truth (1957) (Your Past Is Showing)
1 Millions Game, The (Das Millionenspiel)
2 Body/Cialo
3 Death Note: Desu nôto (2006–2007)
4 My Own Man
5 Primal Fear ((1996))
6 Gladiator (2000)
7 Spy(ies) (Espion(s)) (2009)
8 The Naked Truth (1957) (Your Past Is Showing)
What year value do you expect the regex to extract for a row like this one?
3 Death Note: Desu nôto (2006–2007)
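Until that is settled, here is a rough sketch of one possible approach, not a definitive answer. It assumes that a year is always four digits starting with 18, 19 or 20, that it sits right after an opening parenthesis (as in all the sample rows), and that for a range like row 3 the first year (2006) is the one wanted. The names `YEAR_RE` and `extract_year` are made up for illustration.

```python
import math
import re

# Assumption: a year is four digits starting with 18, 19 or 20, placed right
# after "(" and followed by ")" or a dash (hyphen or en dash, for ranges).
YEAR_RE = re.compile(r"\(((?:18|19|20)\d{2})(?=[)\-–])")


def extract_year(title):
    """Return the first parenthesised year in `title` as an int, else NaN."""
    match = YEAR_RE.search(title)
    return int(match.group(1)) if match else math.nan


# Sample titles copied from the question.
titles = [
    "The Naked Truth (1957) (Your Past Is Showing)",
    "Millions Game, The (Das Millionenspiel)",
    "Body/Cialo",
    "Death Note: Desu nôto (2006–2007)",
    "My Own Man",
    "Primal Fear ((1996))",
    "Gladiator (2000)",
    "Spy(ies) (Espion(s)) (2009)",
]

years = [extract_year(t) for t in titles]
for title, year in zip(titles, years):
    print(f"{year!s:>6}  {title}")
```

On the dataframe itself this could be applied with something like `df["movie_year"] = df["Title"].map(extract_year)`, assuming the column is named `Title` as in the sample. If years can also appear outside parentheses, the pattern would need to be loosened, at the cost of matching other four-digit numbers in titles.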