Cleandatetime - Kodi Community Forum (https://forum.kodi.tv), thread /showthread.php?tid=212383
Cleandatetime - Kinematics - 2014-12-26

The default regex for cleandatetime is really bad. To clarify some details for anyone searching for info:

CUtils::CleanStrings first pulls the <cleandatetime> regex that you can specify in advancedsettings.xml. Only one regex string is allowed in that field. The first group matched is taken to be the title. The second group matched is taken to be the year (and is passed to the scraper in buffer $$2). Any additional groups matched are discarded.

If the regex doesn't match at all, nothing is inserted into the year group and the entire file name string is passed on to the <cleanstrings> portion of name handling. If a match is found, everything other than the year and the first group (which is generally everything before the start of the year info) is discarded.

I'll list a number of possible year labels on films and explain what happens with the default regex and with mine (shown below). The films generally aren't real; I'm just listing different patterns. 'no match' means it will use the entirety of the provided file name and not provide any year. Otherwise, I show the captured title, then a slash, then the year that was determined.
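For anyone who wants to poke at the group handling described above outside of Kodi, here is a minimal Python sketch. It assumes the stock pattern quoted in this post and uses Python's re module, which may not behave identically to Kodi's own regex engine in every corner case. The helper name split_title_year is made up for illustration and is not part of Kodi.

```python
import re

# Kodi's stock <cleandatetime> pattern, copied from this post.
DEFAULT_CLEANDATETIME = (
    r"(.+[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+"
    r"(19[0-9][0-9]|20[0-1][0-9])"
    r"([ _\,\.\(\)\[\]\-][^0-9]|$)"
)


def split_title_year(name, pattern=DEFAULT_CLEANDATETIME):
    """Hypothetical helper mimicking the behaviour described in the post.

    Group 1 is treated as the title and group 2 as the year; any further
    groups are ignored. If the pattern does not match, the whole name is
    returned unchanged with no year.
    """
    match = re.search(pattern, name)
    if match is None:
        return name, None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for name in ("My Movie 2004", "My Movie (2004)",
                 "Knives: 2000 Ways to Kill Someone"):
        print(name, "->", split_title_year(name))
```

Passing a different pattern as the second argument lets you compare a custom regex against the stock one on the same file names.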
My Movie - default: no match - mine: no match
My Movie 2004 - default: My Movie / 2004 - mine: My Movie / 2004
My Movie (2004) - default: no match - mine: My Movie / 2004
My_Movie_2004 - default: My_Movie / 2004 - mine: My_Movie / 2004
My Movie[2004] - default: no match - mine: My Movie / 2004
My TV Show (2004-2005) - default: no match - mine: My TV Show / 2004
My TV Show ( 2004 - 2005 ) - default: My TV Show ( 2004 / 2005 - mine: My TV Show / 2004
2001: A Space Odyssey - default: no match - mine: no match
2001: A Space Odyssey (1968) - default: no match - mine: 2001: A Space Odyssey / 1968
Knives: 2000 Ways to Kill Someone - default: Knives: / 2000 - mine: Knives: 2000 Ways to Kill Someone
Knives: 2000 Ways to Kill Someone.2001 - default: Knives: 2000 Ways to Kill Someone / 2001 - mine: Knives: 2000 Ways to Kill Someone / 2001
Knives: 2000 Ways to Kill Someone-2001 - default: Knives: 2000 Ways to Kill Someone / 2001 - mine: Knives: 2000 Ways to Kill Someone / 2001
Knives: 2000 Ways to Kill Someone[2001] - default: Knives: / 2000 - mine: Knives: 2000 Ways to Kill Someone / 2001
1999.S00E01 - default: no match - mine: no match
1999.S00E01.1974 - default: 1999.S00E01 / 1974 - mine: 1999.S00E01 / 1974
1999 - S00E01 (1974) - default: no match - mine: 1999 - S00E01 / 1974
Umika - Sincerity [AKROSS_Con_2012] - default: no match - mine: Umika - Sincerity / 2012
Oasis - Falling Down (East of the Eden version)[2008][h264] - default: Oasis - Falling Down (East of the Eden version / 2008 - mine: Oasis - Falling Down (East of the Eden version) / 2008
The 1975 Show (1975) - default: The / 1975 - mine: The 1975 Show / 1975
The Tonight Show of 1995 (1995) - default: The Tonight Show of / 1995 - mine: The Tonight Show of / 1995

As you can see, there are quite a few patterns that are just broken using the default regex. The following is the regex that I've built up to handle as many different cases as feasible, based on the testing that I've been able to manage.
It handles everything that I've been able to throw at it except for that last pattern, and I'm not sure there's any reasonable way to deal with that except completely disallowing dates that are only preceded by spaces (something I would not object to, but since the default allows simple spaces as delimiters, I'm allowing that in mine).

Code:
<cleandatetime>(.+?)(?:\s*(?:(?:[[({])(?:[^])}]*)(?:_|\b))|[ _.,-]\s*)((?:19|20)\d{2})(?:(?:_|\s)*-(?:_|\s)*(?:19|20)\d{2})?\b(?!(?:\s*\w)+)[^\\/]*?$</cleandatetime>

A version that doesn't allow simple spaces to be a delimiter for a year:

Code:
<cleandatetime>(.+?)(?:\s*(?:(?:[[({])(?:[^])}]*)(?:_|\b))|[_.,-]\s*)((?:19|20)\d{2})(?:(?:_|\s)*-(?:_|\s)*(?:19|20)\d{2})?\b(?!(?:\s*\w)+)[^\\/]*?$</cleandatetime>

And for reference, here's the default regex:

Code:
<cleandatetime>(.+[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-1][0-9])([ _\,\.\(\)\[\]\-][^0-9]|$)</cleandatetime>

Edit: Fixed the regex slightly.

RE: Cleandatetime - apo86 - 2021-03-16

THANK YOU, internet person from 2014! Wonder Womand 1984 sent me on a quest for "wtf is wrong with this scraper" and this post is where I found salvation.

RE: Cleandatetime - Karellen - 2021-03-16

(2021-03-16, 00:44)apo86 Wrote: Wonder Womand 1984

Could it be because you spelt it wrong?

RE: Cleandatetime - apo86 - 2021-03-16

No, it was definitely cleandatetime stripping the year from the movie title. I'm more diligent with my file names than with my forum posts.