Extra REGEX for TV Show Episode matching
Kodi Community Forum (https://forum.kodi.tv) > Support > Tips, tricks, and step by step guides > Thread /showthread.php?tid=51614
- xexe - 2009-05-24
In simple terms, adding this to your Kodi configuration will match more TV show episodes than Kodi matches by default.
After several months of assisting users via IRC, I decided, for fun, to create a generic set of additional REGEX expressions to catch TV episodes that Kodi does not and will not match by default (e.g. TOPAZ releases). Quite a bit of offline testing has been completed, and whilst I am confident in it, I cannot guarantee that this REGEX will not produce some false positives.

The method used in this REGEX differs from Kodi's default methodology in SOME places: it extracts the season number from the folder name rather than the file name. By doing this we can match files that otherwise could never be matched. Examples: "13.show.dvd.avi" only has the episode number, but "MyShow/season 2/13.show.dvd.avi" has both. "MyShow/season 5/4400513.show.dvd.avi" has a show name that includes numbers, which makes matching very difficult.

Installation

Adding the additional TV episode matching is simply a matter of inserting the code listed later into a file called advancedsettings.xml. To locate and understand this XML file, read the first part of the Advancedsettings (wiki) page. Remember that by default advancedsettings.xml will NOT exist. Also note that the name of this file IS CASE SENSITIVE, and Kodi must be restarted for changes to apply. End-to-end installation should take no more than 2 minutes.

Required Folder Structure

Approximately 50% of this REGEX requires a sensible folder structure for your TV shows, as follows:
/showname/season x/episodes
e.g. The Unit/season 2/the.unit.203.avi
Note: Case is irrelevant.
Note: This is "Season 1" and NOT "Season 01".
If you do not have this structure, 50% of these REGEX will NOT work for you.

I have had some requests to support different structures. Whilst I am happy to accommodate some slight differences, I cannot support multiple languages or weird-ass structures. In the end, trying to support these would make the REGEX ridiculously complicated for the majority of users whilst only helping a minority.
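To make the season-from-folder idea concrete, here is a minimal Python sketch. The two patterns and the `match_episode` helper are hypothetical illustrations, not the REGEX set from the pastebin links, and Python's `re` module stands in for Kodi's regex engine (the constructs used here, character classes, lazy quantifiers and a backreference, exist in both).

```python
import re

# Hypothetical patterns illustrating season-from-folder matching; they are
# NOT the expressions from the pastebin links. [\\/] accepts both / and \,
# so the same pattern works on Windows and Unix-style paths.
FOLDER_SEASON_PATTERNS = [
    # Folder gives the season; the file name starts with the bare episode number:
    #   MyShow/season 2/13.show.dvd.avi -> (2, 13)
    re.compile(r"[\\/]season (\d{1,2})[\\/](\d{1,2})[^\\/\d][^\\/]*$", re.IGNORECASE),
    # Folder gives the season; the backreference \1 requires that same season
    # number to appear in the file name, immediately followed by a 2-digit episode:
    #   The Unit/season 2/the.unit.203.avi    -> (2, 3)
    #   MyShow/season 5/4400513.show.dvd.avi  -> (5, 13)
    re.compile(r"[\\/]season (\d{1,2})[\\/][^\\/]*?\1(\d{2})[^\\/]*$", re.IGNORECASE),
]


def match_episode(path):
    """Return (season, episode) from the first pattern that matches, else None."""
    for pattern in FOLDER_SEASON_PATTERNS:
        m = pattern.search(path)
        if m:
            return int(m.group(1)), int(m.group(2))
    return None
```

In this sketch a zero-padded folder such as "Season 02" finds no "02" followed by two digits in "the.unit.203.avi" and returns None, which shows how a strict folder convention like "Season 1, not Season 01" can matter for this style of matching.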
The chosen structure was decided on after months of seeing what users had developed on their own. Most came to this structure independently and I am happy with it.

Feedback

You are welcome to experiment with this REGEX and report back if you need help or have suggestions. I will maintain these first few posts. I will happily add new REGEX under a few small conditions:
1. The format you are trying to match allows a good chance of no false positives, i.e. it's not my intention to deal with absolute rubbish naming dredged from the bowels of the internet.
2. The format you are trying to match will be useful to other people. If the matches will only ever be of use to you alone, then this set is not the place for it.
3. If you are suggesting REGEX, please supply a couple of examples of the full path you are matching against so I/we can test it.
4. I won't be adding REGEX, or updating the existing ones, with über-complex/1337 REGEX just because it can be done more cleverly. Put another way, we need to keep these simple so normal users can get to grips with them.

Support

If you want support, please do the following (NONE OF WHICH ARE OPTIONAL):
1. Update to the very latest stable release of Kodi.
2. Post a COMPLETE debug log to pastebin.com (nowhere else) and link it here. Make sure this log covers the library update procedure.
3. The debug log should contain lines with "DEBUG: could not enumerate file" or entries that are matched incorrectly. If it does not contain these elements, then you do not need help.
4. The debug log should show that you are using this complete REGEX set and not small parts of it. I cannot easily identify which set you are running, and, more importantly, the order in which these REGEX run matters a great deal.
If you don't do these 4 simple things, I cannot help and won't even bother. Sorry, but time is too short to help those who won't help themselves. And again:
DO NOT post hand-written examples of problem file names or things you wish to match. I NEED the "could not enumerate file" lines to see what Kodi is seeing, not what you think it is seeing.

Please note: this thread is for discussions of THIS REGEX compilation only. It is not for general REGEX support, "how do I set up advancedsettings", "why doesn't my library work" or anything else. For all other topics, create a new thread.

Happy hunting

- xexe - 2009-06-04
Current stable: V2.3

V2.4 - 05/09/2011 http://pastebin.com/UPPrk7VU
Added more movie stacking REGEX. Read the WARNING. Don't run these if you keep all your movies in one pile (a single folder).
Note there are some weird movie stacking failures even though the REGEX is correct. Will debug with Eden later.
Updated DIRFIX handling based on a bug posted by SoWErA. Cheers.
Lastly, this is the last release that I will be testing with Dharma, since I am moving to Eden.

V2.3 - 18/01/2011 http://pastebin.com/N5mjtBxk
WARNING: Big changes with little to no testing. Released due to demand. Use at your own risk.
Non-critical typos, formatting and spelling.
Anime matches now run before XBMC's built-in matches. By request.
Tighter CRC test in anime.
Added movie stacking. Don't run these if you keep all your movie files in one pile (a single folder). (This is the reason XBMC doesn't do it natively.)
Added OpenELEC default CPU and GPU temperature settings.
Added Samba timeout value.
Explicit abc123 was catching pipes as well.
Can move to \D instead of [^\d] etc. We have had proper PCRE for a long while now.
Standardized on \d{1,2} rather than a mix with \d\d?
Now handles S1, S01, S 1, S 01 as well as Season1, Season01, Season 1, Season 01 directories.

V2.2 - 02/11/2010 http://pastebin.com/ehjt8aVh
Minimum required version: Dharma Beta 4.
Anime REGEX was not capturing the full episode number, causing weird duplicates.
Anime REGEX now handles {[( for CRC encapsulation.
Anime REGEX: more tweaks.
Added dimonscreensave.
Removed all URL-encoded REGEX, as nothing to match against is URL-encoded anymore.
Trivial comment typos.
Removed a dupe REGEX that crept in somehow.
Removed a bunch of useless trailing spaces.
Reordered. Should prove slightly faster now.

V2.1 - 29/09/2010 http://pastebin.com/XDDx3Thy
Yet more silly typos. Apologies to all.
Non-critical typos, formatting and spelling.
Added first anime match attempt. Tread carefully: anime naming is as oddball as anime itself.
Shortened comment separators. They were starting to take up too many lines.
Stripped out intro words and added them to the change log.
Increased recently added from 250 to 300.

V2.0 - 20/09/2010 http://pastebin.com/EvL65F34
Added <backgroundupdate> set to false for the video library.
Added music library settings placeholder.
Added <backgroundupdate> set to false for the music library.
Added <flattentvshows> set to never for the video library.
Added first attempt at handling single-episode DIRFIX re-releases.
Fixed lame copy and paste [ error.
Preparing for deprecated URL encoding requirement for RAR containers.
Added first attempt at handling multi-episode DIRFIX re-releases.
Enabled GPU-accelerated dds fanart <useddsfanart>. I debated setting this one, but I suspect more users will want it than not. Please report back.
Exclude REGEX is now VERY greedy. Anything with "extras" in it anywhere is excluded. In almost all instances this will be fine, but YMMV.

V1.9 - 08/09/2010 http://pastebin.com/jCqDF7hk
Changed order. As feared, the change to prepend caused false positives.
Fixed bug with exclude REGEX and double //.
Exclude REGEX is now VERY greedy. Anything with "sample" in it anywhere is excluded.
In almost all instances this will be fine, but YMMV.
Ignore torrent client part files.
Tested against Dharma. Consider BETA quality.

V1.8 - 18/01/2010 http://pastebin.com/f4a5aa918
Formatting.
Ignore case set.
Ignore files named sample.*
Minimum required SVN revision to operate: r26522.

V1.7 - 21/12/2009 http://pastebin.com/f656dd4f2
Changed to mostly inline comments for REGEX.
Added a custom sort token to ignore " when sorting (thanks cptspiff for the fix).
Minimum required SVN revision to operate: r25845.

V1.6 - 16/12/2009 http://pastebin.com/f195c0368
This is a significant update. Consider it ALPHA.
In order to fix the broken REGEX I had to change almost all the custom REGEX to prepend. This increases the chances of false positives significantly, although in testing they still perform well.
This version requires SVN 25638+ to be fully compatible. Most will work with older versions.
If you have a TV show called Extras, rename it to Extras (2005) for the EXCLUDE REGEX to be compatible.
Also a big thanks to Grum in IRC for the RAR REGEX. This will natively handle most SCENE RAR packs very accurately. Use at your own risk.

V1.5 - 04/12/2009
Seems like EXCLUDE matching is case sensitive. Quick fix for testing.

V1.4 - 30/11/2009 http://pastebin.com/f74eb50d9
Tweaked episode match to be a little less strict. Should catch /Shows/Mad Men/Season 1/Episode 12.avi etc.

V1.3 - 20/11/2009
Added setting to turn off auto thumbs.

V1.2 - 16/11/2009 http://pastebin.com/f6edebc7a
Split folder exclusions "extras" into two REGEX: one for movies and one for TV.
As of r24405, video stacking regular expressions must contain exactly four (4) capture expressions.
Removed old stacking REGEX; will add them back in as required.

V1.1 - 11/11/2009 http://pastebin.com/f62eee83b
General cleanup in preparation for pastebin.

V1.0 - 30/10/2009
Replaced some of the stacking REGEX removed in commit 24060. WARNING: this may break serials support.
In general I am not happy with this new REGEX and it needs more work.
This file also includes some general XBMC settings I use. It would be better if I didn't include these settings, but doing so makes things easier for me. Delete them if they are not to your taste.

V0.9 - 28/06/2009 http://pastebin.com/f544c8deb
Default XBMC REGEX was producing false positives with TPZ releases. To deal with this we now have both prepend and append REGEX.

V0.8 - 10/06/2009 http://pastebin.com/f5f4fae52
After an IRC discussion with cptspiff and mgc, I am releasing this version to cater for TOPAZ releases but with NO REQUIRED FOLDER STRUCTURE. This should also handle TOPAZ releases that are still in RAR format. Please report back on success, as I am working only from data scraped from Google.

V0.7 - 08/06/2009
Added excludefromscan section. Do not catalogue anything in a folder called extras. Using the expected TV folder naming structure still allows the TV show "Extras". Note: This does not work for me but does for other users. Please report back your experiences.

V0.6 - 06/06/2009 http://pastebin.com/f48cec53d
New component: commonly missed movie stacking REGEX. Big caveat: it will NOT fix movies already in the library. To fix those, completely remove the multiple movie entries and rescan.

V0.5 - 03/06/2009
Added REGEX to match some awful TV naming that has no season. This release matches 99% of the 10,000+ Google-scraped episodes that XBMC missed. The last REGEX in the list may produce false positives. Use with caution.

V0.4 - 28/05/2009
Cater for the cross-platform difference in path separators, i.e. \ and /.

V0.3 - 16/05/2009
Support for /season 5/Lost - 5 x 05.mkv

V0.2 - 08/05/2009
TPZ matches now require a season folder. Fixes some false positives.

V0.1 - 05/05/2009
Initial upload.

#######################################################################################
This REGEX is UNOFFICIAL/EXPERIMENTAL and may require a strict folder structure.
*Use at your own risk*
We use multiple REGEX rather than trying to build one REGEX to rule them all. This wastes CPU cycles but allows easier bug finding, refining and end-user understanding. The order they run in is important.
It will never catch all episodes. Since we're trying to deal with bad naming, it could result in false positives.
Comments and submissions welcome, but try to keep it simple. If in doubt, use two simple REGEX rather than one complex one.
To install see: http://www.xbmc.org/wiki/?title=AdvancedSettings.xml
Tested against Dharma onwards only, but MAY be backwards compatible.
########################################################################################

- xexe - 2009-06-06
Reserved

- xexe - 2009-06-10
Reserved

- xexe - 2009-06-28
Reserved

- havix - 2009-08-10
Are these regexes built into XBMC now? And if they aren't, why not?

- jmarshall - 2009-08-11
Some of them (as clearly commented in the file) would cause too many false positives; others are specific to scene groups such as tpz, which we clearly don't want by default.

- xexe - 2009-08-16
jmarshall is obviously spot on. It's fine for users with the skill and motivation to add these, but if they were default the false positives would eat up valuable dev support time, especially since users wouldn't have a clue why it was happening. I only did this for fun and to save time answering the same question over and over via the support IRC.
Now don't get me wrong, it IS surprisingly accurate: in my simulations we are talking about only 1 error in every 1000 episodes. BUT a user with a naming scheme that triggers one of these errors will likely trip hundreds of them.
Lastly, matching of the TPZ naming scheme will NEVER be 100% accurate, as their naming is quite simply completely useless.

- j3ff - 2009-09-10
Thanks for this - before I found this thread I was beating my head against the wall wondering why XBMC was not finding my stuff. This made it all 100% perfect in 2 minutes.
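The header stresses that the order in which the REGEX run is important, and the V1.6 and V1.9 changelog entries note that switching to prepend raised the risk of false positives. The Python sketch below models that with a hypothetical first-match-wins list: one strict stand-in "default" pattern and one loose custom pattern for bare 3-4 digit numbers. Both patterns are made up for illustration, and "the first expression to match wins" is an assumption made to keep the model simple.

```python
import re

# Hypothetical stand-ins, NOT Kodi's real defaults or the pastebin REGEX.
# A strict pattern for names like "Show.S01E05.avi".
DEFAULTS = [re.compile(r"s(\d{1,2})e(\d{1,2})", re.IGNORECASE)]
# A loose pattern for bare numbers like "Show.103.avi" (season 1, episode 03).
CUSTOM = [re.compile(r"[._ -](\d{1,2})(\d{2})[._ -]")]


def build_list(action):
    """Mimic the action attribute: custom patterns go before or after the defaults."""
    if action == "prepend":
        return CUSTOM + DEFAULTS
    if action == "append":
        return DEFAULTS + CUSTOM
    raise ValueError("action must be 'prepend' or 'append'")


def first_match(patterns, name):
    """Return (season, episode) from the first pattern that matches, else None."""
    for pattern in patterns:
        m = pattern.search(name)
        if m:
            return int(m.group(1)), int(m.group(2))
    return None
```

With "Show.Name.2009.S01E05.mkv", the prepended list reads the year as season 20, episode 9, while the appended list lets the strict pattern win and returns season 1, episode 5. A name like "Show.Name.103.hdtv.avi" gives (1, 3) either way, which is why a loose pattern can look accurate in testing and still misfire on one particular naming scheme.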
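The V2.3 changelog says the set now handles S1, S01, S 1, S 01, Season1, Season01, Season 1 and Season 01 directories, and settles on \d{1,2}. A single pattern can cover all eight spellings; this Python sketch shows one way to do it. `SEASON_DIR` and `season_from_dir` are hypothetical names, and the pattern is not taken from the pastebin file.

```python
import re

# Hypothetical pattern, not taken from the pastebin file. It accepts the
# season-folder spellings listed in the V2.3 changelog:
# S1, S01, S 1, S 01, Season1, Season01, Season 1, Season 01.
# [\\/] matches either path separator; (\d{1,2}) follows the V2.3 note.
SEASON_DIR = re.compile(r"[\\/](?:season|s) ?(\d{1,2})[\\/]", re.IGNORECASE)


def season_from_dir(path):
    """Return the season number taken from a season folder, or None."""
    m = SEASON_DIR.search(path)
    return int(m.group(1)) if m else None
```

Converting the captured text with int() means "Season 01" and "Season 1" both yield season 1, so zero padding in the folder name no longer changes the result.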
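The V1.2 entry says that, as of r24405, video stacking regular expressions must contain exactly four capture expressions. As we understand Kodi's stacking rules, the four groups are title, volume, ignore and extension; treat that mapping, the `STACK` pattern and the `stack_files` helper below as assumptions for illustration rather than the expressions in the pastebin file. The sketch checks the group count and groups files whose names differ only in the volume token.

```python
import re
from collections import defaultdict

# Hypothetical 4-group stacking pattern: (title)(volume)(ignore)(extension).
# The volume group matches tokens such as "cd1", "dvd2", "pt1", "part 2", "disc1", "disk3".
STACK = re.compile(
    r"(.*?)([ _.-]*(?:cd|dvd|p(?:ar)?t|dis[ck])[ _.-]*[0-9]+)(.*?)(\.[^.]+)$",
    re.IGNORECASE,
)
assert STACK.groups == 4  # the V1.2 rule: exactly four capture expressions


def stack_files(names):
    """Group file names that differ only in their volume token."""
    stacks = defaultdict(list)
    for name in names:
        m = STACK.match(name)
        if m:
            title, _volume, ignore, ext = m.groups()
            stacks[(title.lower(), ignore.lower(), ext.lower())].append(name)
    # Only groups with two or more parts form a stack.
    return {key: sorted(parts) for key, parts in stacks.items() if len(parts) > 1}
```

Because the grouping only looks at file names, "Kill Bill Part 1.avi" and "Kill Bill Part 2.avi" sitting in the same folder would be stacked as one movie even though they are separate films; this is plausibly the kind of failure the "one pile" warnings in V2.3 and V2.4 are about.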
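The V1.9 and V2.0 entries describe the exclude REGEX becoming very greedy ("sample" or "extras" anywhere in the path), while V1.6 asked owners of the TV show Extras to rename it to Extras (2005). This Python sketch contrasts a folder-only exclusion with a greedy one. Both patterns and the `excluded` helper are hypothetical; re.IGNORECASE is used because V1.5 and V1.8 mention case-sensitivity fixes, not because it mirrors a confirmed Kodi setting.

```python
import re

# Hypothetical exclude patterns, not the ones in the pastebin file.
# Folder-only: excludes files inside a folder named exactly "extras".
FOLDER_ONLY = re.compile(r"[\\/]extras[\\/]", re.IGNORECASE)
# Greedy, as described for V1.9/V2.0: "extras" or "sample" anywhere in the path.
GREEDY = re.compile(r"extras|sample", re.IGNORECASE)


def excluded(pattern, path):
    """Return True if the exclude pattern matches anywhere in the path."""
    return pattern.search(path) is not None
```

In this sketch the folder-only pattern lets "TV/Extras (2005)/Season 1/..." through, but the greedy pattern drops every episode of that show as well as any "sample" file, which is the trade-off the changelog's "YMMV" warns about.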
- dc_williamson - 2009-09-10
Most of my TV shows are stored in the folder structure:
show name\season n\show name Snn Enn.extn (where nn = two digits)
None of the above regex work for me. However, I use this in my advancedsettings.xml file:

<tvshowmatching action="prepend">
  <regexp>[\\/]*S([0-9]+) E([0-9]+)[^\\/]*</regexp>
</tvshowmatching>

which works fine. The only issue comes when I have shows with no season number (as if they only had season one) in the format show name\show name Enn.extn. I'm still trying to work out a regex that'll work for this; for now I'm fudging it by renaming all the files to the Snn Enn format.
Anyway, feel free to use this or add it to the above if it's useful.

- locust - 2009-09-14
Boom! Thanks so much for your hard work. I have lots of -TOPAZ releases and it was pissing me off to no end that it wouldn't scan in their episodes. Nicely done!

- locust - 2009-09-22
Actually, I am not having luck with -MEDiEVAL rips, though. For example, Arrested.Development.S01E05.WS.DVDRip.XviD-MEDiEVAL has an archive name of med-ad105.rar and it's not picking it up (and I can't scan it into my library). Any chance of getting a solution for that? Thanks!

- EvilMatt666 - 2009-11-07
I'm kinda new here, but I've been messing about with XBMC for a while now. I have just installed the newest version and thought it might sort out my library problem for the TV show rips on my system, but it hasn't, and looking at the REGEX code I doubt it would handle my naming system properly. Basically, the way I rename all my files (because I'm anal) is as follows:
Tv Hard drive/tv show title/season 1/tv.show.-.101.-.episode.title.(DVD).avi/mkv/etc
"101" would mean season 1 episode 01, and "2024" would be season 20 episode 24. It just doesn't seem to pick up the episode details at all. In fact, this last pass with the REGEX file in place has just given me files numbered 1-30 with nothing inside them in Library view.
I can use, and have been using, the file view to go through my TV shows, but it's not perfect. Anyone got any ideas? Thanks in advance.

- jmarshall - 2009-11-07
@EvilMatt666: Take a debug log while refreshing a show. It'll tell you right off whether it's detecting them properly or not. Don't just guess that things aren't working due to your file naming!

- xexe - 2009-11-10
Can users confirm that the "Extras" removal regex is working for them? I have had reports that it works, but I cannot get it working myself, which is curious. If it does work, can you confirm whether you use the tvshow.nfo and movie.nfo URL tag method?
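dc_williamson's expression can be checked outside Kodi. This Python sketch runs the regexp from that post unchanged against a path in the described layout. Python's `re` module stands in for Kodi's regex engine, which we assume treats this simple pattern the same way; `parse_snn_enn` is a hypothetical helper name.

```python
import re

# The regexp from dc_williamson's <tvshowmatching action="prepend"> entry, unchanged.
SNN_ENN = re.compile(r"[\\/]*S([0-9]+) E([0-9]+)[^\\/]*")


def parse_snn_enn(path):
    """Return (season, episode) as integers, or None if the pattern does not match."""
    m = SNN_ENN.search(path)
    return (int(m.group(1)), int(m.group(2))) if m else None
```

For "show name\season 1\show name S01 E05.avi" this returns (1, 5). Compiled without re.IGNORECASE, as here, the pattern needs an upper-case S and E with exactly one space between them, so a name like "show.name.S01E05.avi" returns None in this sketch.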
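EvilMatt666's scheme puts season and episode together in one number between ".-." separators, with the episode always in the last two digits ("101" is season 1 episode 01, "2024" is season 20 episode 24). This Python sketch shows how a pattern could split such a number; `DOT_DASH` and `parse_dot_dash` are hypothetical and not part of the REGEX set in this thread. As jmarshall says, only a debug log shows what Kodi actually detects; a sketch like this checks the pattern in isolation.

```python
import re

# Hypothetical pattern for names like "tv.show.-.101.-.episode.title.(DVD).avi".
# (\d{1,2}) is the season and (\d{2}) the last two digits, i.e. the episode.
# Backtracking lets "101" split as 1/01 and "2024" as 20/24.
DOT_DASH = re.compile(r"\.-\.(\d{1,2})(\d{2})\.-\.")


def parse_dot_dash(name):
    """Return (season, episode) as integers, or None if the pattern does not match."""
    m = DOT_DASH.search(name)
    return (int(m.group(1)), int(m.group(2))) if m else None
```

Anchoring on the ".-." separators keeps the pattern from picking up other digits in the show or episode title.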