filmstarts.de scraper development - help needed
Kodi Community Forum (https://forum.kodi.tv)
Forum: Development (https://forum.kodi.tv/forumdisplay.php?fid=32)
Forum: Scrapers (https://forum.kodi.tv/forumdisplay.php?fid=60)
Thread: filmstarts.de scraper development - help needed (/showthread.php?tid=33624)
filmstarts.de scraper development - help needed - floohh - 2008-05-28

Hi guys,

I'm currently developing a scraper for http://filmstarts.de. After hours of work I managed to get XBMC to recognize the filmstarts search results, but when I try to get the details page, XBMC requests an empty URL. I think there is an error in my RegExp code.

Here we go. The Filmstarts HTML looks like:

[HTML]
<li><a href="/kritiken/35848-Der-Fluch-von-Darkness-Falls.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/DerFluchVonDarknessFalls_poster_1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/310er.gif" alt="Wertung: 3 / 10"> </span>
<span class="t">Der Fluch von Darkness Falls</span>
<span class="g">Teenie-Horror</span>
</a></li>
<li><a href="/kritiken/36232-Fluch-der-Karibik.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/fluchderkaribik-poster1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/910er.gif" alt="Wertung: 9 / 10"> </span>
<span class="t">Fluch der Karibik</span>
<span class="g">Abenteuer</span>
</a></li>
<li><a href="/kritiken/37419-Blueberry-und-der-Fluch-der-D%E4monen.html">
<img alt="Blueberry und der Fluch der Dämonen" src="/designs/default//images/no_film_small.gif" height="44" width="30">
<span class="r"> <img src="/designs/default/images/ratings/610er.gif" alt="Wertung: 6 / 10"> </span>
<span class="t">Blueberry und der Fluch der Dämonen</span>
<span class="g">Fantasy-Action</span>
</a></li>
[/HTML]

and my RegExp is:

Code:
<GetSearchResults dest="3">

(I know the entities aren't converted, but I decoded them for better understanding.)

The most important line is:

Code:
<expression repeat="yes"><a href="/kritiken/([-.%\w]+)">[^<]|[\n]<span class="t">([-%. \w]+)</span></expression>

which is meant to recognize:

[HTML]
<a href="/kritiken/35848-Der-Fluch-von-Darkness-Falls.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/DerFluchVonDarknessFalls_poster_1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/310er.gif" alt="Wertung: 3 / 10"> </span>
<span class="t">Der Fluch von Darkness Falls</span>
[/HTML]

The only thing XBMC does is request "/".

Can somebody help me?

floohh - 2008-05-28

Okay, I altered the expression to skip the unnecessary text, but now I only catch the first match. Any idea how to solve this?

Code:
<li><a href="/kritiken/([-.a-z0-9A-Z]+)">.*<span class="t">([0-9a-zA-Z .]+).*</li>

floohh - 2008-05-29

After hours of hard work, it finally worked.

spiff - 2008-05-29

I assume your issue was that you didn't realize you are writing XML, so you need to escape special characters such as ", i.e. write &quot; instead.

Sorry I didn't see your inquiry earlier. Feel free to ask again; I will try to help when I see it.

floohh - 2008-06-02

Find the solution here: Link

spiff - 2008-06-02

Great - the more the merrier. Will add it to SVN.

Cheers

tatoosh - 2009-06-29

Hey, I can't download your filmstarts.de scraper. Can you give me a link? It would be nice to use this great website.

w00dst0ck - 2009-06-29

@Tatoosh: Try updating your XBMC version. Alternatively, you can download the current version from SVN via http://trac.xbmc.org.
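Editor's sketch for readers following along: the "only the first match" symptom in the second post is consistent with a greedy `.*` running across several `<li>` items, and spiff's point about XML means the pattern has to be entity-escaped before it goes into the scraper file. The Python below is only an illustration under assumptions: it uses Python's `re` module as a stand-in for the regex engine XBMC used (which may behave differently, for example in whether `.` crosses newlines), the lazy pattern is not the solution floohh eventually posted, and the names `SAMPLE_HTML`, `GREEDY`, `LAZY`, `find_results` and `escape_for_scraper_xml` are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Copy of the search-result markup quoted in the first post.
SAMPLE_HTML = """\
<li><a href="/kritiken/35848-Der-Fluch-von-Darkness-Falls.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/DerFluchVonDarknessFalls_poster_1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/310er.gif" alt="Wertung: 3 / 10"> </span>
<span class="t">Der Fluch von Darkness Falls</span>
<span class="g">Teenie-Horror</span>
</a></li>
<li><a href="/kritiken/36232-Fluch-der-Karibik.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/fluchderkaribik-poster1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/910er.gif" alt="Wertung: 9 / 10"> </span>
<span class="t">Fluch der Karibik</span>
<span class="g">Abenteuer</span>
</a></li>
<li><a href="/kritiken/37419-Blueberry-und-der-Fluch-der-D%E4monen.html">
<img alt="Blueberry und der Fluch der Dämonen" src="/designs/default//images/no_film_small.gif" height="44" width="30">
<span class="r"> <img src="/designs/default/images/ratings/610er.gif" alt="Wertung: 6 / 10"> </span>
<span class="t">Blueberry und der Fluch der Dämonen</span>
<span class="g">Fantasy-Action</span>
</a></li>
"""

# Pattern from the second post. DOTALL is an assumption made here so that
# `.` can cross the line breaks in SAMPLE_HTML.
GREEDY = re.compile(
    r'<li><a href="/kritiken/([-.a-z0-9A-Z]+)">.*<span class="t">([0-9a-zA-Z .]+).*</li>',
    re.DOTALL,
)

# Hypothetical alternative: a lazy `.*?` stops at the nearest title span,
# and `[^<]+` accepts any title text, including umlauts.
LAZY = re.compile(
    r'<a href="/kritiken/([-.%\w]+)">.*?<span class="t">([^<]+)</span>',
    re.DOTALL,
)


def find_results(pattern, html):
    """Return a list of (url_part, title) tuples found by `pattern`."""
    return pattern.findall(html)


def escape_for_scraper_xml(pattern_text):
    """Escape &, <, > and double quotes so the regex can sit inside XML."""
    return escape(pattern_text, {'"': "&quot;"})


if __name__ == "__main__":
    print(len(find_results(GREEDY, SAMPLE_HTML)), "match with the greedy pattern")
    for url_part, title in find_results(LAZY, SAMPLE_HTML):
        print(url_part, "|", title)
    print(escape_for_scraper_xml(LAZY.pattern))
```

With this sample the greedy pattern produces a single match spanning all three items, while the lazy one yields one (URL, title) pair per item. Whether the same change is needed in an actual scraper depends on the engine's handling of `.` and newlines, so treat it as a starting point for testing rather than a fix.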