2008-05-28, 20:30
Hi guys,
I'm currently developing a scraper for http://filmstarts.de. After hours of work I managed to get XBMC to recognize the filmstarts search results, but when I try to fetch the details page, XBMC requests an empty URL. I think there is an error in my RegExp code.
And here we go:
The filmstarts HTML looks like this:
[HTML]
<li><a href="/kritiken/35848-Der-Fluch-von-Darkness-Falls.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/DerFluchVonDarknessFalls_poster_1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/310er.gif" alt="Wertung: 3 / 10">
</span>
<span class="t">Der Fluch von Darkness Falls</span>
<span class="g">Teenie-Horror</span> </a></li>
<li><a href="/kritiken/36232-Fluch-der-Karibik.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/fluchderkaribik-poster1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/910er.gif" alt="Wertung: 9 / 10">
</span>
<span class="t">Fluch der Karibik</span>
<span class="g">Abenteuer</span> </a></li>
<li><a href="/kritiken/37419-Blueberry-und-der-Fluch-der-D%E4monen.html">
<img alt="Blueberry und der Fluch der Dämonen" src="/designs/default//images/no_film_small.gif" height="44" width="30">
<span class="r"> <img src="/designs/default/images/ratings/610er.gif" alt="Wertung: 6 / 10">
</span>
<span class="t">Blueberry und der Fluch der Dämonen</span>
<span class="g">Fantasy-Action</span> </a></li>
[/HTML]
And my RegExp is:
Code:
<GetSearchResults dest="3">
    <RegExp input="$$5" output="<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results>\1</results>" dest="3">
        <RegExp input="$$1" output="<entity><title>\2</title><url>http://www.filmstarts.de/\1</url><id>\1</id></entity>" dest="5">
            <expression repeat="yes"><a href="/kritiken/([-.%\w]+)">[^<]|[\n]<span class="t">([-%. \w]+)</span></expression>
        </RegExp>
        <expression noclean="1"></expression>
    </RegExp>
</GetSearchResults>
(I know the entities aren't escaped here; I decoded them to make the code easier to read.)
The most important line is:
Code:
<expression repeat="yes"><a href="/kritiken/([-.%\w]+)">[^<]|[\n]<span class="t">([-%. \w]+)</span></expression>
which is supposed to match:
[HTML]
<a href="/kritiken/35848-Der-Fluch-von-Darkness-Falls.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/DerFluchVonDarknessFalls_poster_1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/310er.gif" alt="Wertung: 3 / 10">
</span>
<span class="t">Der Fluch von Darkness Falls</span>
[/HTML]
The only thing XBMC does is request "/".
Can somebody help me?
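One possible explanation, checked here with Python's `re` module rather than XBMC's own regex engine (this assumes alternation has the same precedence in both, as it does in the common regex dialects): the `|` in the expression splits the whole pattern in two, so each match fills either `\1` or `\2` but never both. With `\1` empty, the URL template collapses to `http://www.filmstarts.de/`, which would fit the request for "/". The names `CANDIDATE` and `captures` below are hypothetical and only for illustration; they are not part of the scraper.

```python
import re

# Two entries copied from the search-result HTML quoted above.
HTML = """<li><a href="/kritiken/35848-Der-Fluch-von-Darkness-Falls.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/DerFluchVonDarknessFalls_poster_1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/310er.gif" alt="Wertung: 3 / 10">
</span>
<span class="t">Der Fluch von Darkness Falls</span>
<span class="g">Teenie-Horror</span> </a></li>
<li><a href="/kritiken/36232-Fluch-der-Karibik.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/fluchderkaribik-poster1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/910er.gif" alt="Wertung: 9 / 10">
</span>
<span class="t">Fluch der Karibik</span>
<span class="g">Abenteuer</span> </a></li>
"""

# The expression exactly as posted. The top-level "|" means:
#   <a href="/kritiken/(...)">[^<]   OR   \n<span class="t">(...)</span>
ORIGINAL = r'<a href="/kritiken/([-.%\w]+)">[^<]|[\n]<span class="t">([-%. \w]+)</span>'

# Hypothetical alternative: skip lazily from the link to the title span,
# so both groups are captured in the same match.
CANDIDATE = r'<a href="/kritiken/([-.%\w]+)">.*?<span class="t">([^<]+)</span>'


def captures(pattern, text, flags=0):
    """Return (group 1, group 2) for every match of pattern in text."""
    return [(m.group(1), m.group(2)) for m in re.finditer(pattern, text, flags)]


original_hits = captures(ORIGINAL, HTML)
candidate_hits = captures(CANDIDATE, HTML, re.DOTALL)  # DOTALL: "." also matches newlines

for url_part, title in original_hits:
    # A group that did not take part in the match is None in Python.
    print("original :", repr(url_part), repr(title))

for url_part, title in candidate_hits:
    print("candidate:", repr(url_part), repr(title))
```

If that is the cause, replacing `[^<]|[\n]` with something that skips everything up to the title span might work in the scraper as well. Whether `.` crosses line breaks in XBMC's engine is not tested here, so `[\s\S]*?` could be a safer spelling than `.*?`.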