2007-03-23, 16:25
I'm working on a TV Show scraper for allocine.fr.
I'm down to the episode list, but I have a little problem:
I use the scrap.exe tool to test it, and when the tool gets the links for the episode list, an "&" sign gets lost. Let me show you:
Code:
</status><premiered>
7 Ao¹t 2005</premiered><episodeguide><url>http://www.allocine.fr/series/episodes_gen_csaison=1511&cserie=513.html</url>
<url>http://www.allocine.fr/series/episodes_gen_csaison=2450&cserie=513.html</url></episodeguide></details>
Episodelist URL 1:http://www.allocine.fr/series/episodes_gen_csaison=1511cserie=513.html
Episodelist URL 2:http://www.allocine.fr/series/episodes_gen_csaison=2450cserie=513.html
GetEpisodeListInternal 2 returned :
GetEpisodeList returned :
Error: Unable to parse episodelist.xml
This is the output of the scrap.exe tool.
You can see that in the <details> tag the URLs are OK:
Code:
<url>http://www.allocine.fr/series/episodes_gen_csaison=1511&cserie=513.html</url>
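But when the tool prints "Episodelist URL", the & sign is missing from the link, which leads to a nearly empty page on the website:
Code: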
Episodelist URL 1:http://www.allocine.fr/series/episodes_gen_csaison=1511cserie=513.html
And here is the code from scraper.xml:
Code:
<RegExp input="$$8" output="&lt;episodeguide&gt;\1&lt;/episodeguide&gt;" dest="5+">
    <RegExp input="$$2" output="&lt;url&gt;http://www.allocine.fr/series/episodes_gen_csaison=\1&amp;cserie=$$4.html&lt;/url&gt;" dest="8">
        <expression repeat="yes">"/series/casting_gen_csaison=([0-9]*)&amp;cserie=$$4.html" class="link1"&gt;[0-9]&lt;/a&gt;</expression>
    </RegExp>
    <expression noclean="1"></expression>
</RegExp>
I tried replacing the &amp; with a bare &, and I tried doubling it (&amp;&amp; and &&), but the & sign never shows up. When I replace the &amp; with &quot;, however, the " sign appears exactly where I need it. It is only the & that doesn't seem to work.
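One possible explanation, sketched below in Python, is that the URL goes through two XML parsing passes: once when scraper.xml is read, and once more when the generated <episodeguide> fragment is parsed. This is only an assumption about how the tool behaves, not its actual code. The helper names `first_pass` and `second_pass` and the shortened URLs are hypothetical. Python's parser is strict and rejects a bare &, whereas a more lenient parser might silently drop it.

```python
import xml.etree.ElementTree as ET

def first_pass(attr_source):
    """Decode an attribute value the way an XML parser reading scraper.xml would."""
    return ET.fromstring('<RegExp output="%s"/>' % attr_source).get("output")

def second_pass(fragment):
    """Parse the generated fragment again; return None if it is not well-formed."""
    try:
        return ET.fromstring(fragment).text
    except ET.ParseError:
        return None

# Hypothetical, shortened output attributes with the capture groups already filled in.
single = "&lt;url&gt;episodes_gen_csaison=1511&amp;cserie=513.html&lt;/url&gt;"
double = "&lt;url&gt;episodes_gen_csaison=1511&amp;amp;cserie=513.html&lt;/url&gt;"

single_fragment = first_pass(single)  # the fragment now contains a bare "&"
double_fragment = first_pass(double)  # the fragment still contains "&amp;"

single_url = second_pass(single_fragment)  # bare "&" is rejected by a strict parser
double_url = second_pass(double_fragment)  # "&" survives both passes

print(single_fragment, single_url)
print(double_fragment, double_url)
```

Under that assumption, an ampersand that has to survive both passes would need to be escaped twice in scraper.xml (&amp;amp;). Whether scrap.exe really works this way is not confirmed here.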
Any help would be appreciated.
The_Dogg