My question is: "How do I write a regex for the spaces (and line breaks) in the HTML code?"
I am trying to write a scraper for "www.beyazperde.mynet.com".
The "create search" and "get search results" parts are OK.
In the "get details" section, the director name is spread over 3 rows of the HTML:
Code:
<!-- YONETMEN -->
<br><span class="itembaslik">Yönetmen : </span>
<a href="/kisi/27214" class=turunculine_11_px>[b]Ben Verbong[/b]</a>
If I use
Code:
<a href="/kisi/([0-9]*)" class=turunculine_11_px>(.[^<]*)</a>
it returns several matches (because there are several similar lines in the HTML).
[img]http://pic1.resimupload.com/r5/thumb_123574808.JPG[/img]
So I tried to begin with
Code:
<\!\-\- YONETMEN \-\->
and it matches "<!-- YONETMEN -->".
But when I add the following words
Code:
<\!\-\- YONETMEN \-\-> <br><span class=..........
it matches nothing, because the code starting with "<br>" is on the next row of the HTML.
I think I have to use something other than " " that also matches the line break.
[img]http://pic1.resimupload.com/r2/thumb_945911715.JPG[/img]
Here is my XML. Would you help me with this puzzle?
Code:
<?xml version="1.0" encoding="iso-8859-9"?>
<scraper framework="10" date="2010-04-16" name="beyazperde" content="movies" thumb="logo_beyazperde.JPG" language="tr">
<NfoUrl dest="3">
<RegExp input="$$1" output="<url>http://beyazperde.mynet.com/hizliarama.asp?keyword=\1</url>" dest="3">
<expression noclean="1"/>
</RegExp>
</NfoUrl>
<CreateSearchUrl SearchStringEncoding="iso-8859-9" dest="3">
<RegExp input="$$1" output="http://beyazperde.mynet.com/hizliarama.asp?keyword=\1" dest="3">
<expression noclean="1"/>
</RegExp>
</CreateSearchUrl>
<GetSearchResults dest="8">
<RegExp input="$$5" output="<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results>\1</results>" dest="8">
<RegExp input="$$1" output="<entity><title>\2</title><url>http://beyazperde.mynet.com/film/\1/arama/\2</url><id>\2</id></entity> \n" dest="5">
<expression repeat="yes" encode="1,2,3,4"><a href="http://beyazperde.mynet.com/film/([0-9]*)/arama/(.[^<]*)" class="turuncucizgisiz_11_px"><b>(.[^<]*)</b> \(([0-9]*)\)</expression>
</RegExp>
<expression noclean="1"/>
</RegExp>
</GetSearchResults>
<GetDetails dest="3">
<RegExp input="$$5" output="<details>\1</details>" dest="3">
<!--Title-->
<RegExp input="$$1" output="<title>\1</title>" dest="5">
<expression trim="1" noclean="1">h1 class="baslik_filmadi31">(.[^<]*)</expression>
</RegExp>
<!--Year Film-->
<RegExp input="$$1" output="<year>\1</year>" dest="5+">
<expression>class=turunculine_11_px>([0-9]*)</a>, </expression>
</RegExp>
<!--Director-->
<RegExp input="$$1" output="<director>\2</director>" dest="5+">
<expression><a href="/kisi/([0-9]*)" class=turunculine_11_px>(.[^<]*)</a></expression>
</RegExp>
<!--Runtime Film-->
<RegExp input="$$1" output="<runtime>\1\2\3</runtime>" dest="5+">
<expression><a class=item href="\/arama.asp\?kat=vizyon&keyword=([0-9]*).([0-9]*).([0-9]*)</expression>
</RegExp>
<!--Genre Film-->
<RegExp input="$$6" output="<genre>\1</genre>" dest="5+">
<RegExp input="$$1" output="\2" dest="6">
<expression> <a href="\/arama.asp\?kat=tur&keyword=([0-9]*)" class=turunculine_11_px>(.[^<]*)</expression>
</RegExp>
<RegExp input="$$1" output="\2" dest="6">
<expression>href="\/arama.asp\?kat=alttur&keyword=([0-9]*)" class=turunculine_11_px>(.[^<]*)</expression>
</RegExp>
<expression repeat="yes"/>
</RegExp>
<!--Thumbnail-->
<RegExp input="$$1" output="<thumb>http://beyazperde.mynet.com/images/film/\1-\2.jpg</thumb>" dest="5+">
<expression noclean="1">src="\/images\/film\/([0-9]*)-(.[^<]*)([0-9]*).jpg" width="150" height="200"></expression>
</RegExp>
<!--Actors-->
<RegExp input="$$7" output="<actor><name>\2</name></actor>" dest="5+">
<expression repeat="yes"><a href="\/kisi/([0-9]*)" class=turunculine_11_px style="line-height:15px;">(.[^<]*)</expression>
</RegExp>
<expression noclean="1"/>
</RegExp>
</GetDetails>
</scraper>
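One possible direction, shown as a small Python sketch rather than a tested scraper: replace the literal space between the rows with `\s*`, which matches any run of whitespace including line breaks, and keep the "<!-- YONETMEN -->" comment as an anchor so only the director link matches. The sample HTML is the snippet from above with the name written without the [b] markup, and the second `/kisi/99999` link is a made-up line added only to stand in for the other similar rows.

```python
import re

# Sample HTML: the three rows from the post, plus one hypothetical extra
# link (id 99999) that imitates the other similar lines on the page.
html = '''<!-- YONETMEN -->
<br><span class="itembaslik">Yönetmen : </span>
<a href="/kisi/27214" class=turunculine_11_px>Ben Verbong</a>
<a href="/kisi/99999" class=turunculine_11_px>Someone Else</a>
'''

# \s* stands in for the line breaks (and any spaces) between the rows.
# Group 1 is the person id, group 2 is the director name.
pattern = re.compile(
    r'<!-- YONETMEN -->\s*'
    r'<br><span class="itembaslik">[^<]*</span>\s*'
    r'<a href="/kisi/([0-9]+)" class=turunculine_11_px>([^<]*)</a>'
)

match = pattern.search(html)
director_id = match.group(1)
director_name = match.group(2)
all_matches = pattern.findall(html)

print(director_id, director_name)
```

In the scraper XML the same pattern would go inside the Director `<expression>`, with `\2` in the output because the name is the second group. Whether the scraper's regex engine accepts `\s` is an assumption here; if it does not, `[ \t\r\n]*` is an equivalent way to write it.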