AdultDVDEmpire Problem - Bleckshire - 2008-11-27
I've been trying to create a scraper for AdultDVDEmpire. It would be my first one, but I've been reading up for a while, so I'm fairly sure I've got the hang of it. I can't figure out why this site won't work: it doesn't return any search results, so I can't get past that point in writing the script. I originally thought the request needed to be spoofed, so I tried that, with no luck. Then I thought I was writing the script incorrectly, so I changed a few things to try it on a different site, and that site returned results just fine, so I'm fairly sure the problem isn't on my end (hopefully). Does anyone think they could help me out?
- artik - 2008-11-29
This one doesn't work?
Quote:<!-- Based on the scraper for Jadedvideo -->
<scraper name="DVD Empire" content="movies" thumb="dvdempire.jpg">
<NfoUrl dest="3">
<!--Don't know what this does but it's on the jadedvideo script so I included it!-->
<RegExp input="$$1" output="\1" dest="3">
<expression noclean="1">[0-9]*)</expression>
<CreateSearchUrl dest="3">
<RegExp input="$$1" output="\1& view=0&display_pic=0" dest="3">
<expression noclean="1"></expression>
<GetSearchResults dest="8">
<RegExp input="$$5" output="<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results>\1</results>" dest="8">
<RegExp input="$$1" output="<entity><title>\2</title><url>\1</url></entity>" dest="5">
<!-- This part seems to be working without a problem; the only odd thing is that it does not return as many results as the web site does.
The results I'm looking for contain the string searchStrID.
Example: <a href="/Exec/v1_item.asp?userid=99365730795583&item_id=1235858&searchStrID=2629">Lost</a> -->
<expression repeat="yes"><a href="/Exec/v1_item\.asp\?userid=[0-9]*&item_id=([0-9]*)&searchStrID=[0-9]*">(.[^<]*)</a></expression>
<expression noclean="1"></expression>
<GetDetails dest="3">
<RegExp input="$$5" output="<details>\1</details>" dest="3">
<RegExp input="$$1" output="<title>\1</title>" dest="5">
<expression noclean="1" trim="1"><title>Adult DVD Empire - ([^/%]*) -</expression>
<RegExp input="$$1" output="<actor><name>\1</name></actor>" dest="5+">
<expression repeat="yes">/Exec/v1_list_performer\.asp\?userid=[0-9]*&cast_id=[0-9]*&sort=[0-9]'>(.[^<]*)</expression>
<RegExp input="$$1" output="<director>\1</director>" dest="5+">
<expression noclean="1" repeat="yes">/Exec/v1_list_director\.asp\?userid=[^>]*>([^<]*)</expression>
<RegExp input="$$1" output="<rating>\1</rating>" dest="5+">
<expression>Overall Rating.[^>]*>.[^>]*>([0-9.]+) out[^<]*<</expression>
<RegExp input="$$1" output="<genre>Adult / \1</genre>" dest="5+">
<expression>Category</b>\: <nobr>[^>]*>([^<]*)</expression>
<RegExp input="$$1" output="<tagline>\1</tagline>" dest="5+">
<expression><td valign="top" class="fontsmall3">[^>]*>([^<]*)</i></expression>
<!--I had a really hard time trying to clean the plot; in fact, I was not able to do it. They have randomly
added the letter "i" where there should be a space, but you can't see the letter because it is the same
color as the background. After cleaning that up, you should be able to clean up the rest of the tags. Another
problem is that sometimes there are no tags at all, just text, so the cleanup search has to be conditional.
This is an example:
<i>Experience A Place Where Your Wildest Fantasies...Are Only The Beginning</i><br><br>
From the<font face="verdana, arial, sans-serif" size="-1" color="#ffffff">i</font>award-winning
director of <I>Pirates</i>, comes <I>Island Fever 4</i>. Shot entirely
<font face="verdana, arial, sans-serif" size="-1" color="#ffffff">i</font>in HD on
<font face="verdana, arial, sans-serif" size="-1" color="#ffffff">i</font>location
<font face="verdana, arial, sans-serif" size="-1" color="#ffffff">i</font>in the
<font face="verdana, arial, sans-serif" size="-1" color="#ffffff">i</font>Bahamas and
<font face="verdana, arial, sans-serif" size="-1" color="#ffffff">i</font>Bora Bora, this special
triple-disc set includes 16 sex scenes,<font face="verdana, arial, sans-serif" size="-1" color="#ffffff">i
</font>a dozen steamy solo sequences, and<font face="verdana, arial, sans-serif" size="-1" color="#ffffff">i
</font>one of the<font face="verdana, arial, sans-serif" size="-1" color="#ffffff">i</font>most intense all-girl
orgies ever captured. Packed with over 3 hours of extreme erotic action, this unforgettable production also
includes an additional 2 hours of bonus material and<font face="verdana, arial, sans-serif"
size="-1" color="#ffffff">i</font>special features.<br><br>
See the<font face="verdana, arial, sans-serif" size="-1" color="#ffffff">i</font>trailer
<font face="verdana, arial, sans-serif" size="-1" color="#ffffff">i</font>for
<a href=""target="new">Island Fever 4</a>!<br><br>-->
<RegExp input="$$8" output="<plot>\1</plot>" dest="5+">
<RegExp input="$$6" output="\1" dest="8">
<RegExp input="$$1" output="\1" dest="6">
<expression noclean="1"><td valign="top" class="fontsmall3">.([^\%]*)</td></expression>
<expression repeat="yes" noclean="1">([^<]*)<[^>]*></expression>
<RegExp input="$$1" output="<year>\1</year>" dest="5+">
<expression>Production Year:[^/]*/font>([0-9]+)</expression>
<RegExp input="$$1" output="<mpaa>\1</mpaa>" dest="5+">
<RegExp input="$$1" output="<runtime>\1 \2 \3 \4</runtime>" dest="5+">
<!--Thumb Front-->
<RegExp input="$$1" output="<thumb>\1/\2h.jpg</thumb>" dest="5+">
<!--Thumb Back-->
<RegExp input="$$1" output="<thumb>\1/\2bh.jpg</thumb>" dest="5+">
<expression noclean="1"></expression>
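The plot cleanup described in the scraper comment can be tried outside XBMC first. What follows is only a minimal Python sketch, not the scraper's own method: the names `HIDDEN_I`, `TAG`, and `clean_plot` are hypothetical, and the patterns are assumptions based solely on the sample markup in the comment (a white `<font>` element wrapping a single "i"). Python's `re` module may also behave differently from the regex engine XBMC uses, so the patterns would still need to be adapted to scraper XML.

```python
import re

# Assumption: the hidden filler is always a <font ... color="#ffffff"> element
# that contains only the letter "i" (plus optional whitespace or newlines).
HIDDEN_I = re.compile(
    r'<font\b[^>]*color="#ffffff"[^>]*>\s*i\s*</font>', re.IGNORECASE
)
# Any remaining tag, e.g. <i>, <br>, <a ...>.
TAG = re.compile(r'<[^>]+>')


def clean_plot(raw: str) -> str:
    """Return the plot text with hidden filler and HTML tags removed."""
    text = HIDDEN_I.sub(' ', raw)   # each hidden "i" stands in for a space
    text = TAG.sub(' ', text)       # drop the remaining tags
    text = re.sub(r'\s+', ' ', text)             # collapse runs of whitespace
    text = re.sub(r'\s+([,.!?;:])', r'\1', text)  # no space before punctuation
    return text.strip()


sample = (
    'From the<font face="verdana, arial, sans-serif" size="-1" '
    'color="#ffffff">i</font>award-winning director of <I>Pirates</i>, '
    'comes <I>Island Fever 4</i>.'
)
print(clean_plot(sample))
```

Because every substitution is a no-op when nothing matches, a plot that arrives as plain text with no tags passes through unchanged, which covers the "conditional" case mentioned in the comment.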
- artik - 2008-11-29
Oops, after testing, I'm not sure it's working. Also, there are a few problems in the code :S (ADE may have changed their HTML/PHP code, and this scraper hasn't been updated since).
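One quick way to tell a broken expression from changed site markup is to run the expression against a saved copy of the page. The sketch below is in Python and rests on several assumptions: it uses the search expression from the quoted scraper, the sample link from its comment (with the space before `searchStrID` treated as a forum line-wrap artifact), and a made-up "new style" link that only illustrates what a non-match looks like. `find_results` is a hypothetical helper name, and Python's `re` may differ in details from XBMC's regex engine.

```python
import re

# Search expression copied from the quoted scraper's GetSearchResults step.
SEARCH_EXPR = re.compile(
    r'<a href="/Exec/v1_item\.asp\?userid=[0-9]*&item_id=([0-9]*)'
    r'&searchStrID=[0-9]*">(.[^<]*)</a>'
)


def find_results(html: str) -> list:
    """Return (item_id, title) pairs for every link the expression matches."""
    return SEARCH_EXPR.findall(html)


# Old-style link, taken from the comment in the scraper.
old_link = (
    '<a href="/Exec/v1_item.asp?userid=99365730795583'
    '&item_id=1235858&searchStrID=2629">Lost</a>'
)
# Hypothetical changed markup, invented purely for illustration.
new_link = '<a href="/item/1235858/lost.html">Lost</a>'

print(find_results(old_link))
print(find_results(new_link))
```

If the expression matches the old-style link but returns nothing for a freshly saved search page, the site markup has most likely changed and the expression needs rewriting rather than the rest of the script.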
- Bleckshire - 2008-11-29
Yeah, that one doesn't work because of a few mistakes in the code, and because AdultDVDEmpire did indeed change its code; that scraper is based on the old version.
- artik - 2008-11-29
Bleckshire Wrote:Yeah, that one doesn't work because of a few mistakes in the code, and because AdultDVDEmpire did indeed change its code; that scraper is based on the old version.
Bleckshire, have you seen my topic on Excalibur? We can get much better results there than from dvdempire (for example, thumbnails are 430 x 600 pixels, without watermarks).
Have a look at my topic here; maybe we can work on it together.
- Bleckshire - 2008-11-30
Yeah, I've seen it. I'm going to fiddle with that code and see how much I can get working. I'm still going to try to figure out ADE and create a scraper for it (I like that they categorize titles into multiple genres, as well as a few other features they have).
- artik - 2008-11-30
Bleckshire Wrote:Yeah, I've seen it. I'm going to fiddle with that code and see how much I can get working. I'm still going to try to figure out ADE and create a scraper for it (I like that they categorize titles into multiple genres, as well as a few other features they have).
Great. I tried the Excalibur database a little, and about 98% of my movies were found. Amazing! That's a big point in its favor.
ADE still has watermarks and a lower thumb resolution ... bad.
- Bleckshire - 2008-12-01
artik Wrote:Great. I tried the Excalibur database a little, and about 98% of my movies were found. Amazing! That's a big point in its favor.
ADE still has watermarks and a lower thumb resolution ... bad.
True, but it works for me personally. I've already got hq covers for all of my releases so I actually don't even let the scrapers set thumbs for me.