2009-04-09, 23:30
Hi,
I just found the Discogs scraper and ran it on my library.
I added support for ANVs (artist name variations) so that it gets more hits when searching for artists. This seems to work quite well.
Here is the code I've modified:
Code:
<GetArtistSearchResults dest="8">
  <RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
    <!-- artist name variation -->
    <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.discogs.com\1&lt;/url&gt;&lt;/entity&gt;" dest="5+">
      <expression repeat="yes" clear="yes">&lt;a class="rollover_link" href="(/artist[^"]*anv=[^"]*)"&gt;(.+)&lt;/a&gt;</expression>
    </RegExp>
    <!-- exact match -->
    <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.discogs.com\1&lt;/url&gt;&lt;/entity&gt;" dest="5+">
      <expression repeat="yes" clear="no">&lt;a class="rollover_link" href="(/artist[^"]*)"&gt;&lt;span style="font-size:11pt;"&gt;&lt;em&gt;([^&lt;]*)&lt;</expression>
    </RegExp>
    <expression noclean="1"/>
  </RegExp>
</GetArtistSearchResults>
Then I found that the scraper doesn't find artists whose names start with "The ". For example, "The Art of Noise" is not found, since Discogs expects "Art of Noise, The".
I tried to add this rewrite but couldn't get it to work. Maybe somebody can help.
Code:
<CreateArtistSearchUrl dest="3">
  <RegExp input="$$2" output="http://www.discogs.com/search?type=artists&amp;q=&quot;\1&quot;&amp;btn=Search" dest="3">
    <!-- rewrite "The X" as "X, The" -->
    <RegExp input="$$2" output="\1,%20The" dest="2">
      <!-- copy the artist name from $$1 into $$2 -->
      <RegExp input="$$1" output="\1" dest="2">
        <expression noclean="1"/>
      </RegExp>
      <expression noclean="1" clear="no" repeat="no" trim="1">[Tt]he[ ](.+)</expression>
    </RegExp>
    <expression noclean="1"/>
  </RegExp>
</CreateArtistSearchUrl>
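For reference, the rewrite this function is meant to perform can be sketched outside the scraper in Python. This only illustrates the intended result on a plain string with literal blanks; the helper name `move_leading_article` is hypothetical and not part of the scraper.

```python
import re

# Hypothetical helper, not part of the scraper: moves a leading "The " to the
# end of the name, which is the form Discogs expects ("Art of Noise, The").
_LEADING_THE = re.compile(r"^[Tt]he (.+)$")


def move_leading_article(name: str) -> str:
    match = _LEADING_THE.match(name)
    if match is None:
        # Names without a leading "The " are passed through unchanged.
        return name
    return f"{match.group(1)}, The"


if __name__ == "__main__":
    print(move_leading_article("The Art of Noise"))
    print(move_leading_article("Kraftwerk"))
```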
The problem seems to be the blank after "The". I tried:
"[Tt]he (.+)"
"[Tt]he\s(.+)"
"[Tt]he[ ](.+)"
but none of them matched.
When I change it to "[Tt]he(.+)" it matches, but of course \1 then has a leading blank and the resulting string is " Art of Noise, The".
Any ideas?
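One possible explanation, which is an assumption and not confirmed here, is that the artist name reaches the function URL-encoded, so the blank is really "%20" and there is no literal space to match. The Python sketch below reproduces the behaviour with the four patterns I tried under that assumption; `urllib.parse.quote` merely stands in for whatever encoding the scraper engine applies, and the last pattern is an untested candidate.

```python
import re
from urllib.parse import quote

# Assumption: the scraper engine hands over the name URL-encoded.
encoded = quote("The Art of Noise")  # blanks become "%20"

patterns = [
    "[Tt]he (.+)",
    r"[Tt]he\s(.+)",
    "[Tt]he[ ](.+)",
    "[Tt]he(.+)",
    "[Tt]he%20(.+)",  # candidate: match the encoded blank explicitly
]


def first_group(pattern: str, text: str):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Map each pattern to what \1 would contain for the encoded name.
results = {pattern: first_group(pattern, encoded) for pattern in patterns}

for pattern, captured in results.items():
    print(f"{pattern!r:18} -> {captured!r}")
```

If the assumption holds, the first three patterns fail because the input contains no space character, and "[Tt]he(.+)" captures the encoded blank along with the name. An expression such as "[Tt]he%20(.+)" might then be worth trying in the scraper; if the engine encodes blanks as "+" instead, the pattern would need to match that.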
Bernd