Hi there,
After many, many years of enjoying XBMC, I think I can contribute something that Spanish-speaking users will like.
I have created (and am still working on) a scraper for the Spanish cinema site http://www.culturalia.net
With it you can get Spanish thumbs, plots and all the other fields included in the XBMC database.
I have contacted Culturalia.net's administrator to ask for permission to make this script public.
Until that happens (hopefully soon), whoever wants to try the scraper must contact me and ask to join the beta testing group.
Before going into the details of the scraper's features, let me say something.
This scraper is the result of one full day of work, starting with reading about scrapers, XML and regex (I had never used any of them before), so don't expect it to be perfect. Don't even expect it to work at all... That way, anything you get will be very welcome.
Of course the code will need a lot of improvement, as this is my first attempt at doing something useful with XML.
But the good news is that I learn fast and I have a lot of spare time, so expect improvements. I have taken a lot from XBMC; now it is my turn to give back...
That being said... the current version of the scraper (internally I call it beta1) can get the following fields from culturalia.net:
<Title> - I fill this with "Title (OriginalTitle)", partly because I like it this way and partly because I can't see the original title any other way
<Originaltitle> - I didn't find this in the documentation (http://www.xboxmediacenter.com/wiki/inde...craper.xml) but in a post here, so I thought it might be useful in the future
<year>
<director>
<Credits> - the writer information, only the first writer. I think XBMC can only handle one
<plot>
<mpaa> - I put here the information about who is allowed to see the film. Because I don't see this information displayed anywhere, I have also put it in the <outline> field (culturalia has no data for that field, and I didn't want to repeat the long plot there). This will probably be taken out once the beta is over.
<outline> - see <mpaa>
<runtime> - in minutes
<rating> and <votes> - culturalia's data here is very similar to IMDB's, so it fits quite well. During beta testing, because of some problems I'm having (see "KNOWN BUGS"), these fields are also copied into <tagline>. This will be removed in the final version.
<tagline> - culturalia.net has no equivalent field (taglines are not very common in Spanish advertising); see <rating> and <votes> for how it is used during beta testing.
<genre> - finally, Spanish genres (I've been waiting for this a loooong time)
<thumb> - Spanish poster (first time available on XBMC). Thanks Culturalia for this...
<actor> - list of actors. There is no role field in culturalia, so no role info is filled in.
Nothing else for the moment. But I think it is enough for starters, as it covers all the fields visible in current XBMC versions.
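To make the field mapping above a bit more concrete, here is a minimal Python sketch of what a scraper like this does: pull values out of a page with regular expressions and write them into XBMC-style detail tags. It is not the actual culturalia_beta1.xml (which is an XML scraper, not Python). The sample HTML, its Spanish labels, the `first` and `build_details` helpers and all the values are hypothetical, since the real culturalia.net markup isn't shown here, and only a subset of the fields is covered.

```python
import re
import xml.etree.ElementTree as ET

# HYPOTHETICAL page snippet: the real culturalia.net layout is not shown in
# this post, so these labels, tags and values are placeholders only.
SAMPLE_PAGE = """
<h1>Título de ejemplo</h1>
<b>Título original:</b> Example Title<br>
<b>Año:</b> 2006<br>
<b>Duración:</b> 112 min.<br>
<b>Guión:</b> Guionista Uno, Guionista Dos<br>
<b>Calificación:</b> No recomendada para menores de 13 años<br>
<b>Puntuación:</b> 7.5<br>
<b>Votos:</b> 1234<br>
<ul><li class="actor">Actor Uno</li><li class="actor">Actriz Dos</li></ul>
"""


def first(pattern: str, page: str) -> str:
    """Return the first capture group of pattern, or '' if it does not match."""
    m = re.search(pattern, page)
    return m.group(1).strip() if m else ""


def build_details(page: str) -> ET.Element:
    """Map the hypothetical page fields onto XBMC-style detail tags."""
    title = first(r"<h1>(.*?)</h1>", page)
    original = first(r"<b>Título original:</b>\s*(.*?)<br>", page)
    year = first(r"<b>Año:</b>\s*(\d{4})", page)
    runtime = first(r"<b>Duración:</b>\s*(\d+)\s*min", page)  # minutes only
    writers = first(r"<b>Guión:</b>\s*(.*?)<br>", page)
    age_rating = first(r"<b>Calificación:</b>\s*(.*?)<br>", page)
    rating = first(r"<b>Puntuación:</b>\s*([\d.]+)", page)
    votes = first(r"<b>Votos:</b>\s*(\d+)", page)
    actors = re.findall(r'<li class="actor">(.*?)</li>', page)

    details = ET.Element("details")

    def add(tag: str, text: str) -> None:
        ET.SubElement(details, tag).text = text

    # Title shown as "Title (OriginalTitle)", as described in the post.
    add("title", f"{title} ({original})" if original else title)
    add("originaltitle", original)
    add("year", year)
    # Only the first writer goes into <credits>.
    add("credits", writers.split(",")[0].strip())
    add("mpaa", age_rating)
    add("outline", age_rating)  # beta only: duplicate of <mpaa>
    add("runtime", runtime)
    add("rating", rating)
    add("votes", votes)
    add("tagline", f"{rating} ({votes} votes)")  # beta only: debug copy
    for name in actors:
        actor = ET.SubElement(details, "actor")
        ET.SubElement(actor, "name").text = name  # no <role>: none on the site
    return details


if __name__ == "__main__":
    print(ET.tostring(build_details(SAMPLE_PAGE), encoding="unicode"))
```

Running it prints a single `<details>` element. The beta-only duplicates (`<outline>` and `<tagline>`) are kept as separate lines so they are easy to drop later.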
KNOWN BUGS
Of course when you start a new program you also start a list of bugs, and this one won't be the exception. Some of the bugs listed here might be caused by the XBMC build I'm using (rev7767), so I'm trying to update as I write (and you read). At the moment, these are the known bugs in culturalia_beta1.xml:
1. The <Votes> field shows wrong numbers, even though I have checked that I'm parsing the field correctly from culturalia's page (you can see the parsed value in the <tagline> field). It might be a problem with my XBMC build, as the same thing happens with the IMDB scraper.
2. Problem with Spanish special characters (like French, Spanish uses more than the ASCII characters). I don't have much experience with character encodings in XML, but I have tried setting the encoding to UTF-8 in the XML header. I don't know whether that is supposed to work. According to another post about the filmweb scraper, which had the same problem (and according to spiff), this might be fixed in rev7829 and above... so again, we'll see when I can update my XBMC.
Strangely, Spanish characters show perfectly fine in the Genre field... but I haven't looked into it much yet. We'll see after the update.
MISSING/INCOMPLETE
1. I haven't done anything with the nfourl, as I'm not using nfo files for this. But I will probably add it when someone asks.
2. Any other fields not listed here (documented or not) are, of course, also missing. Again, I'm happy to add anything on request, as long as the data is available in culturalia.
What I'm not going to do, at least for now, is take requests to adapt this to other sites.
And that is all for the moment. Now I have to get as many people as possible to test it, and fix whatever they will surely find.
I hope I get a positive response from Culturalia soon, so I can make the beta available to everybody.
Thanks for taking the time to read this, and don't hesitate to contact me if you want to test it.
Finally, I want to apologize for my poor English. But that is why I needed this scraper, isn't it?
Best regards,
Jurrabi.