Clean scraping API - Printable Version

+- Kodi Community Forum (https://forum.kodi.tv)
+-- Forum: Development (https://forum.kodi.tv/forumdisplay.php?fid=32)
+--- Forum: Kodi Application (https://forum.kodi.tv/forumdisplay.php?fid=93)
+---- Forum: GSoC (https://forum.kodi.tv/forumdisplay.php?fid=299)
+----- Forum: GSoC 2012 (https://forum.kodi.tv/forumdisplay.php?fid=161)
+----- Thread: Clean scraping API (/showthread.php?tid=134012)
Re: Clean scraping API - queeup - 2013-04-11

Sorry for interrupting. I didn't read the whole topic, but maybe you guys want to check this for some new ideas: https://github.com/wackou/guessit

RE: Clean scraping API - topfs2 - 2013-04-11

(2013-04-11, 16:58)queeup Wrote: Sorry for interrupting. I didn't read the whole topic, but maybe you guys want to check this for some new ideas.

Nice find! I bet there is tons we can borrow from that!

Re: Clean scraping API - queeup - 2013-04-11

Good, then I will add one more, for video metadata: https://github.com/Diaoul/enzyme

RE: Clean scraping API - garbear - 2013-04-11

Stop making our lives easier.

Re: Clean scraping API - queeup - 2013-04-11

Believe me, I've been waiting for this Python scraper thing for almost two years, and finally it's happening. Well done. The bad thing is I only saw this topic today. Shame on me :(

RE: Clean scraping API - garbear - 2013-04-11

[image]

RE: Clean scraping API - topfs2 - 2013-04-17

Since this thread has gotten so much heat lately, I want to start a discussion on something I simply need some input on.

The discussion is regarding issues #7 and #9, and semi-related is #8. The problem is not really the scheduling algorithms (they would need some love, but in essence they should work) but more how to reorganize the API of supplies and demands.

Basically, what we arrive at IMO is a subgraph find-and-alteration problem, which is in essence what we had before, but with a single node (subject) and its edge. So what I envision is something along the lines of demands:

find A where edge(A, owl.sameAs, B) and (B is URL or edge(B, dc.identifier))

as this would allow for this type of owl.sameAs:

Code:
{

But I can't find a nice way to produce the above query in Python, and in a Pythonic way. I'd love it if the demand and supply API were similar as well, and provided some validation on the output too. ATM a task can state that it outputs a certain edge and nothing else, but when run it can output anything.

Cheers, Tobias

RE: Clean scraping API - garbear - 2013-05-09

From their Facebook page: https://www.facebook.com/themoviedb

(2013-05-09, 19:49)The Movie Database Wrote: Searching is an important tool for a project like TMDb. Without a good search we end up with duplicates, frustrated users and, quite frankly, a less than stellar experience. Over the past few years we've had a lot of things change, especially with the amount of non-English content that has been added to our database. We've also grown a lot, and our old search infrastructure simply wasn't up for the task.

It looks like they've been working heavily on the search issue as well. With a search engine on their end so heavily optimized for the domain of movies, I'm imagining how much thinking we're going to need to put in to actually contribute anything statistically significant to their results.

RE: Clean scraping API - TheMonkeyKing - 2013-10-18

(2013-05-09, 23:05)garbear Wrote: It looks like they've been working heavily on the search issue as well. With a search engine on their end so heavily optimized for the domain of movies, I'm imagining how much thinking we're going to need to put in to actually contribute anything statistically significant to their results.

The error results are on our end. While they have the definitions developed on their end, we need the application of terms. Basically, we want to sort out false results and possibly fix an erroneous result so it is correct now, and remember the corrected ID. Also, to know when to search and when not to.
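Editor's note on topfs2's 2013-04-17 post: as a minimal, hypothetical sketch of one Pythonic shape the demand could take, the Variable, Edge and Demand classes, the is_url/has_edge helpers and the toy graph below are all invented for illustration; they are not part of the project's actual codebase.

Code:
# Hypothetical sketch only: everything here is invented for illustration.

class Variable(object):
    """A placeholder node in a pattern, e.g. A or B."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return "?" + self.name

class Edge(object):
    """A (subject, predicate, object) triple pattern; fields may be Variables."""
    def __init__(self, subject, predicate, obj):
        self.subject, self.predicate, self.obj = subject, predicate, obj

class Demand(object):
    """Collects triple patterns plus arbitrary predicates over the bindings."""
    def __init__(self, find):
        self.find = find
        self.patterns = []
        self.filters = []
    def where(self, *patterns):
        self.patterns.extend(patterns)
        return self
    def require(self, predicate):
        # predicate: callable(bindings dict) -> bool
        self.filters.append(predicate)
        return self

def is_url(node):
    """Crude URL test, purely for the example."""
    return isinstance(node, str) and node.startswith(("http://", "https://"))

def has_edge(graph, subject, predicate):
    """True if the graph holds any (subject, predicate, _) triple."""
    return any(s == subject and p == predicate for s, p, _ in graph)

# "find A where edge(A, owl.sameAs, B) and (B is URL or edge(B, dc.identifier))"
A, B = Variable("A"), Variable("B")

graph = [
    ("item:1", "owl:sameAs", "http://www.imdb.com/title/tt0088763/"),
    ("item:2", "owl:sameAs", "item:3"),
    ("item:3", "dc:identifier", "tt0088763"),
]

demand = (Demand(find=A)
          .where(Edge(A, "owl:sameAs", B))
          .require(lambda b: is_url(b[B]) or has_edge(graph, b[B], "dc:identifier")))

def run(demand, graph):
    """Naive matcher, only here to show the demand is executable."""
    results = []
    for s, p, o in graph:
        for pattern in demand.patterns:
            if pattern.predicate != p:
                continue
            bindings = {pattern.subject: s, pattern.obj: o}
            if all(f(bindings) for f in demand.filters):
                results.append(bindings[demand.find])
    return results

print(run(demand, graph))   # -> ['item:1', 'item:2']

The same builder shape could plausibly serve the supply side too, with the declared patterns doubling as the output validation Tobias mentions: a task would declare the edges it may emit, and the runner would reject anything outside that declaration.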
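Editor's note on TheMonkeyKing's post about remembering the corrected ID: a minimal sketch of "fix it once and don't search again", assuming a simple JSON file as the store; the file location and all function names are hypothetical.

Code:
# Hypothetical sketch only: a tiny persistent cache mapping a normalised
# title/year key to a user-confirmed TMDb id, consulted before searching.
import json
import os

CACHE_PATH = os.path.expanduser("~/.scraper_id_cache.json")   # invented location

def _load():
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f)
    return {}

def _save(cache):
    with open(CACHE_PATH, "w") as f:
        json.dump(cache, f, indent=2)

def _key(title, year=None):
    normalised = " ".join(title.lower().split())
    return "%s (%s)" % (normalised, year) if year else normalised

def remember_correction(title, year, tmdb_id):
    """Called when the user fixes a wrong match, so it stays fixed."""
    cache = _load()
    cache[_key(title, year)] = tmdb_id
    _save(cache)

def lookup(title, year=None):
    """Return the remembered id, or None meaning a search is still needed."""
    return _load().get(_key(title, year))

Whether such a store would live in a JSON file, a table in the video database, or somewhere else is an open question; the point is only that a confirmed correction short-circuits future searches.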