Scraping issue - MySQL Data Type is too restrictive
#1
I'm not sure if this warrants a full Bug Report, so I'd like to discuss what I've found. (Similar to this thread)

During a recent update to my library, I came across a movie that just would not scan (for posterity's sake, Star Wars: The Force Awakens). I tried multiple Add-Ons (and different settings for each). No dice.

After digging through the logs, I discovered the issue seemed to be caused by a value that had exceeded its intended column's limitations (myvideos107 -> movies -> c08).

The default "TEXT" Data Type limited the input to 65,535 (if this source is correct) characters, well under the size of the string (83,142 characters). Changing the Data Type to "MEDIUMTEXT" allowed the movie to be added as expected:

ALTER TABLE `myvideos107`.`movie` CHANGE COLUMN `c08` `c08` MEDIUMTEXT NULL DEFAULT NULL;

I would have to really dig through my library to try and find another instance of this issue, but I don't know whether this qualifies as a bug. Certainly something in the minority, but worth mentioning nonetheless.

Due to the popularity of the movie, I believe there are an unusually large number of available art files. Should this be reported to the Add-On maker so that they can limit the string length? I just don't know. Both Universal Movie Scraper and The Movie Database Add-Ons failed to add the movie. Also, I can see where some people would prefer to have the options to choose from...

Thoughts? Will an update to Kodi copy over the new Data Type, or would I have a huge problem on my hands when v18 is ready for mass adoption?

ASSOCIATED LOG ENTRIES
Reply
#2
@wil.fortune

I am a little confused by your post. Firstly, Yes, we need a Debug Log.

You state that you cannot scrape Star Wars. Well I had no problems whatsoever scraping that movie into my library in Kodi v17.6.

What "Multiple Add-ons" are you referring to?

You state that you cannot scrape the movie, then further down you start referring to Artwork. Where is the problem- Artwork or scraping?

The database field is related to Artwork. Are you implying there is artwork with a file name of over 65,000 characters?

Honestly, I have no idea what you expect me to do with 4 lines of a Debug Log. Are our requests for FULL AND UNREDACTED LOGS not clear enough for you.
My Signature
Links to : Official:Forum rules (wiki) | Official:Forum rules/Banned add-ons (wiki) | Debug Log (wiki)
Links to : HOW-TO:Create Music Library (wiki) | HOW-TO:Create_Video_Library (wiki)  ||  Artwork (wiki) | Basic controls (wiki) | Import-export library (wiki) | Movie sets (wiki) | Movie universe (wiki) | NFO files (wiki) | Quick start guide (wiki)
Reply
#3
(2018-10-05, 01:30)Karellen Wrote: wil.fortune
I am a little confused by your post. Firstly, Yes, we need a Debug Log.
I apologize if I did not make myself clear. I am not asking for you to fix my issue, as the change I made myself fixed the issue. I only made one specific change, which addressed the error present in the log (that I included). I could attach a full log if you really think you need it, but it was like 8mb. Seemed pointless to force someone to dig through the whole thing to find what I already found (and fixed).
 
(2018-10-05, 01:30)Karellen Wrote: You state that you cannot scrape Star Wars. Well I had no problems whatsoever scraping that movie into my library in Kodi v17.6.
I'm not mad at Kodi or the Add-On (or any developers involved), I'm here to discuss whether what I've observed warrants a bug report, and if so, where/who I should report the bug to (Kodi/Add-On Dev). This sentence makes me think you are taking this personal for some reason.
 
(2018-10-05, 01:30)Karellen Wrote: What "Multiple Add-ons" are you referring to?

You state that you cannot scrape the movie, then further down you start referring to Artwork. Where is the problem- Artwork or scraping?
As I stated in my initial post, "Both Universal Movie Scraper and The Movie Database Add-Ons failed to add the movie".

Just so that there is no more confusion, I was able to successfully scrape the movie once I made the adjustment to the Data Type of column c08.
 
(2018-10-05, 01:30)Karellen Wrote: The database field is related to Artwork. Are you implying there is artwork with a file name of over 65,000 characters?
I didn't imply anything. I stated directly the size of the string exceeds that of the intended column's Data Type limitation (hence the.. 1406 error, I believe it was?). You can check for yourself in the link at the bottom of my initial post. The string that was meant for column c08 is 83,142 characters long.
 
(2018-10-05, 01:30)Karellen Wrote: Honestly, I have no idea what you expect me to do with 4 lines of a Debug Log. Are our requests for FULL AND UNREDACTED LOGS not clear enough for you.
This is just blatantly confrontational. I meant to open up a dialog, not piss anyone off. I'm not sure what I wrote that upset you so much, but I didn't intend to. Did I post this in the wrong forum? Geez..
Reply
#4
It was not until I read your other thread that I realised you were working with a MySQL setup. (Yes it is in the thread title, but I rarely take note of those).
My Signature
Links to : Official:Forum rules (wiki) | Official:Forum rules/Banned add-ons (wiki) | Debug Log (wiki)
Links to : HOW-TO:Create Music Library (wiki) | HOW-TO:Create_Video_Library (wiki)  ||  Artwork (wiki) | Basic controls (wiki) | Import-export library (wiki) | Movie sets (wiki) | Movie universe (wiki) | NFO files (wiki) | Quick start guide (wiki)
Reply
#5
(2018-10-05, 02:28)wil.fortune Wrote: This is just blatantly confrontational. I meant to open up a dialog, not piss anyone off. I'm not sure what I wrote that upset you so much, but I didn't intend to. Did I post this in the wrong forum? Geez..
As indicated, we prefer not to solve puzzles with only 4 pieces. Errors can start at any moment during Kodi, so a full log file is very much preferable. And it also saves us time and effort because we don't have to ask for the full log file, again... and again...

(2018-10-04, 23:52)wil.fortune Wrote: The default "TEXT" Data Type limited the input to 65,535 (if this source is correct) characters, well under the size of the string (83,142 characters). Changing the Data Type to "MEDIUMTEXT" allowed the movie to be added as expected:
Some (most?) of the database layout still is a souvenir from the old XBMC/Xbox days, back in the days when not everyone was a SQL engineer.

On the other hand... 64K+ of thumbnail URLS is a bit ludicrous.
Field 'c08' in MyVideos112 (Kodi18beta3) is for thumbnail urls, which for 'The force awakens' is currently holding 9,006 bytes for me. How you are exceeding 65,000 bytes at all is beyond me.
Reply
#6
The IMDB websites currently holds some 232 posters alone for your movie... Which is getting out of hand, really.
Kodi shouldn't have to store all of those URL links. Only 1 of them is necessary: the selected one.
Reply
#7
as @Klojum and me spoke about that internally I did a local scrape (no MySQL db) and exported the scraped movie to separated nfo files. This is what I get after Wink

http://termbin.com/y1ct

So there are really a BUNCH of links which seem to be stored, which might not be necessary. We will talk about that internally and will let you know.

Just to refere to the movie: https://www.themoviedb.org/movie/140607-...uage=en-US

If you scroll down you will see that there are 232 posters available.
Reply
#8
@Klojum and @DaVu: I wanted to highlight that the primary reason for MySQL being so slow on larger libraries is that it takes a ton of time to transfer the content of c08 and c20. If you find a way to not include these values in the view, yet provide a mechanism for the fanart and thumbnail selection to work, it would be awesome.
Reply
#9
(2018-10-09, 08:11)AMoo-Miki Wrote: @Klojum and @DaVu: I wanted to highlight that the primary reason for MySQL being so slow on larger libraries is that it takes a ton of time to transfer the content of c08 and c20
Actually, MySQL is hardly ever the slow(er) component in the whole data process. Data is often processed in milliseconds by MySQL or SQLite. A database query that retrieves a couple of thousand records is peanuts.

Currently, a bigger problem is that Kodi also internally sorts most incoming data (if not all) from local or remote SQL databases, regardless of what the database query is. So of course, the less powerful the device running Kodi is, the longer it takes before the data/listings are shown in Kodi. It's something to be looked at for Kodi 19.
Reply
#10
(2018-10-05, 02:39)Karellen Wrote: It was not until I read your other thread that I realised you were working with a MySQL setup. (Yes it is in the thread title, but I rarely take note of those).
No hard feelings. I'm sure my limited post count didn't help matters.
 
(2018-10-05, 10:58)Klojum Wrote: As indicated, we prefer not to solve puzzles with only 4 pieces. Errors can start at any moment during Kodi, so a full log file is very much preferable. And it also saves us time and effort because we don't have to ask for the full log file, again... and again...
Had I been confused about the error, I wouldn't have thought twice about uploading the whole log. Later, it did occur to me that perhaps an earlier error was causing too many textures to be loaded into the string. At the time, however, given that the last poster did appear complete (opening and closing tags intact), it seemed more relevant to focus on the entries that were directly tied to the reported error.
 
(2018-10-05, 10:58)Klojum Wrote: Some (most?) of the database layout still is a souvenir from the old XBMC/Xbox days, back in the days when not everyone was a SQL engineer.
Gotcha. Does that mean there is an overall overhaul in the works?
 
(2018-10-05, 10:58)Klojum Wrote: On the other hand... 64K+ of thumbnail URLS is a bit ludicrous.
I couldn't agree more. I won't pretend to have previewed the options for this specific movie, but in general, knowing the options are there is comforting. I don't always like the "default" poster chosen by a scraper / source.

(edit) What I'm driving at is, if you are going to have options, indirectly setting a hard limit on the number of options (however unintentional) seems a bit counterintuitive. Then again, I don't know what performance impact expanding certain columns may have. MEDIUMTEXT allows for a substantially larger string, but I would hope that would only matter if there were numerous rows taking advantage of the larger space. Meaning a string that would fit within the TEXT Data Type parameters would, I hope, be fetched at the same rate regardless of Data Type. I'm still way too green when it comes to MySQL, so I have no idea.

On the other hand, I can also see where the debate could easily (and justifiably) end with the argument, "Why stop at MEDIUMTEXT? Why not allow even MORE space?" I can't recall off the top of my head, but I want to say MEDIUMTEXT is either something like 8MB or 4GB. For the good of preventing bloated DBs, I could see where enforcing a limit of say, 200 options, would be logical. Scrapers would have to calculate their expected string length as they built a string, and then stop at/before the threshold. I have a little programming experience, so I don't think that would be an unreasonable bit of additional code. Doesn't strike the right cords with me, but I also like parity, and I don't like that column's Data Type currently sticking out among the rest. (/edit)

(edit2) After considering it a bit more, I do believe this is a bug that should be reported to the Add-Ons I tested. They must've never expected so many options to be present, but should have respected the destination's limitations nonetheless. I will look them up and reference this thread when I report the issue. I suppose Kodi could also trim the string before inserting it, but that could become problematic if the string were improperly handled and broke a trailing link. Doesn't seem logical to attack the issue from that angle. (/edit2)
 
(2018-10-05, 10:58)Klojum Wrote: Field 'c08' in MyVideos112 (Kodi18beta3) is for thumbnail urls, which for 'The force awakens' is currently holding 9,006 bytes for me. How you are exceeding 65,000 bytes at all is beyond me. 
Hence my post Wink
 
Since we're sort of on the topic, when performing a scrape / change to a MySQL DB from a Kodi client device, does Kodi only push the change to the DB or sync the whole DB? I'm given to believe the former, but I want to be certain..
Reply
#11
(2018-10-09, 09:21)Klojum Wrote:
(2018-10-09, 08:11)AMoo-Miki Wrote: @Klojum and @DaVu: I wanted to highlight that the primary reason for MySQL being so slow on larger libraries is that it takes a ton of time to transfer the content of c08 and c20
Actually, MySQL is hardly ever the slow(er) component in the whole data process. Data is often processed in milliseconds by MySQL or SQLite. A database query that retrieves a couple of thousand records is peanuts.   
@Klojum that was the point I was making. MySQL is not the slow part; it is the data transfer over the network that is the problem. The volume of data on larger libraries is way too much to be transferred in just a few seconds. Just querying the view using MySQL's CLI takes me 8-10 seconds to get the data, despite the query running in milliseconds. The sorting duration is not comparable to this as I compared the loading time of a local MySQL database to the remote one; local one loaded in just over a full second.

While large libraries have a lot of data, c08 and c20 are the 2 fields that make up about half of the data being transferred on my library; not having them transferred would reduce my load time to 4-5 seconds.

For the best user experience, I would propose loading the necessary fields for the UI and then lazy-load the rest of the data. Such a perception management is happening right now where images are being queried later; the art data is not in the views at all. This wouldn't be too bad as Kodi would just populate the values for the skin's consumption and replace them when complete data is available.

Hope that make sense to you too.

PS this is a big deal for me because of the listitem caching behavior of Kodi that caches the response if it takes longer than a second. Watched status and counters might not be very troubling while dealing with movies but when binge watching TV shows and you don't know which episodes have been watched, becomes a lot of trouble.
Reply
#12
(2018-10-05, 10:58)Klojum Wrote:
(2018-10-05, 02:28)wil.fortune Wrote: Some (most?) of the database layout still is a souvenir from the old XBMC/Xbox days, back in the days when not everyone was a SQL engineer.
 
 As opposed to today, when EVERYBODY thinks they're a SQL engineer? Smile
Reply
#13
?!?
Reply
#14
(2018-11-06, 17:17)Klojum Wrote: ?!?
I don't know what happened to the quoted text. :/
Reply

Logout Mark Read Team Forum Stats Members Help
Scraping issue - MySQL Data Type is too restrictive0