utf-8 file names and content - Kodi Community Forum (https://forum.kodi.tv) > Development > Add-ons > Python 3 migration (/showthread.php?tid=366245)
utf-8 file names and content - fbacher - 2021-12-31

If you need to access a file with a utf-8 path, you need to explicitly encode the path. If the file's text is also utf-8, you need to specify utf-8 as the content encoding as well:

io.open(filename.encode('utf-8'), mode='rt', encoding='utf-8')

Normally, Python discovers the filesystem encoding (the one used for filenames) and sets it. However, because of a patch introduced in Kodi 19.2 (https://github.com/xbmc/xbmc/issues/19883) to work around what looks like a nasty Kodi string-handling problem with Turkish (and other) locales, the filename encoding is 'ASCII' instead of 'utf-8' (at least on Linux). This means you have to specify it explicitly, at least until the other bug is fixed.

I'm not sure how utf-8 filenames behave on different Windows versions, or on operating systems that don't support utf-8 filenames, but most modern systems support utf-8 paths.

Failing to specify filename.encode('utf-8') can cause a UnicodeEncodeError ("'ascii' codec can't encode character ...: ordinal not in range(128)") when the filename contains non-ASCII characters.

Issue 19883 is a cautionary tale about the subtle handling of character comparison, etc. in different languages: they don't always obey the rules we expect.
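A minimal sketch of the advice above in plain Python 3, assuming nothing Kodi-specific. The helper names filesystem_encoding_is_utf8 and open_utf8_text are hypothetical (they are not part of Kodi's API). The first reports what the interpreter thinks the filesystem encoding is; the second opens a file with an explicitly UTF-8-encoded path and an explicit UTF-8 content encoding, so neither the filename nor the content depends on the locale Kodi sets up.

```python
import codecs
import io
import os
import sys
import tempfile


def filesystem_encoding_is_utf8():
    """Hypothetical helper: True if str paths would be encoded as UTF-8.

    Per the post above, under Kodi 19.2 on Linux this would return False,
    because the filesystem encoding is reported as ASCII.
    """
    return codecs.lookup(sys.getfilesystemencoding()).name == 'utf-8'


def open_utf8_text(filename, mode='rt'):
    """Hypothetical helper: open a text file with a UTF-8 name and UTF-8 content.

    The path is encoded to UTF-8 bytes explicitly, so the (possibly ASCII)
    filesystem encoding is never applied to the name. os.fsencode() would
    not help here, because it uses that same filesystem encoding.
    The content is read/written as UTF-8 regardless of the locale default.
    """
    return io.open(filename.encode('utf-8'), mode=mode, encoding='utf-8')


if __name__ == '__main__':
    print('filesystem encoding:', sys.getfilesystemencoding(),
          '(utf-8)' if filesystem_encoding_is_utf8() else '(NOT utf-8)')
    with tempfile.TemporaryDirectory() as tmpdir:
        # Turkish dotted/dotless i and other non-ASCII characters in the name
        path = os.path.join(tmpdir, 'şarkı_listesi_İıĞ.txt')
        content = 'Şarkı: Ilık İçli\n'
        with open_utf8_text(path, mode='wt') as f:
            f.write(content)
        with open_utf8_text(path) as f:
            # Print only ASCII, so an ASCII-only stdout cannot raise here
            print('round trip ok:', f.read() == content)
```

Passing a bytes path works on Linux, where the bytes are handed to the OS unchanged, and on Windows with Python 3.6+, where bytes paths are decoded as UTF-8 (PEP 529).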