Experiment: How to add icu4c libs to Kodi build
#1
Some outstanding Unicode issues are causing serious problems for my new addon, as well as for fixing youtube-dl/yt-dlp.

I have been looking a bit at some of the Unicode problems (such as the "Turkish I" problem, issue 19883). I am experimenting with creating an API that uses the icu4c library. I have built the API and icu4c outside of Kodi, but now I'm trying to figure out how to add the icu4c libs to the Kodi build so that I can experiment further. At a minimum, I need to add the path to my hand-built icu4c libs to the build. I'm sure that I can figure it out after a few days' effort, but some hints would be appreciated.

I have done some experiments using the icu4c library to do case-folding, etc. I have created a simple xbmc API (xbmc.utf8_fold) so that I can use it from addons. In particular, I want to see how I can use it (or something like it) to do caseless comparison on file names, to comply with Kodi rules. Next, I want to use it to create case-folded (more or less lower-case) keys for settings, etc. I don't know whether icu4c is the way to go or not. It is robust and has capabilities that I think Kodi needs but that are not yet in C++ or Python. I think it is likely better to go with an existing solution than to reinvent the proverbial wheel.
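To make the idea of caseless comparison and case-folded keys concrete, here is a minimal sketch using only the Python standard library. It is not icu4c and not the xbmc.utf8_fold API mentioned above. The helper names (caseless_key, same_name) and the sample file names and setting key are hypothetical, chosen only for illustration. It follows the Unicode recipe for canonical caseless matching: normalize, case-fold, normalize again.

```python
import unicodedata


def caseless_key(text: str) -> str:
    """Hypothetical helper: build a key for caseless comparison.

    Applies NFD, then full case folding, then NFD again, which is the
    Unicode "canonical caseless match" recipe. It is locale-independent,
    so it does not apply Turkish-specific rules.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names without regard to case."""
    return caseless_key(a) == caseless_key(b)


# Case-folded keys for a settings-like mapping (illustrative data only).
settings = {caseless_key("SubtitleLanguage"): "en"}
value = settings.get(caseless_key("SUBTITLELANGUAGE"))
```

With this sketch, same_name("Straße.MKV", "STRASSE.mkv") is true because case folding maps "ß" to "ss", which a simple lower() call would not do. Whether this matches the rules Kodi needs for file names is an open question.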
#2
see https://github.com/xbmc/xbmc/pull/17833
#3
Awesome! I'll take a look.
#4
My understanding is that the PR is related to Android not providing a native C implementation of locales, so collation needs to access the Android Java ICU library via the NDK?

scott s.
#5
I was looking for an example of how to integrate it. Due to the nature of the Android platform, the example is not that helpful to me, but it gave me some things to ponder.

So far I have started playing with converting all of StringUtils to use icu4c functions (string comparison, copy, etc.). It is crude and inefficient. I want to see if I can make some of the problems turned up by the "Turkish I" issue go away, and thereby identify things that need fixing.

There is much that I don't know and haven't yet sought advice about, mostly because I'm busy learning the code and build system (and C++). I don't know whether all, most, or few strings in Kodi are Unicode, or how to tell which ones should or should not be. Personally, I think it leads to madness not to do everything in Unicode, other than in very special cases. I may be reading the code completely wrong, but it looks like at least some of the code ignores that a Unicode character can be more than one byte; that tolower (or casefold) can produce a different number of characters and bytes than the original string; that, in the general case, strings must be normalized for comparison; etc. Of course, depending on how the characters are used and the constraints placed on them, some of these steps can most likely be skipped.
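The three points above (multi-byte characters, case mapping changing the length, and normalization before comparison) can be checked with a short sketch. This uses only the Python standard library, not icu4c or Kodi's StringUtils, and the helper name sizes is hypothetical.

```python
import unicodedata


def sizes(text: str) -> tuple[int, int]:
    """Hypothetical helper: return (code points, UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# 1. One character can take more than one byte in UTF-8.
dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
before = sizes(dotted_capital_i)

# 2. Case mapping can change both the character and the byte count:
#    U+0130 lowercases to "i" followed by U+0307 COMBINING DOT ABOVE.
after = sizes(dotted_capital_i.lower())

# Case folding can also expand: "ß" folds to "ss".
folded = "\u00df".casefold()

# 3. Equivalent strings may differ until they are normalized.
precomposed = "\u00e9"   # "é" as one code point
combining = "e\u0301"    # "e" plus COMBINING ACUTE ACCENT
equal_raw = precomposed == combining
equal_nfc = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", combining))

# Python's str.lower() is locale-independent, so "I" always becomes "i",
# never the Turkish dotless "ı" (U+0131). Locale-aware mapping is one of
# the things a library such as ICU provides.
plain_lower = "I".lower()
```

Code that lowercases byte by byte, or that assumes the output buffer is the same size as the input, would mishandle each of these cases.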

I am proceeding with my experiment and, of course, hitting all sorts of unrelated problems. I can build, run and debug with libicu4c, but in order to debug memory issues I switched to clang. Then I found that I had to upgrade libs which were not available in Ubuntu 20.10, so I upgraded to 21.10. The 21.10 default is Wayland, so I spent a lot of time going down that path, which involved getting an unreleased libwayland-client, but apparently more things need to be upgraded, and I haven't yet found the magic list of build dependencies. I hoped to find the build dependencies on the Jenkins build machine, but no luck so far...

Anyway, I'm now focusing on building without Wayland support and hope to be back to debugging my code soon. Upgrading and learning new build systems, tools, etc. is always a drinking-from-the-fire-hose experience.