2023-02-02, 14:15
I just came across this article on Mastodon; it's from the New Yorker:
Whispers of A.I.’s Modular Future
And also this video:
Auto-generating subtitles using Whisper
My question is: is there any chance that someone who knows how to create Kodi addons could build one that leverages this technology to provide closed captions on live or recorded TV streams and other videos? My thinking on how this could work is that you buffer the incoming video and run the software on the audio to do the captioning, but delay playing the video until the captions are ready (assuming the captioning can be done in more or less real time). I would not mind a 10 to 20 second delay before a stream starts playing if that is what it takes to add the captions.

Now I know some will say that all TV shows are supposed to be closed captioned, but in the real world that is not always the case. Also, ffmpeg (which is used by damn near everything that processes audio and video) is often terrible at preserving closed captions, depending on the source. So this would be another way to caption programs that aren't supplied with closed captions, or where the captions have been lost in translation.
The alternative would be to wait until a program is fully recorded and then post-process it to add the captions, but maybe that would be more complicated than trying to do it in real time? I don't know, I am not a programmer. But I just thought I'd throw this out there, because it would be a great thing to have for hearing impaired people, especially in scenes with poor microphone placement, or where people whisper or mumble or have a really thick accent (assuming the AI is smart enough to deal with those situations, which I realize it may not be - yet).
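For what it's worth, the post-processing route can already be sketched with existing command-line tools. This is only a rough sketch, assuming the open-source Whisper command-line tool and ffmpeg are both installed; the file names and model size are placeholders, not anything Kodi-specific:

```shell
# 1. Extract a mono 16 kHz audio track from the finished recording
#    (the sample rate Whisper expects).
ffmpeg -i recording.ts -vn -ac 1 -ar 16000 audio.wav

# 2. Run Whisper on the audio; with --output_format srt it writes
#    a subtitle file (audio.srt) with timestamps.
whisper audio.wav --model small --output_format srt

# 3. Mux the generated subtitles back in as a selectable subtitle
#    track, without re-encoding the audio or video streams.
ffmpeg -i recording.ts -i audio.srt -c copy -c:s mov_text captioned.mp4
```

A Kodi addon doing this for live streams would presumably have to run something like step 2 on short audio chunks as they arrive and feed the text to the player as the delayed video catches up, which is a harder problem than batch-processing a finished recording.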