2015-03-23, 09:14
Voice Commands for Kodi.
Motivation: I can never find my remote. Could be used for home automation plugins.
Name: Cole
Forum/e-mail: Posting an e-mail address on a public forum is not a good idea; please PM me.
Summary: As an alternative to using a keyboard or mouse, I would like to add the ability to use voice commands for interface control.
How will I achieve this:
Step 1. Investigate the pros and cons of publicly available speech recognition APIs that could be used. An online service is a possibility, but the Google API is limited to 50 calls per day, and having a microphone in the living room that communicates with the outside world makes me want to put on a tin foil hat. Kaldi looks like a good option and appears to support GPU acceleration.
Step 2. Identify a proposed command set and open it up for public review and suggestions.
Step 3. Plan and write an extensible API.
Step 4. Implement the commands.
Step 5. Implement search.
Step 6. Background noise filtering: filter out the audio that is currently playing.
Step 7. Testing, testing, testing, and more testing.
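Step 6 is essentially acoustic echo cancellation: Kodi already knows what it is playing, so that signal can serve as a reference and be subtracted from the microphone input. The proposal doesn't name a method, so below is a minimal pure-Python sketch using a normalized least-mean-squares (NLMS) adaptive filter, one standard technique for this. The function name, tap count, and step size are illustrative assumptions, and the sketch assumes the playback and microphone streams are already time-aligned at the same sample rate, which a real implementation would have to handle.

```python
import random


def nlms_echo_cancel(mic, ref, taps=16, mu=0.5, eps=1e-6):
    """Remove an adaptively estimated echo of `ref` from `mic` (NLMS sketch).

    mic: microphone samples (speech plus the echo of what Kodi is playing)
    ref: the samples Kodi is playing, ASSUMED time-aligned with `mic`
    Returns the residual signal, ideally just the user's speech.
    """
    w = [0.0] * taps        # adaptive FIR estimate of the speaker-to-mic path
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(ref, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_estimate  # the part of the mic signal the filter cannot explain
        # Step size normalized by the reference power (the "N" in NLMS).
        step = mu * e / (eps + sum(h * h for h in history))
        w = [wi + step * hi for wi, hi in zip(w, history)]
        residual.append(e)
    return residual


if __name__ == "__main__":
    random.seed(0)
    ref = [random.uniform(-1.0, 1.0) for _ in range(5000)]  # simulated playback
    path = [0.6, 0.3, -0.1]  # hypothetical room echo path
    # Microphone hears only the echo here (no speech), so the ideal output is silence.
    mic = [sum(c * ref[n - k] for k, c in enumerate(path) if n >= k)
           for n in range(len(ref))]
    out = nlms_echo_cancel(mic, ref)
    before = sum(s * s for s in mic[-1000:])
    after = sum(s * s for s in out[-1000:])
    print(f"echo energy in last 1000 samples: {before:.3f} -> {after:.6f}")
```

While the user is speaking, the speech ends up in the residual, which is what would be passed on to the recognizer.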
What will the project focus on: Training will be the most important part. For efficient training, a structured set of commands will need to be used, and a model can be built from that training data. The model will then continue to be refined each time a command is issued; this is the important part. For the first few iterations of the commands, the user experience will probably be poor. In the future this could be solved by packaging pre-trained models with distributions for common languages and dialects. Another option would be to let users opt in to uploading their feature vectors back to a Kodi server. Accuracy will also be greatly improved because the vocabulary can be limited to the metadata in the library.
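Limiting the vocabulary to library metadata can be approximated, in a very simplified way, by snapping the recognizer's raw text output to the closest known title. The sketch below uses Python's standard difflib for that; match_library_title and the 0.6 cutoff are hypothetical choices, not part of Kaldi or Kodi. A real system would more likely constrain the decoder's grammar or language model directly rather than correcting text after the fact.

```python
import difflib


def match_library_title(hypothesis, library_titles, cutoff=0.6):
    """Snap a raw recognizer hypothesis to the closest title in the library.

    Returns the best-matching title as stored, or None if nothing is close enough.
    """
    # Compare case-insensitively, but keep a map back to the original spelling.
    by_lower = {title.lower(): title for title in library_titles}
    matches = difflib.get_close_matches(
        hypothesis.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


if __name__ == "__main__":
    titles = ["The Big Lebowski", "Big Fish", "Blade Runner"]  # stand-in library
    print(match_library_title("the big lebowsky", titles))
```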
This will improve on other implementations by using deep neural networks (DNNs) rather than Gaussian mixture models (GMMs) for acoustic modeling.
http://static.googleusercontent.com/medi.../41176.pdf
This will also be an "always on" implementation: a wake phrase such as "OK Kodi" will be spoken before the requested action.
Benefits: All users will benefit from this project, and persons with disabilities in particular.
Goals: A basic, well-documented implementation.
Requirements: C++, Python, a microphone, and a license-compatible speech recognition API.
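One way to structure the "always on" listener together with the extensible command API from Step 3 is a small router that ignores speech until it hears the wake phrase and then dispatches the rest of the utterance to a registered handler. This is a hypothetical sketch: CommandRouter, register, and handle_utterance are names made up for illustration, and it assumes the recognizer already delivers plain text. It accepts the wake phrase either on its own (the next utterance is then treated as the command) or at the start of the same utterance; a real version would also need a timeout so it does not stay armed forever.

```python
class CommandRouter:
    """Hypothetical sketch: ignore speech until the wake phrase, then dispatch."""

    def __init__(self, wake_phrase="ok kodi"):
        self.wake_phrase = wake_phrase
        self.armed = False   # True after the wake phrase was heard on its own
        self.handlers = {}   # command verb -> callable(argument_string)

    def register(self, verb, handler):
        # Plugins (e.g. home automation) could add their own verbs here.
        self.handlers[verb] = handler

    def handle_utterance(self, text):
        """Return the handler's result, or None if the utterance is ignored."""
        text = " ".join(text.lower().split())  # normalize case and whitespace
        if text == self.wake_phrase:
            self.armed = True  # wake phrase alone: treat the next utterance as the command
            return None
        if text.startswith(self.wake_phrase + " "):
            command = text[len(self.wake_phrase) + 1:]
        elif self.armed:
            command = text
        else:
            return None  # background speech, not addressed to Kodi
        self.armed = False
        verb, _, argument = command.partition(" ")
        handler = self.handlers.get(verb)
        return handler(argument) if handler else None  # unknown verbs are ignored


if __name__ == "__main__":
    router = CommandRouter()
    router.register("play", lambda title: f"playing {title}")
    print(router.handle_utterance("OK Kodi play Blade Runner"))
```

Add-ons such as a home automation plugin could register their own verbs the same way, e.g. a hypothetical router.register("lights", ...).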
Possible mentors:
Future contributions that could build on this framework include:
1. Implement Kinect/OpenNI sensor support for gestures/occupancy detection/person identification.
2. Subtitles from audio and translation.
I am currently an undergraduate CS major and am employed by the Naval Research Lab; I was kept on as a student contractor after my internship last summer. My work focuses on robotics, the interaction between sensors, and how their importance changes depending on the environment. More specifically, I do a lot with RDMS, GIS, computer vision, and machine learning. I am most comfortable with Python, Java, and C++, but will use whichever language best fits the job, as long as it is not assembly.
http://www.github.com/colek42
Please note that most of my "good" code is closed source. I am looking to start contributing to the OSS community.
Let me know if you are interested, and I will refine the proposal and submit it to GSoC.
Edited for refinement