Thursday, November 28, 2013

Animals, Avatars and ASR


Here's a prototype I created last year that combines three technologies I'm interested in: Automatic Speech Recognition (ASR), Avatars and Computer Intelligence. To recognize speech, I used SRI International's EduSpeak Speech Recognition Technology. This version ran locally on Windows and was scriptable in the browser using JavaScript. I've since worked with a team at GlobalEnglish which adapted this Speech Recognizer technology to run on the server, so it no longer requires a client install. We use Adobe Flash to record and stream the audio to the server, where the recognition takes place, and the recognized speech text is then returned to the browser.

I used Avatar technology from SitePal. It's interesting to see people's reactions to this Avatar technology. Some people think it's very cool, while others find it "creepy", a reaction often attributed to the "uncanny valley". The uncanny valley hypothesis, in the field of human aesthetics, holds that when human features look and move almost, but not exactly, like those of natural human beings, they cause a response of revulsion among human observers.

I suspect that over time, as the human race is exposed to more and more avatars, robots, etc., it will eventually lose this aversion, especially as human enhancement technologies become more widely adopted and the line between human and robot becomes blurred.

As for the logic of guessing which animal you are thinking about, I "borrowed" it from the website http://www.animalgame.com/. I didn't hack into the site or copy their code or anything like that. I simply set up a little bit of server code to submit requests and "screen scrape" the data I needed from their website. The site didn't offer an API, which would have let me leverage their technology more formally, and I certainly wouldn't have done this for any production feature. However, since this was simply a demonstration project, it seemed like a viable alternative to spending the time creating the game logic from scratch. And I figured that if this prototype ever moved into the production phase, it would then be time to buy or develop a proprietary version of Animals.
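
For the curious, here is a minimal sketch of the screen-scraping idea in Python, using only the standard library. To be clear, this is an illustration and not the server code from the prototype: the sample markup, the choice of an `<h2>` element, and the names `QuestionExtractor`, `extract_question` and `SAMPLE_PAGE` are all made up, and the real site's HTML will look different.

```python
from html.parser import HTMLParser


class QuestionExtractor(HTMLParser):
    """Collect the text inside the first <h2> element of a page.

    Assumption: the question lives in an <h2>. That is a guess made
    for this example, not a fact about any real website.
    """

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self._done = False
        self.question = ""

    def handle_starttag(self, tag, attrs):
        # Start capturing only for the first <h2> we encounter.
        if tag == "h2" and not self._done:
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self._in_h2 = False
            self._done = True

    def handle_data(self, data):
        if self._in_h2:
            self.question += data


def extract_question(html: str) -> str:
    """Return the scraped question text, or an empty string if none is found."""
    parser = QuestionExtractor()
    parser.feed(html)
    parser.close()
    return parser.question.strip()


# Made-up markup for illustration only; NOT the real site's HTML.
SAMPLE_PAGE = (
    "<html><body><h2>Does it have fur?</h2>"
    "<a href='?a=y'>Yes</a> <a href='?a=n'>No</a></body></html>"
)
```

Calling `extract_question(SAMPLE_PAGE)` pulls out the question text. In a real setup the HTML would come from an HTTP request made by the server rather than from a hard-coded string, and the scraper would break whenever the site changed its layout, which is one more reason this approach only suits a demo.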

I originally came across this "guess the animal" game in the 1980s, when it was widely available for the early personal computers and typically written in BASIC. It especially intrigued me because it was my first encounter with Artificial Intelligence running on a personal computer. Provided the information it was given was accurate, it was capable of learning. Over time the program would progress to the point where it could guess almost any animal you were thinking about.
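
The learning trick in those old programs is usually described as a binary tree of yes/no questions with animals at the leaves. Here is a minimal sketch of that idea in Python. It is my own reconstruction of the classic approach, not the logic of animalgame.com or of any particular BASIC listing, and the names `Node`, `play_round` and `scripted` are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """A question node (has yes/no children) or an animal leaf (no children)."""
    text: str
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.yes is None and self.no is None


def said_yes(answer: str) -> bool:
    """Treat any answer starting with 'y' as a yes."""
    return answer.strip().lower().startswith("y")


def play_round(root: Node, ask: Callable[[str], str]) -> bool:
    """Play one round and return True if the guess was right.

    `ask` takes a prompt and returns the player's answer.
    When the guess is wrong, the tree is extended in place.
    """
    node = root
    # Walk down the tree, following the answers until we reach an animal.
    while not node.is_leaf():
        node = node.yes if said_yes(ask(node.text)) else node.no

    if said_yes(ask(f"Is it a {node.text}?")):
        return True

    # Wrong guess: learn the new animal and a question that tells them apart.
    new_animal = ask("I give up. What animal were you thinking of?")
    question = ask(f"What yes/no question separates a {new_animal} from a {node.text}?")
    yes_for_new = said_yes(ask(f"For a {new_animal}, what is the answer?"))

    # Turn the old leaf into a question node with both animals beneath it.
    old_leaf, new_leaf = Node(node.text), Node(new_animal)
    node.text = question
    node.yes, node.no = (new_leaf, old_leaf) if yes_for_new else (old_leaf, new_leaf)
    return False


def scripted(answers):
    """Build an `ask` function that replays canned answers (handy for demos)."""
    it = iter(answers)
    return lambda prompt: next(it)
```

Starting from `root = Node("dog")`, a round played with `scripted(["no", "cat", "Does it meow?", "yes"])` fails but teaches the tree about cats, and a later round with `scripted(["yes", "yes"])` then guesses "cat" correctly. Swapping `scripted(...)` for a function that wraps `input` makes it interactive. As with the originals, the tree is only as good as the answers people give it.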

Unfortunately, this prototype didn't support continuous speech recognition, so I had to click a button whenever I wanted to speak. Of course, being able to simply speak to the avatar as you would to any human would have been the most natural way to interact with it.

One of the things I've learned while working with Speech Recognition over the years is the importance of good Speech Recognition UI. You've got to constrain the context so that the user knows the range of responses that are expected. Too many "I'm sorry, I didn't get that" responses can kill any great ASR project.
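
To make the "constrain the context" point concrete, here is a small Python sketch. It assumes the recognizer hands back plain text and that each prompt has a short list of expected responses. Many recognizers let you constrain this with a grammar at recognition time; the fuzzy text matching below is only a stand-in for that, and `match_response`, `reprompt` and the `cutoff` value are names and numbers I picked for illustration.

```python
import difflib
from typing import Optional, Sequence


def match_response(recognized: str, expected: Sequence[str],
                   cutoff: float = 0.75) -> Optional[str]:
    """Map recognizer output onto one of the expected responses, or None."""
    cleaned = recognized.strip().lower()
    lowered = [e.lower() for e in expected]
    # Pick the closest expected response, if any is similar enough.
    hits = difflib.get_close_matches(cleaned, lowered, n=1, cutoff=cutoff)
    if not hits:
        return None
    return expected[lowered.index(hits[0])]


def reprompt(expected: Sequence[str]) -> str:
    """Build a retry prompt that tells the user what can be said."""
    options = ", ".join(f'"{e}"' for e in expected)
    return f"You can say: {options}."
```

The point of `reprompt` is that when `match_response` returns `None`, the avatar says what it is able to understand instead of a bare "I'm sorry, I didn't get that".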

I look forward to creating more prototypes utilizing these technologies. In fact, I have more prototypes to show and will write about them in future posts.

If you're interested in collaborating on a future prototype, drop me a line.

Check out this new Text To Sing avatar page at SitePal. Wild!



