Speech and Language: From Amazon to Toys and UAVs


By Jeff Adams

April 26, 2017


In 2013, I was leading the top-secret speech and language group at Amazon, busy building what we called “Project Doppler”, which would eventually be unveiled as the Amazon Echo.  Occasionally, others at Amazon would learn that we had a top-notch internal speech technology group and would approach me for help incorporating speech technology into another Amazon product or an internal project.  Each time, I had to tell them that I was unable to help, but that I was sure they could find someone “out there” who could.

When I finally realized there was no company out there in the business of helping other companies build novel speech and language products and technologies, it hit me: this was my next adventure.  I founded Cobalt Speech and Language in 2014.  In the years since, Cobalt has developed world-class technology for recognizing speech, classifying speaker characteristics, carrying on natural dialogs, and much more.  Our technology is now in toys, unmanned aerial vehicles, phones, and large call centers, to name just a few.


Case Study 1: Vista Higher Learning

Vista Higher Learning, a leading publisher of language-learning textbooks, wanted to add an online tool to their Portales offering that would help students practice speaking a new language and give them feedback on their pronunciation.  They reached out to Cobalt.  We jointly collected recordings from students and teachers, along with detailed labels indicating which examples of speech were judged to be spoken correctly.  In the last year, this new feature has analyzed more than 2,000,000 examples of speech spoken by students learning Spanish and French.

“When we released VASR [Verification / Automatic Speech Recognition] this summer,” said Kurt Gerdenich, Chief Technology Officer for Vista Higher Learning, “we knew we were providing students with educational learning technologies unlike anything available today. Still, no one could predict that so many students would benefit from our advanced speech recognition so quickly.”


Case Study 2: AgVoice

When Bruce Rasa, CEO and founder of AgVoice, set out to develop a new spoken assistant for agricultural workers, he knew he needed a partner, not just a vendor.  He found that partner in Cobalt Speech and Language.  Together, Cobalt and AgVoice have built an assistant with a natural language interface that runs locally on the user’s smartphone and is robust to the noisy conditions found on a modern farm.  The user’s hands stay free to take measurements, inspect crops, and care for animals while they report information or make queries by voice.


Case Study 3: VocalZoom

VocalZoom have developed a truly novel “microphone” that uses an optical sensor to measure vibrations in the speaker’s cheek and neck while they are speaking.  This optical microphone is relatively immune to ambient noise, but it doesn’t have the same kind of frequency response as a normal acoustic microphone, and its signal is limited to lower frequencies.  As a result, the signal from the optical microphone can’t simply be fed to a normal speech recognizer.  VocalZoom needed a partner to help them develop speech recognition technology for this new microphone, and they found that partner in Cobalt Speech and Language.  We developed a novel speech recognition model and algorithm, based on the latest deep learning approaches, that incorporates both an optical and an acoustic input signal and produces results that are much more accurate than is possible with either signal alone.


How Cobalt Works

Cobalt is a unique company.  Our speech team consists of a dozen highly experienced speech and language scientists and engineers.  We have no central office; each of us works from home.  We use chat and video to collaborate, and we get together a few times each year for our “Cobalt Workshops”.  Each customer is assigned a primary technical point of contact within Cobalt, giving them direct access to one of our experts and, through that contact, to the whole body of experience at the company.

We believe it’s important to let our technology be driven by the needs of our customers, so we try not to have a strong, independent agenda of our own.  We work on the things our customers ask for.  We have developed speech recognition for English, Spanish, French, and Portuguese because that’s what our customers have asked for.  We will develop other languages when they are needed.  We have developed (with our partner Canary Speech) tools for detecting certain diseases and analyzing mental states from speech because that’s what our customers wanted.

Cobalt is project-oriented.  When a customer asks for a new project, we estimate the effort needed to build the required technology.  In an increasing number of cases, that technology already exists in our portfolio, and little or no effort is needed to customize or build anything.  If something has to be built, we generate a quote for that effort, along with a corresponding license fee for ongoing use.  In cases where the required work is viewed as having ongoing strategic value (such as developing a new language), the up-front engineering costs might take the form of a prepayment of later licensing fees.


The Evolution of Cobalt

We at Cobalt are never standing still.  Our customers constantly take us on adventures into new domains, and we love the journeys.  Along the way, we have developed speech recognition and other technology that is competitive with (and often outperforms) other offerings in the field.  An increasing number of customers are simply licensing this very competitive technology at very competitive rates.

We’re confident about a few things that the future holds.  We will extend our speech and language capabilities to new languages.  We will further improve the accuracy and performance of our engines and tools.  Straightforward licensing deals will continue to play an increasing role, complementing our traditional custom project work.

But the things in the future that excite me the most are the projects we don’t yet know about and haven’t even imagined.  If the past is an indicator of the future, we will continue to be surprised by the creative proposals and questions that our customers will bring us, and we will continue to take on those projects, and grow our technology portfolio in ways that nobody today can anticipate.


 
