
Historic Moment: Microsoft develops first speech recognition system with an error rate as low as that of humans

October 20, 2016


Redmond/Mumbai: Microsoft has achieved a historic breakthrough in speech recognition, creating a technology that recognizes the words in a conversation as well as a person does, with a word error rate on par with that of human transcribers.

Microsoft said on its official blog that a team of researchers and engineers in Microsoft Artificial Intelligence and Research has reported a speech recognition system that makes the same number of errors as professional transcriptionists, or fewer. The researchers reported a word error rate (WER) of 5.9 percent, down from the 6.3 percent WER the team reported just last month. The 5.9 percent error rate is about equal to that of people who were asked to transcribe the same conversation, and it’s the lowest ever recorded against the industry standard Switchboard speech recognition task.

“We’ve reached human parity,” said Xuedong Huang, the company’s chief speech scientist. “This is an historic achievement.” The milestone means that, for the first time, a computer can recognize the words in a conversation as well as a person would. In doing so, the team has beaten a goal it set less than a year ago, and greatly exceeded everyone else’s expectations as well.

“Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible,” said Harry Shum, the executive vice president who heads the Microsoft Artificial Intelligence and Research group.

The research milestone comes after decades of research in speech recognition, beginning in the early 1970s with DARPA, the U.S. agency tasked with making technology breakthroughs in the interest of national security. Over the decades, most major technology companies and many research organizations joined in the pursuit.

“This accomplishment is the culmination of over twenty years of effort,” said Geoffrey Zweig, who manages the Speech & Dialog research group.

Implications of speech recognition technology: The milestone will have broad implications for consumer and business products that can be significantly augmented by speech recognition. That includes consumer entertainment devices like the Xbox, accessibility tools such as instant speech-to-text transcription and personal digital assistants such as Cortana.

“This will make Cortana more powerful, making a truly intelligent assistant possible,” Shum said.

Not 100% perfect: Microsoft said that the research milestone doesn’t mean the computer recognized every word perfectly. In fact, humans don’t do that, either. Instead, it means that the error rate, or the rate at which the computer mishears a word, such as hearing “have” for “is” or “a” for “the”, is the same as you’d expect from a person hearing the same conversation.
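To make the error rate concrete, here is a minimal sketch of how a word error rate is commonly computed: the smallest number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. This is the standard textbook definition, not Microsoft's scoring code, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper only; real evaluations also normalize text
    (punctuation, hesitations, spelling variants) before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # hypothesis is empty: delete all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two of four words are misheard ("have" as "is", "a" as "the").
    print(word_error_rate("i have a dog", "i is the dog"))
```

In this toy sentence two of the four reference words are wrong, so the rate is 50 percent; a 5.9 percent rate corresponds to roughly six such errors per hundred words of reference transcript.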

Zweig attributed the accomplishment to the systematic use of the latest neural network technology in all aspects of the system.

The push that got the researchers over the top was the use of neural language models in which words are represented as continuous vectors in space, and words like “fast” and “quick” are close together.

“This lets the models generalize very well from word to word,” Zweig said.
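The idea of words as "continuous vectors in space" can be sketched with a toy example. The three-dimensional vectors below are invented for illustration (real models learn vectors with hundreds of dimensions from data, and nothing here comes from Microsoft's system); the point is only that closeness can be measured, here with cosine similarity, so that “fast” and “quick” score as near neighbors.

```python
import math

# Hand-made toy vectors, assumed purely for illustration.
TOY_VECTORS = {
    "fast": [0.90, 0.80, 0.10],
    "quick": [0.85, 0.75, 0.20],
    "banana": [0.10, 0.20, 0.90],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_word(word, vectors=TOY_VECTORS):
    """Return the other word whose vector is most similar to `word`'s vector."""
    target = vectors[word]
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(target, vectors[w]))


if __name__ == "__main__":
    print(nearest_word("fast"))
    print(cosine_similarity(TOY_VECTORS["fast"], TOY_VECTORS["quick"]))
    print(cosine_similarity(TOY_VECTORS["fast"], TOY_VECTORS["banana"]))
```

Because similar words sit close together, what a model learns about sentences containing one word can carry over to sentences containing its neighbors, which is the kind of generalization Zweig describes.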

In the longer term, researchers will focus on ways to teach computers not just to transcribe the acoustic signals that come out of people’s mouths, but also to understand the words they are saying. That would give the technology the ability to answer questions or take action based on what it is told.

“The next frontier is to move from recognition to understanding,” Zweig said.

Shum has noted that we are moving away from a world where people must understand computers to a world in which computers must understand us. Still, he cautioned, true artificial intelligence remains on the distant horizon.

“It will be much longer, much further down the road until computers can understand the real meaning of what’s being said or shown,” Shum said.
