
Google DeepMind’s new AI can closely mimic human speech, make piano music

Steve Pak | Sep 11, 2016 09:02 AM EDT

Captain Kirk in 'Star Trek' (Photo: YouTube)

Google's DeepMind team has developed a new artificial intelligence (AI) system called WaveNet that produces speech that sounds like a real person talking rather than a robotic voice. It improves on the text-to-speech programs whose synthetic voices are easy to spot in online videos, such as those on Google's YouTube site, and it can even generate new piano music samples. The British company also built AlphaGo, the computer program that in March defeated one of the world's top Go players in a five-game match.


Developers currently use two main methods to build speech programs. In the first, a person records a library of words and speech fragments that are then stitched together, which makes it difficult to change the sounds or the rise and fall of the voice.

The second method generates words digitally, based on how they would sound when a person says them. These voices are easier to adjust, but they sound more like the talking computer in the original "Star Trek" TV series, which is celebrating its 50th anniversary.

Text-to-speech programs are becoming more important in computing as people use personal assistants such as Apple's Siri, Amazon's Alexa, and Microsoft's Cortana. However, these digital assistants build their spoken responses by combining small pieces of voice recordings.

Google's AI team took a new approach: it fed audio waveforms recorded from real human speakers into a neural network. Waveforms are the "shapes" of sounds that dance up and down on some media player displays.

DeepMind's new technology could also be used to make music. Google engineers fed classical piano pieces to WaveNet, which allowed it to create new music samples, according to Engadget.

To produce speech, the technology converts text into a series of phonemes (the smallest units of sound in a language) and syllables, which are then voiced.

Google's AI team then tested the new technology. In blind tests, subjects rated WaveNet's output as more human-like than that of other text-to-speech programs.

Without any text to work from, WaveNet produces nonsense sounds, complete with breaths and pauses. Once it is given the text along with information about the language, however, its output sounds like real human speech.

DeepMind stated in a blog post that the new technology narrows the gap between existing computerized voices and real human voices by more than 50 percent, based on tests in English and Mandarin Chinese. WaveNet's average score was 4.21 for English and 4.08 for Mandarin on a five-point scale, where 5 means very realistic, according to The Verge.

WaveNet is not yet ready to be added to Google Assistant. However, the audio samples DeepMind posted on its website suggest the technology could become a game-changer for future voice assistants.
