Replacing The Simpsons' Cast With AI
With over 700 episodes, The Simpsons is one of the most popular cartoons worldwide. The show’s beloved characters have been voiced by the same actors since 1989, a run of more than three decades.
Yet in early March, a one-off episode featured a voice that had not been heard in years: Bart’s teacher, Edna Krabappel. Marcia Wallace played Edna Krabappel for 25 years, but she died in 2013. To bring the character back, animators used artificial intelligence technology.
Such AI has developed rapidly over the past few years. In the case of The Simpsons, prominent voice actor Harry Shearer, who plays Mr Burns and Waylon Smithers, almost left the show in May 2015. His near-departure sparked debate over whether voice actors are replaceable, given the key role they play in the show’s identity. On the technology side, it has been suggested that The Simpsons could replace its voice cast with an AI that perfectly replicates the iconic voices.
How It Works
Wired investigated this claim by interviewing Tim McSmythurs, an AI researcher who built a model trained to mimic a person’s voice. The model turns text into spoken audio. To mimic a voice, McSmythurs trains it on 2-3 hours of recordings of the person speaking. The model focuses on the different frequencies and phonemes that make up the voice. For the output to sound believable, however, the training data has to cover a wide range of emotions.
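To make the idea of "frequencies that make up the voice" concrete, here is a minimal sketch in Python. It is not McSmythurs’s model. It only shows the kind of frequency analysis that voice models commonly build on. The sample rate, the synthetic sine tone standing in for a recorded frame of speech, and all function names are assumptions made for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)


def synth_tone(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Generate a pure sine tone as a stand-in for one frame of recorded speech."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


def magnitude_spectrum(frame):
    """Naive discrete Fourier transform: one magnitude per frequency bin up to Nyquist."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def dominant_frequency(frame, sample_rate=SAMPLE_RATE):
    """Return the frequency (in Hz) of the strongest bin in the frame."""
    mags = magnitude_spectrum(frame)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate / len(frame)


frame = synth_tone(200.0, 400)    # 400 samples give 20 Hz per bin
print(dominant_frequency(frame))  # 200.0
```

A real voice contains many frequencies at once, and they shift from one phoneme to the next. A voice model would work with spectra like this across thousands of short frames rather than a single tone.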
Sonantic, a British technology startup, is pursuing research in the same field. Using similar techniques, such as collecting hours of training data that cover multiple emotions, its AI model can recreate voices. More advanced models can now mimic a voice from just 10-20 minutes of recordings rather than 30-50 hours.
AI voice generation has found a place in many booming industries, not just animation. Most notably, it has been used in video games, for example in open-world games and mods, where players can converse with NPCs through a combination of speech recognition, voice generation, and a text-to-speech algorithm.
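The way these pieces could fit together in a game can be sketched roughly as follows. This is an illustrative outline only, not any particular game’s or mod’s implementation. The class name, the three stage functions, and the separate "respond" step that decides what the NPC says are all hypothetical, and the stages are trivial placeholders where a real system would use trained models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NpcVoicePipeline:
    """Hypothetical three-stage pipeline for one spoken exchange with an NPC."""
    recognize: Callable[[bytes], str]   # speech recognition: player audio -> text
    respond: Callable[[str], str]       # dialogue logic (assumed): text -> reply text
    synthesize: Callable[[str], bytes]  # text-to-speech in the NPC's voice: text -> audio

    def handle_turn(self, player_audio: bytes) -> bytes:
        heard = self.recognize(player_audio)
        reply = self.respond(heard)
        return self.synthesize(reply)


# Placeholder stages: bytes stand in for audio so the sketch runs without any models.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = NpcVoicePipeline(fake_recognize, fake_respond, fake_synthesize)
print(pipeline.handle_turn(b"hello"))  # b'You said: hello'
```

Keeping each stage behind a plain function makes it possible to swap in a different recognizer or cloned voice without touching the rest of the pipeline.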
Written by Nichapatr (Petch) Lomtakul