Is a human being speaking behind the camera or an artificial intelligence clone? A surprising innovation from an Nvidia-backed unicorn startup makes it nearly impossible to tell the difference.
AI startup Synthesia, which reached unicorn status last year with a billion-dollar valuation, released a new technology called Expressive Avatars on Thursday: the world’s first digital AI clones capable of producing human facial expressions and the right tone of voice from written instructions.
The technology starts with an AI avatar, which can be customized to reflect real faces.
Photo credit: Synthesia
The AI creates a digital copy of a person from footage recorded on their own webcam or in a certified studio. It can also clone the person’s voice to pair with their digital likeness.
Those who are wary of creating an AI avatar that takes on their face and voice can opt instead for one of the more than 160 pre-loaded AI avatars that Synthesia has in its database.
Once a user creates or selects an AI avatar, they only need to do one more thing: write what they want their digital self to say.
In a demo seen by CNBC, one user wrote, “I’m happy. I’m sad. I’m frustrated.” and the AI-generated digital clone read the text aloud. The avatar conveyed the facial expressions and tone associated with happiness when saying “I’m happy” and changed its inflection appropriately when saying “I’m frustrated.” The tone matched the words.
With an AI clone and a written message, a free user can generate 36 minutes of personalized video per year in more than 120 languages. Paid plans go up to $67 per month for up to 360 minutes of video per year, and businesses that opt for an enterprise plan get unlimited minutes.
Synthesia is a startup that large companies use behind the scenes. Zoom, Xerox, Microsoft and Reuters all use Synthesia’s software internally. Synthesia CEO Victor Riparbelli told MIT Technology Review that 56% of Fortune 100 companies use the technology.
Synthesia markets the technology as a way to create expressive digital avatars for training courses and corporate presentations. For example, Zoom designers created sales training videos in Synthesia in 90% less time than it took to produce them manually.
“Zoom subject matter experts no longer need to sign up, freeing up 15-20 hours each month to work on their actual work,” Synthesia’s website reads.
However, the technology’s capacity to create scary deepfakes, meaning AI-generated media that clones and manipulates a person’s voice, likeness, or other attributes without their permission, opens the door to misuse.
Last month, Tennessee became the first US state to pass legislation protecting music industry professionals from deepfakes.