How real can a virtual avatar be? It bypassed the bank's protection and fooled real people

May 1, 2023  20:08

How real and how human-like can a digital avatar created by artificial intelligence be? Can it fool the protective mechanisms of banks and other businesses? Can it fool real people?

The Wall Street Journal columnist Joanna Stern tried to find out, and the result was more frightening than she could have imagined.

Using the Synthesia tool, Joanna created her own virtual avatar. According to the developers, the tool builds video avatars from video and audio recordings of real people; the avatar then speaks whatever text the user types. The startup charges $1,000 a year to create and maintain a virtual avatar.

A 30-minute video of Joanna and nearly two hours of recordings of her voice were used to train the algorithm. Once the avatar was ready, Joanna used ChatGPT to write the script for a TikTok video about iOS and fed it to her avatar, which produced the finished clip. Watching the result, Joanna said, felt like seeing her own reflection in a mirror.

So far, the technology is far from perfect. In short sentences the avatar sounds convincing, almost like a real person, but in longer phrases it becomes clear that it is not human. Even some TikTok viewers noticed this, though the platform's audience is not known for scrutinizing videos closely.

Another problem surfaced when Joanna tried to use the digital avatar in Google Meet video calls: the avatar holds a perfect posture and barely moves, unlike a real person.

Despite these shortcomings, video avatars will soon become more convincing. Synthesia already has beta versions in development that can nod, raise and lower their eyebrows, and make other human-like movements.

Joanna also tested a voice clone built with ElevenLabs' generative AI. She uploaded about 90 minutes of recordings of her voice, and in less than two minutes the clone was ready. This audio avatar can read any text aloud in the user's voice. ElevenLabs charges from $5 per month for voice cloning.

At this stage, the audio clone turned out to be more lifelike than the video one: its speech has natural intonation and flows more smoothly.

Joanna also used the voice clone to call her father and ask for her Social Security number. He quickly realized, however, that he was hearing not Joanna but a recording.

Another call went to Chase Bank's customer support. The answers the clone would need during the bank's voice-authentication process had been prepared in advance. After a short conversation, the voice clone was put through to a bank representative: the bank's voice-recognition system could not tell that the caller was not Joanna but her clone.

A Chase spokesperson later said the bank uses voice verification alongside other customer-authentication tools, and emphasized that voice recognition only connects a caller to a support representative; it cannot be used to authorize a transaction or any other financial operation.

To create a voice clone, it is enough to upload a few audio recordings to the service and accept the platform's terms, under which the user agrees not to use the algorithm for impersonation. In practice, though, anyone can easily clone the voice of a friend or a celebrity.

According to ElevenLabs, only holders of paid accounts can clone a voice, and accounts that violate the platform's policy are blocked. The developers also plan to release a service that can analyze any audio recording and determine whether it was generated by the ElevenLabs algorithm.

From the experiment, Joanna concluded that none of the algorithms she used can yet produce a copy indistinguishable from the original. ChatGPT wrote scripts that lacked a journalist's knowledge and experience. Synthesia created an avatar that, while human-like, still cannot convey everything a real person does. And ElevenLabs produces speech that closely resembles a human voice, but it is not perfect either, at least not yet.

Still, it is entirely possible that in the near future, as AI technology advances, virtual avatars will appear that cannot be distinguished from real people.
