As AI blurs the boundary between real and synthetic voices, researchers are developing technologies to distinguish human speech from AI-generated media.
The GAME School hosted the "Proof of Presence: Grounding Media in the Human Body in the Age of AI" seminar on Friday. At the seminar, Visar Berisha, associate dean of research and commercialization at the Ira A. Fulton Schools of Engineering, presented his work on using radar and microphone sensing to distinguish between human and AI-generated speech.
"We do a lot of work at the intersection of AI and the human voice, and so one of those areas of research is focused on developing technologies to mitigate risks from generative AI from deep fakes," Berisha said.
Berisha's group developed a system that uses hardware to detect the biological signatures required for speech and software to correlate those signatures with the recorded speech. The hardware uses radar sensing to detect characteristics of the heart, lungs and vocal fold vibrations, verifying that a human is present at the moment the speech is produced.
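The correlation step can be illustrated with a toy sketch. This is not the group's actual algorithm, which the article does not describe in detail; the function names (`presence_score`, `is_live_speech`) and the use of a simple Pearson correlation between a radar-sensed vibration trace and the loudness envelope of the audio are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical sketch: correlate a radar-sensed body-vibration trace with
# the amplitude envelope of recorded speech. Names and the correlation
# threshold are illustrative assumptions, not the published method.

def presence_score(radar_vibration, audio_envelope):
    """Pearson correlation between the radar trace and the speech envelope."""
    r = (radar_vibration - radar_vibration.mean()) / radar_vibration.std()
    a = (audio_envelope - audio_envelope.mean()) / audio_envelope.std()
    return float(np.mean(r * a))

def is_live_speech(radar_vibration, audio_envelope, threshold=0.5):
    """A live speaker's biosignals should track the speech they produce."""
    return presence_score(radar_vibration, audio_envelope) >= threshold

# Toy demonstration: a live speaker's vibration trace follows the audio
# envelope; a cloned voice played near a silent body produces no match.
t = np.linspace(0, 1, 1000)
envelope = 0.5 + 0.5 * np.sin(2 * np.pi * 5 * t)  # speech loudness over time
live_radar = envelope + 0.05 * np.random.default_rng(0).normal(size=t.size)
no_radar = 0.05 * np.random.default_rng(1).normal(size=t.size)  # silent body

print(is_live_speech(live_radar, envelope))  # aligned: True
print(is_live_speech(no_radar, envelope))    # no biosignal: False
```

In this sketch, the deciding signal is whether the body actually produced the sound, which is why a perfect voice clone played over a loudspeaker would still fail the check.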
"We've developed this new sort of security device, something that might clip onto the top of a computer monitor," Berisha said. "Every time you speak, it actually verifies that inside your body, the bio signals required in order for you to produce that speech are present, and they align with the speech."
The device provides a constant signal of human presence and has potential applications in headsets, mobile phones and laptops, Berisha said. He added that integrating the devices into everyday hardware could create an impact at scale.
This method of verifying human presence differs from prior approaches, which train AI systems to detect AI-generated speech. By verifying human presence at the biological level, at the moment the speech is produced, researchers can create a more robust form of verification.
Isabella Lenz, a postdoctoral researcher working with Berisha, said the potential of bad actors using AI to clone voices poses numerous risks, including threats to corporate security and financial information.
"One of the biggest things to be scared of is fraud," she said. "There's been a number of cases that we've heard about recently where people have cloned the voice or voice and image of people, like public figures."
The verification of human presence is not only important for deterring fraud by bad actors, but also has applications in art and media more broadly. Lenz said when someone clones a singer's voice with AI, the result may sound nice, but it strips away any humanity from the art itself.
The research resonated with audience members who had encountered deepfakes while using social media. Cole Patel, a freshman studying mechanical engineering, said he has seen Sora from ChatGPT used for content creation on TikTok.
Patel said the more ways there are to verify human-made content, the better. He added that proving the humanity of content, rather than detecting the presence of AI, was a different approach from what he had previously seen.
"I was really blown away by the presentation," Patel said. "I thought it was super interesting and super useful if this could be applied."
Edited by Kate Gore, Senna James and Pippa Fung.
Reach the reporter at jdtamay1@asu.edu and follow @JTamayo46036 on X.
Like The State Press on Facebook and follow @statepress on X.
John Tamayo is a science and technology reporter in his first semester with The State Press. He is a senior majoring in Physics and Philosophy.