Google DeepMind creates AI model that can add sound to silent videos (2024)

Fresh off of animating memes over the last few days, AI has turned its attention to silent videos. Specifically, bringing audio to AI-generated clips.

Google’s DeepMind research arm has built a powerful new AI model that can add audio to videos without sound, dubbing over the top with sound effects and music.

What is most impressive about the new research is how accurately the audio follows the visuals. In one clip, a close-up of a guitar being played, the generated music closely matches the notes actually being played.

In some ways, it’s the other side of the coin from the music generation based on a visual prompt that arrived last month via ElevenLabs, and it brings plenty of potential for restoring old media that no longer has an audio component. Charlie Chaplin may be about to get a new voice if this progresses further.

While the Google DeepMind model isn't available to use yet, there is a similar tool from ElevenLabs that you can try today. If you want to create a video to try it with, you can check out our 5 best AI video generators list.

Google's new audio generation is off to a solid start

In the thread of posts on X, Google’s DeepMind account starts things off with a character walking through an eerily lit tunnel.

Light choir music plays over dramatic percussion, and the character’s footsteps can be heard as they move through the scene.

Limitations of the DeepMind model

Like many projects from Google, this hasn't been released yet; it's just a research preview. Google says there are limitations and safety issues to address first.

For example: "Since the quality of the audio output is dependent on the quality of the video input, artefacts or distortions in the video, which are outside the model’s training distribution, can lead to a noticeable drop in audio quality."

They are also working on lip syncing for videos with speech. The model currently attempts this, but it isn't always accurate and can create an uncanny valley effect.

ElevenLabs is working on a similar project

We are excited to introduce the Text to Sound Effects API. To showcase it - we've built the first Video to Sounds Effects app. This app is available for free online and fully open-source. pic.twitter.com/8aalo8GCSo (June 17, 2024)

Not to be outdone, ElevenLabs this week revealed its new Text to Sound Effects API that can generate audio effects based on what you upload to it.

Unlike Google's V2A model, the API from ElevenLabs is already accessible and, in experiments, works surprisingly well.

In the example above, a video of a bottle smashing gets a few different options to choose from, while the DiCaprio laughing meme gets additional audio from other people in the room.

The company 'bootstrapped' a quick app to demonstrate what is possible with the API, allowing you to upload a video and have it add the sound. This is free to use and open source, and you can try it right now.

ElevenLabs told Tom's Guide the real aim is to have other companies and developers build things with the API themselves, such as integrating it into generative video.
