Google DeepMind creates AI model that can add sound to silent videos (2024)

Fresh off of animating memes over the last few days, AI has turned its attention to silent videos. Specifically, bringing audio to AI-generated clips.

Google’s DeepMind research arm has built a powerful new AI model that can add audio to videos without sound, dubbing over the top with sound effects and music.

What is most impressive about the new research is its ability to accurately follow the visuals. In one clip, they show a close-up of a guitar being played, and the music in the SFX closely matches the actual notes being played.

In some ways, it’s the other side of the coin from the generation of music based on a visual prompt that arrived last month via ElevenLabs. It also brings plenty of potential for restoring old media that no longer has an audio component, and Charlie Chaplin may be about to get a new voice if this progresses further.

While the Google DeepMind model isn't available to use yet, there is a similar tool from ElevenLabs that you can try today. If you want to create a video to try it you can check out our 5 best AI video generators list.

Google's new audio generation is off to a solid start

In the thread of posts on X, Google’s DeepMind account starts things off with a character walking through an eerily lit tunnel.

Some light choir music plays over the top of dramatic percussion, and the character’s footsteps can be heard as they move through the scene.

The second, audio generated with “Wolf howling at the moon” as the prompt, ties in nicely with the animation, and even offers a chorus of howls in the distance.

We're sharing progress on our video-to-audio (V2A) generative technology. 🎥 It can add sound to silent clips that match the acoustics of the scene, accompany on-screen action, and more. Here are 4 examples - turn your sound on. 🧵🔊 https://t.co/VHpJ2cBr24 pic.twitter.com/S5m159Ye62 June 17, 2024

The harmonica example sounds a little too “uncanny valley” in the way its pitch shifts, but the backing underneath is solid, while the jellyfish one sounds like, well, jellyfish. Notably, that has some extra prompts, though, including “marine life” and “ocean”.

The video with the prompt “A drummer on a stage at a concert surrounded by flashing lights and a cheering crowd” is a little off, though. For one, the beats don’t quite match the rhythm in the video once it gets going. The sticks also appear to be focused on the snare and maybe a floor tom, while the audio sounds a tad more complex, with some other drums involved.

Still, it’s an impressive start to a project that’s only likely to grow with time.

Limitations of the DeepMind model

Like many projects from Google, this hasn't been released yet; it's just a research preview. Google says there are limitations and safety issues to address first.

For example: "Since the quality of the audio output is dependent on the quality of the video input, artefacts or distortions in the video, which are outside the model’s training distribution, can lead to a noticeable drop in audio quality."

They are also working on lip syncing for videos with speech. The model currently attempts this, but it isn't always accurate and creates an uncanny valley effect.

ElevenLabs is working on a similar project

We are excited to introduce the Text to Sound Effects API. To showcase it - we've built the first Video to Sounds Effects app. This app is available for free online and fully open-source. pic.twitter.com/8aalo8GCSo June 17, 2024

Not to be outdone, ElevenLabs this week revealed its new Text to Sound Effects API that can generate audio effects based on what you upload to it.

Unlike Google's V2A model, the API from ElevenLabs is already accessible and, based on experiments, works surprisingly well.

In the example above, a video of a bottle smashing gets a few different options to choose from, while the DiCaprio laughing meme gets additional audio from other people in the room.

The company 'bootstrapped' a quick app to demonstrate what is possible with the API, allowing you to upload a video and have it add the sound. This is free to use and open source, and you can try it right now.

ElevenLabs told Tom's Guide the real aim is to have other companies and developers build things with the API themselves, such as integrating into generative video.

Lloyd Coombes
