KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution

Upload a video and an audio clip to generate a new video that keeps the original visuals but has lip movements synchronized to the new audio.

Input Video

Input Audio

Output Video

How it works

The system extracts embeddings and landmarks from the input video
Audio embeddings are computed from the input audio
A keyframe model generates key visual frames
An interpolation model creates a smooth video between keyframes
The final video is rendered with the new audio
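
The two-stage structure above (sparse keyframes first, then interpolation between them) can be illustrated with a small scheduling sketch. This is not KeySync's actual code: the function names and the `stride` parameter are hypothetical, and the real keyframe spacing is not stated here. The sketch only shows how a frame range could be split into keyframes and the segments an interpolation model would fill.

```python
from typing import List, Tuple


def keyframe_indices(num_frames: int, stride: int) -> List[int]:
    """Pick every `stride`-th frame as a keyframe (hypothetical scheme).

    The last frame is always included so the final segment is bounded
    by a keyframe on both sides.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    if num_frames <= 0:
        return []
    indices = list(range(0, num_frames, stride))
    if indices[-1] != num_frames - 1:
        indices.append(num_frames - 1)
    return indices


def interpolation_segments(keys: List[int]) -> List[Tuple[int, int]]:
    """Pair consecutive keyframes; each pair bounds one segment to interpolate."""
    return list(zip(keys, keys[1:]))


if __name__ == "__main__":
    keys = keyframe_indices(num_frames=10, stride=4)
    print(keys)
    print(interpolation_segments(keys))
```

Generating keyframes first and filling in between them is a common way to keep long-range consistency while leaving frame-to-frame smoothness to the interpolation stage.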

Limitations

Due to GPU restrictions on Hugging Face Spaces, the demo can only process videos up to 6 seconds long. For longer videos or better performance, we recommend running KeySync locally on your own hardware with the inference scripts provided in the repository (https://github.com/antonibigata/keysync).
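
To stay within the 6-second limit, a clip can be trimmed before uploading. The following is a minimal sketch that assumes `ffmpeg` is installed and on the `PATH`; the helper names and file paths are hypothetical and not part of the KeySync repository. It re-encodes rather than stream-copying so the cut lands at the requested duration.

```python
import subprocess
from typing import List


def build_trim_command(src: str, dst: str, max_seconds: float = 6.0) -> List[str]:
    """Build an ffmpeg command that keeps only the first `max_seconds` of `src`."""
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", src,               # input file
        "-t", str(max_seconds),  # limit the output duration (seconds)
        dst,
    ]


def trim_video(src: str, dst: str, max_seconds: float = 6.0) -> None:
    """Run the trim command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_trim_command(src, dst, max_seconds), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_trim_command("input.mp4", "input_6s.mp4")))
```

The same command works for audio files, so the audio input can be shortened to match the trimmed video.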