KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution

Upload a video and an audio clip to generate a new video that keeps the original visuals, with the lip movements synchronized to the new audio.

How it works

  1. The system extracts embeddings and landmarks from the input video
  2. Audio embeddings are computed from the input audio
  3. A keyframe model generates key visual frames
  4. An interpolation model creates a smooth video between keyframes
  5. The final video is rendered with the new audio
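The keyframe-then-interpolation structure of steps 3 and 4 can be illustrated with a toy sketch. This is not the KeySync implementation: the actual keyframe and interpolation stages are learned models, whereas here each "frame" is a single number and the gaps are filled by plain linear blending. The function names and the `stride` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def select_keyframe_indices(num_frames: int, stride: int) -> List[int]:
    """Pick evenly spaced keyframe positions, always including the last frame.

    Hypothetical helper: the spacing used by the real model is not stated here.
    """
    if num_frames <= 0:
        return []
    if stride <= 0:
        raise ValueError("stride must be positive")
    indices = list(range(0, num_frames, stride))
    if indices[-1] != num_frames - 1:
        indices.append(num_frames - 1)
    return indices


def interpolate_between(keyframes: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Fill every frame position between consecutive keyframes.

    Linear blending is a stand-in for the learned interpolation model.
    """
    if not keyframes:
        return []
    frames: List[float] = []
    pairs = list(zip(indices, keyframes))
    for (i0, k0), (i1, k1) in zip(pairs, pairs[1:]):
        span = i1 - i0
        for step in range(span):
            t = step / span  # 0.0 at the left keyframe, approaching 1.0 at the right
            frames.append((1 - t) * k0 + t * k1)
    frames.append(keyframes[-1])  # the final keyframe closes the sequence
    return frames


# Toy usage: 10 frames, one keyframe every 4 frames plus the last frame.
indices = select_keyframe_indices(10, 4)
keyframes = [float(i) for i in indices]  # placeholder "frame content"
video = interpolate_between(keyframes, indices)
```

The point of the two-stage design in this sketch is that only a sparse set of frames has to be produced directly, and the remaining frames are derived from their neighbouring keyframes.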

Limitations

Due to GPU restrictions on Hugging Face Spaces, the demo can only process videos up to 6 seconds long. For longer videos or better performance, we recommend running KeySync locally on your own hardware with the inference scripts provided in the repository (https://github.com/antonibigata/keysync).
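As a rough guide to what the 6-second limit means in frames, the helper below computes how many frames of a clip fit within the limit. It is a minimal sketch, not code from the demo: the function name is hypothetical, the frame rate of 25 fps in the usage line is only an example value, and whether the demo truncates or rejects longer inputs is not stated here (truncation is assumed for illustration).

```python
MAX_SECONDS = 6.0  # demo limit stated in the Limitations section


def frames_within_limit(num_frames: int, fps: float, max_seconds: float = MAX_SECONDS) -> int:
    """Return how many frames of a clip fit within the time limit.

    Hypothetical helper; assumes longer clips are simply truncated.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    allowed = int(max_seconds * fps)  # whole frames that fit in the limit
    return min(num_frames, allowed)


# Example: a 12-second clip at an assumed 25 fps has 300 frames.
kept = frames_within_limit(300, 25.0)
```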