KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution
Upload a video and an audio clip to generate a new video that keeps the original visuals but lip-syncs the speaker to the new audio.
How it works
- The system extracts embeddings and landmarks from the input video
- Audio embeddings are computed from the input audio
- A keyframe model generates key visual frames
- An interpolation model creates a smooth video between keyframes
- The final video is rendered with the new audio
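The two-stage flow in the steps above (sparse keyframes first, then interpolation between them) can be illustrated with a toy sketch. This is not KeySync's code: the real keyframe and interpolation stages are learned models, whereas this example uses one scalar per frame and plain linear interpolation. The function names `keyframe_indices` and `interpolate`, the `stride` parameter, and the per-frame "audio feature" values are all hypothetical and exist only for illustration.

```python
from typing import List, Sequence


def keyframe_indices(num_frames: int, stride: int) -> List[int]:
    """Pick every `stride`-th frame as a keyframe, always including the last frame."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    if num_frames <= 0:
        return []
    indices = list(range(0, num_frames, stride))
    if indices[-1] != num_frames - 1:
        indices.append(num_frames - 1)
    return indices


def interpolate(key_idx: Sequence[int], key_values: Sequence[float]) -> List[float]:
    """Fill in one value per frame by linear interpolation between keyframes.

    Hypothetical stand-in for the interpolation model; assumes at least one keyframe.
    """
    frames = [float(key_values[0])]
    for i in range(len(key_idx) - 1):
        start, end = key_idx[i], key_idx[i + 1]
        v0, v1 = key_values[i], key_values[i + 1]
        span = end - start
        for step in range(1, span + 1):
            frames.append(v0 + (v1 - v0) * step / span)
    return frames


# Toy per-frame audio feature (assumption: one scalar per video frame).
audio_feature = [float(t) for t in range(10)]

# Stage 1: a "keyframe model" stand-in that reads the audio feature at sparse positions.
keys = keyframe_indices(len(audio_feature), stride=4)
key_values = [audio_feature[k] for k in keys]

# Stage 2: an "interpolation model" stand-in that fills the frames in between.
video_track = interpolate(keys, key_values)
```

Generating sparse keyframes first and filling the gaps afterwards means the second stage only has to bridge short spans between frames that are already aligned with the audio.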
Limitations
Due to GPU restrictions on Hugging Face Spaces, the demo only processes videos up to 6 seconds long. For longer videos or better performance, we recommend running KeySync locally on your own hardware with the inference scripts provided in this repository (https://github.com/antonibigata/keysync).
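To check locally whether a clip fits the 6-second limit before uploading, a small helper like the one below can be used. It is a sketch under assumptions: the names `fits_demo_limit` and `max_frames_within_limit` are hypothetical, the frame count and frame rate are supplied by the caller, and how the demo itself treats longer inputs (rejecting or truncating them) is not stated here.

```python
DEMO_MAX_SECONDS = 6.0  # limit stated for the hosted demo


def fits_demo_limit(num_frames: int, fps: float, max_seconds: float = DEMO_MAX_SECONDS) -> bool:
    """Return True if a clip with `num_frames` frames at `fps` lasts at most `max_seconds`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps <= max_seconds


def max_frames_within_limit(num_frames: int, fps: float, max_seconds: float = DEMO_MAX_SECONDS) -> int:
    """Return how many leading frames of the clip fit inside the time limit."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return min(num_frames, int(max_seconds * fps))
```

For example, a 200-frame clip at 25 fps lasts 8 seconds, so only its first 150 frames fit within the limit.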