open-mmlab/FoleyCrafter

[IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley artist that adds vivid, synchronized sound effects to your silent videos 😝

/ 100

Emerging

FoleyCrafter uses a modular pipeline with three parts: semantic video understanding (via adapter networks), text-to-audio diffusion (built on the Auffusion base model), and temporal synchronization, which detects event timestamps in the video and aligns sound effects to them. A dedicated vocoder handles high-quality audio synthesis. The pipeline supports two modes: semantic-driven generation and frame-level temporal alignment for precise audio-visual synchronization.
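To make the timestamp-based alignment idea concrete, here is a minimal sketch of how detected event times could be turned into a per-frame onset mask that a generator might be conditioned on. This is an illustration only, not FoleyCrafter's code: the function name `timestamps_to_onset_mask`, its parameters, and the binary-mask representation are all assumptions made for this example.

```python
import math
from typing import List


def timestamps_to_onset_mask(
    event_times_s: List[float], duration_s: float, fps: float
) -> List[int]:
    """Hypothetical helper: mark each video frame that contains an event onset.

    event_times_s: detected event times in seconds (assumed detector output).
    duration_s:    clip length in seconds.
    fps:           frame rate used for the mask.
    """
    n_frames = int(round(duration_s * fps))
    mask = [0] * n_frames
    for t in event_times_s:
        idx = math.floor(t * fps)  # frame index that contains time t
        if 0 <= idx < n_frames:    # ignore events outside the clip
            mask[idx] = 1
    return mask


# Two events in a 2-second clip sampled at 4 frames per second.
mask = timestamps_to_onset_mask([0.5, 1.25], duration_s=2.0, fps=4)
print(mask)  # [0, 0, 1, 0, 0, 1, 0, 0]
```

In a frame-level alignment mode, a mask like this would tell the audio generator where sound onsets belong; in a purely semantic mode no such mask would be needed.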

644 stars. No commits in the last 6 months.

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 18 / 25

Stars

644

Forks

Language

Python

License

Apache-2.0

Related models

kyegomez/Sora

Implementation of the premier Text to Video model from OpenAI

ShubhamZade1997/Pixelle-Video

🎥 Create stunning short videos effortlessly with Pixelle-Video, an AI-driven engine designed for...

CharlieDreemur/AI-Video-Converter

AI Video Converter Based on ControlNet

abrahamjroy/ProcessedElectricSheepDreams

A native application with agentic support (MCP) for ultra-fast AI image generation using a...

donahowe/AutoStudio

AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
