Video-Guided Foley Sound Generation with Multimodal Controls
arXiv 2024
1 University of Michigan
2 Adobe Research
(* Work done during an internship at Adobe)
tl;dr: We introduce MultiFoley, a model designed for video-guided sound generation
that supports multimodal conditioning through text, audio, and video.
Text input: lion roaring, high quality.
Text input: skateboard, wheels spinning, high quality.
Text input: basketball dribble, high quality.
Tex...
Read more at ificl.github.io