AudioX: Diffusion Transformer for Anything-to-Audio Generation
¹HKUST
†Corresponding authors
Abstract
Audio and music generation have emerged as crucial tasks in many applications, yet existing approaches face significant limitations: they operate in isolation without unified capabilities across modalities, suffer from scarce high-quality, multi-modal training data, and struggle to effectively integrate diverse inputs. In this work, we propose AudioX, a unified Diffusion Transformer model for Anything-to-Audio and Music Generation. Unlike previous domain-sp...
Read more at zeyuet.github.io