OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin*,
Jianwen Jiang*†,
Jiaqi Yang*,
Zerong Zheng*,
Chao Liang
ByteDance
*Equal contribution, †Project lead
TL;DR: We propose OmniHuman, an end-to-end multimodality-conditioned human video generation framework that generates human videos from a single human image and motion signals (e.g., audio only, video only, or a combination of audio and video). In OmniHuman, we introduce a multimodality motion conditioning mixed training strategy, allowing the model to benefit from scaled-up mixed-condition training data.
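The summary does not spell out how the mixed training strategy works; as a rough illustration only, the sketch below shows one way per-sample condition selection could realize training on mixed motion conditions. All names (`Sample`, `select_conditions`, `CONDITION_RATIOS`) and the ratio values are hypothetical assumptions, not details from the paper.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical inclusion ratios per motion condition; the actual
# training ratios used by OmniHuman are not stated in this summary.
CONDITION_RATIOS = {"audio": 0.5, "pose_video": 0.25}

@dataclass
class Sample:
    reference_image: Any              # the single human image (always kept)
    audio: Optional[Any] = None       # audio motion signal, if the clip has one
    pose_video: Optional[Any] = None  # video/pose motion signal, if available

def select_conditions(sample: Sample) -> Dict[str, Any]:
    """Randomly keep or drop each motion condition for one training sample,
    so a single model sees audio-only, video-only, and combined driving."""
    conditions = {"reference_image": sample.reference_image}
    for name, ratio in CONDITION_RATIOS.items():
        signal = getattr(sample, name)
        if signal is not None and random.random() < ratio:
            conditions[name] = signal
    return conditions

# Example: a clip carrying both audio and pose can serve as an
# audio-driven, pose-driven, or jointly driven training example,
# which is what lets heterogeneous data be pooled for scaling.
batch_conditions = select_conditions(
    Sample(reference_image="img", audio="mel", pose_video="pose")
)
```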
Read more at omnihuman-lab.github.io