Hibiki: New AI Model Revolutionizes Real-Time Speech Translation, Outperforms Existing Systems in French-English Task

High-Fidelity Simultaneous Speech-To-Speech Translation

Abstract: We introduce Hibiki, a decoder-only model for simultaneous speech translation. Hibiki leverages a multistream language model to synchronously process source and target speech, and jointly produces text and audio tokens to perform speech-to-text and speech-to-speech translation. We furthermore address the fundamental challenge of simultaneous interpretation, which unlike its consecutive counterpart, where one waits for the end of the source utterance to start...