Nari Labs Unveils Dia: 1.6B-Parameter TTS Model Generates Ultra-Realistic Dialogue, Available on GitHub and Hugging Face

GitHub - nari-labs/dia: A TTS model capable of generating ultra-realistic dialogue in one pass.

Dia is a 1.6B-parameter text-to-speech model created by Nari Labs. Dia generates highly realistic dialogue directly from a transcript. You can condition the output on audio, which enables control over emotion and tone. The model can also produce nonverbal sounds such as laughter, coughing, and throat clearing. To accelerate research, we are providing access to pretrained model checkpoints and inference code. The model weights are hosted on Hugging Face. We also provide a demo page comparing our mode...
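
To make "dialogue from a transcript" concrete, here is a minimal sketch of how a multi-speaker transcript with nonverbal cues might be assembled as a single string before it is passed to a TTS model. The `[S1]`/`[S2]` speaker tags and the parenthesized cue format (for example `(laughs)`) are assumptions about the input convention and are not stated in the text above; the `Turn` class and `build_transcript` helper are hypothetical names used only for illustration and are not part of the Dia codebase. Check the repository README for the exact format and the supported cues.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One line of dialogue (hypothetical helper type, not part of Dia)."""
    speaker: int   # 1-based speaker index, rendered as [S1], [S2], ...
    text: str      # what the speaker says
    cue: str = ""  # optional nonverbal cue, e.g. "laughs" (assumed format)


def build_transcript(turns):
    """Join dialogue turns into one transcript string.

    Assumes speaker tags of the form [S<n>] and nonverbal cues written in
    parentheses after the spoken text.
    """
    parts = []
    for turn in turns:
        line = f"[S{turn.speaker}] {turn.text.strip()}"
        if turn.cue:
            line += f" ({turn.cue})"
        parts.append(line)
    return " ".join(parts)


turns = [
    Turn(1, "Did you hear about the new model?"),
    Turn(2, "I did, it sounds real.", cue="laughs"),
]
transcript = build_transcript(turns)
print(transcript)
# prints: [S1] Did you hear about the new model? [S2] I did, it sounds real. (laughs)
```

The resulting string is what would be handed to the model's inference code. Keeping the transcript as plain data first makes it easy to swap speakers or cues without touching the generation step.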