SesameAILabs Releases CSM-1B: Open-Source Conversational Speech Model Generates Audio from Text and Voice Inputs

GitHub - SesameAILabs/csm: A Conversational Speech Generation Model

CSM

2025/03/13 - We are releasing the 1B CSM variant. The checkpoint is hosted on Hugging Face.

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes. A fine-tuned variant of CSM powers the interactive voice demo shown in our blog post. A hosted Hugging Face space is also available for testing audio generation. ...
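To make the term "RVQ audio codes" concrete, here is a minimal pure-Python sketch of residual vector quantization in general. It is a toy illustration, not the Mimi codec or CSM's own code: the two small codebooks (`COARSE`, `FINE`) and the function names are hypothetical, and a real codec learns far larger codebooks over learned audio features. Each stage picks the codeword nearest to what the earlier stages left unexplained, so one frame becomes a short list of integer codes.

```python
from typing import List, Sequence

Vector = List[float]
Codebook = Sequence[Sequence[float]]


def nearest_index(residual: Vector, codebook: Codebook) -> int:
    """Return the index of the codeword closest to `residual` (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((r - c) ** 2 for r, c in zip(residual, code))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def rvq_encode(vector: Vector, codebooks: Sequence[Codebook]) -> List[int]:
    """Quantize `vector` stage by stage, returning one code per codebook."""
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        codes.append(idx)
        # Subtract the chosen codeword; the next stage refines what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Reconstruct an approximation by summing the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse stage followed by a fine stage.
COARSE = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
FINE = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]]

codes = rvq_encode([1.2, 1.0], [COARSE, FINE])
approx = rvq_decode(codes, [COARSE, FINE])
print(codes, approx)  # prints [1, 1] [1.25, 1.0]
```

In this sketch the first code carries the coarse shape of the frame and the second code a small correction, which is the general idea behind stacking several codebooks per audio frame.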