GitHub - SesameAILabs/csm: A Conversational Speech Generation Model
CSM
2025/03/13 - We are releasing the 1B CSM variant. The checkpoint is hosted on Hugging Face.
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ (residual vector quantization) audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.
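The README does not explain what RVQ codes are. The sketch below shows residual vector quantization in general, not Mimi's or CSM's actual implementation. It assumes tiny hand-written codebooks and uses hypothetical helper names (`rvq_encode`, `rvq_decode`). Each stage quantizes whatever error the earlier stages left behind, so a vector becomes one integer code per codebook. Decoding sums the chosen codewords.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Residually quantize one vector (illustrative sketch).

    Each codebook has shape (K, D). Stage i picks the codeword nearest to
    whatever the earlier stages left unexplained, so the result is one
    integer code per codebook.
    """
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for cb in codebooks:
        dists = np.sum((cb - residual) ** 2, axis=1)  # squared L2 distance to each codeword
        idx = int(np.argmin(dists))
        codes.append(idx)
        residual = residual - cb[idx]  # pass the remaining error to the next stage
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct the vector by summing the selected codeword from each stage."""
    out = np.zeros(codebooks[0].shape[1])
    for cb, idx in zip(codebooks, codes):
        out = out + cb[idx]
    return out


# Toy example: two stages of two 2-D codewords each (made-up values).
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0], [0.5, -0.5]]),
]
codes = rvq_encode([1.5, 0.5], codebooks)
print(codes)                          # [1, 1]
print(rvq_decode(codes, codebooks))   # [1.5 0.5]
```

In a neural codec such as Mimi, the codebooks are learned rather than hand-written. Each audio frame yields one code per codebook, so a clip becomes a grid of integer codes (codebooks × frames). That grid is what a model like CSM predicts before the codec turns it back into a waveform.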
A fine-tuned variant of CSM powers the interactive voice demo shown in our blog post.
A hosted Hugging Face space is also available for testing audio generation.
...