Finally, the code for the web UI client used in the Moshi demo is provided in the client/ directory. If you want to fine tune Moshi, head out to kyutai-labs/moshi ...
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi processes 24 kHz audio, down to a 12.5 Hz ...