An end-to-end (e2e) Voice Language Model by Fish Audio.
Chat with an AI that understands images and text.
Generate speech from text.