Higgs Audio V2 LLM Model
This is the 3B-parameter Higgs Audio V2 text-to-speech generation model.
Model Description
The Higgs Audio V2 LLM is an end-to-end multimodal model that understands input text and generates high-quality speech audio from it.
Model Details
- Model Type: Text-to-Speech Generation Model
- Parameters: 3B
- Framework: PyTorch
- Architecture: Transformer-based multimodal architecture
Usage
from higgs_audio.serve.serve_engine import HiggsAudioServeEngine

# Load the model and its audio tokenizer onto the GPU.
engine = HiggsAudioServeEngine(
    model_name_or_path="Toowired/higgs-v2-llm",
    audio_tokenizer_name_or_path="Toowired/higgs-v2-tokenizer",
    device="cuda",
)
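The snippet above hard-codes device="cuda". A minimal sketch of a more defensive loader follows, assuming PyTorch is the backend (as listed under Model Details). The helper names pick_device, engine_kwargs, and load_engine are hypothetical and not part of the library. Only the three constructor arguments shown above are used. Whether the model runs acceptably on CPU is not stated in this card, so treat the fallback as an assumption.

```python
import importlib.util


def pick_device(preferred="cuda"):
    """Return "cuda" only if PyTorch is installed and reports a usable GPU."""
    if preferred == "cuda" and importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda"
    # Assumption: fall back to CPU; this card does not confirm CPU support.
    return "cpu"


def engine_kwargs(device=None):
    """Build the constructor arguments used in the Usage snippet."""
    return {
        "model_name_or_path": "Toowired/higgs-v2-llm",
        "audio_tokenizer_name_or_path": "Toowired/higgs-v2-tokenizer",
        "device": device or pick_device(),
    }


def load_engine(device=None):
    """Import the engine lazily so the helpers above work without the package."""
    from higgs_audio.serve.serve_engine import HiggsAudioServeEngine

    return HiggsAudioServeEngine(**engine_kwargs(device))


# Example (requires the higgs_audio package and the model weights):
# engine = load_engine()
```

Keeping the import inside load_engine means the device check can be run on its own, before any model weights are loaded.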
Model Architecture
- Text encoder for understanding input text
- Audio decoder for generating speech
- Multimodal fusion layers
- 3B parameters for high-quality generation
Intended Use
- Text-to-speech synthesis
- Voice generation applications
- Audio content creation
- Research in speech synthesis
Limitations
- Requires significant computational resources
- Optimized for English text
- May require fine-tuning for specific voices
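To put "significant computational resources" in rough numbers, the sketch below estimates the memory needed for the weights alone at a few common precisions. It assumes a nominal count of exactly 3 billion parameters. The precision this checkpoint is stored in is not stated here, and the estimate excludes activations, the attention cache, and the audio tokenizer, so real usage will be higher.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return n_params * bytes_per_param / 2**30


N_PARAMS = 3e9  # nominal "3B"; the exact count is an assumption

for dtype, nbytes in [("float32", 4), ("float16/bfloat16", 2), ("int8", 1)]:
    print(f"{dtype}: {weight_memory_gib(N_PARAMS, nbytes):.1f} GiB")
```

At 16-bit precision this comes to roughly 5.6 GiB for the weights, which serves as a lower bound when choosing a GPU.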
License
Please check the original model repository for license information.
Contact
For questions about this model, please contact the repository owner.