Higgs Audio V2 LLM Model

This is the Higgs Audio V2 text-to-speech generation model, with 3B parameters.

Model Description

The Higgs Audio V2 LLM is an end-to-end multimodal model that understands text input and generates high-quality speech audio.

Model Details

  • Model Type: Text-to-Speech Generation Model
  • Parameters: 3B
  • Framework: PyTorch
  • Architecture: Transformer-based multimodal architecture

Usage

# Load the serving engine with the LLM weights and the audio tokenizer.
from higgs_audio.serve.serve_engine import HiggsAudioServeEngine

engine = HiggsAudioServeEngine(
    model_name_or_path="Toowired/higgs-v2-llm",
    audio_tokenizer_name_or_path="Toowired/higgs-v2-tokenizer",
    device="cuda",  # requires a CUDA-capable GPU
)
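
The snippet above hardcodes device="cuda". A minimal sketch of choosing the device at runtime is shown here. The helper name pick_device is hypothetical and not part of higgs_audio, and whether the engine runs acceptably on CPU is an assumption this card does not confirm.

```python
import importlib.util


def pick_device(preferred: str = "cuda") -> str:
    """Return `preferred` unless it is "cuda" and no CUDA device is usable.

    Hypothetical helper for illustration; not part of higgs_audio.
    """
    if preferred != "cuda":
        return preferred
    # Only import torch if it is installed, so the helper never raises ImportError.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
```

The returned string could then be passed as the device argument in place of the literal "cuda".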

Model Architecture

  • Text encoder for understanding input text
  • Audio decoder for generating speech
  • Multimodal fusion layers
  • 3B parameters for high-quality generation

Intended Use

  • Text-to-speech synthesis
  • Voice generation applications
  • Audio content creation
  • Research in speech synthesis

Limitations

  • Requires significant computational resources; the usage example assumes a CUDA-capable GPU
  • Optimized for English text
  • May require fine-tuning for specific voices
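
To put the resource requirement in rough numbers, the memory needed just to hold the weights can be estimated as parameter count times bytes per parameter. The sketch below assumes a nominal 3e9 parameters; the exact count and the dtype of the published checkpoint are not stated in this card, and activations and caches add to the total at inference time.

```python
# Bytes used to store one parameter in common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Estimate the memory for model weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    nominal_params = 3e9  # "3B" taken at face value; an assumption
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(nominal_params, dtype):.1f} GiB")
```

Under these assumptions the weights take about 5.6 GiB in 16-bit precision and about 11.2 GiB in float32.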

License

Please check the original model repository for license information.

Contact

For questions about this model, please contact the repository owner.
