
Rigel Pretrained Model

Base and fine-tuned models

Dataset

  • Size: 1,921 hours of speech and vocals in total.
  • Languages:
    • Arabic: ~70 hours
    • Chinese (Mandarin): ~70 hours
    • English: ~800 hours
    • French: ~42 hours
    • German: ~35 hours
    • Hindi: ~30 hours
    • Indonesian: ~53 hours
    • Japanese: ~140 hours
    • Korean: ~80 hours
    • Portuguese: ~40 hours
    • Russian: ~188 hours
    • Singing (all languages): ~190 hours
    • Spanish: ~200 hours
    • Tagalog: ~30 hours
    • Common language: Unknown amount
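
As a quick sanity check on the breakdown above, the listed per-language figures can be summed and compared with the stated total. This is only a sketch: the hours are the approximate values from the list, the "Common language" entry is left out because its amount is unknown, and the variable names are illustrative.

```python
# Approximate hours per category, copied from the list above.
# "Common language" is omitted because its amount is not given.
hours = {
    "Arabic": 70,
    "Chinese (Mandarin)": 70,
    "English": 800,
    "French": 42,
    "German": 35,
    "Hindi": 30,
    "Indonesian": 53,
    "Japanese": 140,
    "Korean": 80,
    "Portuguese": 40,
    "Russian": 188,
    "Singing (all languages)": 190,
    "Spanish": 200,
    "Tagalog": 30,
}

stated_total = 1921                  # total quoted under "Size"
listed_total = sum(hours.values())   # sum of the approximate figures

print(f"Listed categories sum to ~{listed_total} hours")
print(f"Difference from stated total: {listed_total - stated_total} hours")

# Share of each category relative to the listed sum, largest first.
for name, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26} {h:>4} h  {100 * h / listed_total:5.1f}%")
```

The listed figures add up to about 1,968 hours, slightly above the stated 1,921. Since every entry is marked as approximate, a gap of this size is plausible, but the list itself does not explain it.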

Sampling Frequency

  • 32 kHz (done)
  • 40 kHz (retraining)

Models

Base Model

  • Data: 1,921 hours of low- to mid-quality data.
  • Steps: 3,890,220
  • Batch: 40
  • Precision: FP32
  • Sampling Rate: 32 kHz

Fine-Tuned Model

  • Data: 102 hours of high-quality data.
  • Steps: 2,854,856
  • Batch: 20
  • Precision: FP32
  • Sampling Rate: 32 kHz
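
To put the step counts in perspective, the number of training examples each run has processed can be estimated as steps times batch size. This is a rough sketch that assumes "Steps" counts optimizer steps and "Batch" is the number of examples per step; the README does not state either, and the helper name `examples_processed` is hypothetical.

```python
def examples_processed(steps: int, batch_size: int) -> int:
    """Estimate examples seen, assuming one batch of `batch_size` per step."""
    return steps * batch_size


# Figures taken from the Base Model and Fine-Tuned Model lists above.
base_examples = examples_processed(steps=3_890_220, batch_size=40)
fine_tuned_examples = examples_processed(steps=2_854_856, batch_size=20)

print(f"Base model:       {base_examples:,} examples")
print(f"Fine-tuned model: {fine_tuned_examples:,} examples")
```

Under these assumptions the base run has processed roughly 155.6 million examples and the fine-tuning run roughly 57.1 million. How many passes over the data that represents depends on the segment length used during training, which is not given here.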

Hardware Used

  • CPU: AMD EPYC 9754
  • RAM: 256GB
  • GPUs:
    • 1 x H100
    • 4 x L40s
    • 1 x RTX 4080
    • 1 x RTX 4070 Ti

Expected Release Date

  • July 22nd
