## Rigel Pretrained Model Base and Fine-tuned models

### Dataset

* **Size:** Total 1921 hours of speech and vocals.
* **Languages:**
  * Arabic: ~70 hours
  * Chinese (Mandarin): ~70 hours
  * English: ~800 hours
  * French: ~42 hours
  * German: ~35 hours
  * Hindi: ~30 hours
  * Indonesian: ~53 hours
  * Japanese: ~140 hours
  * Korean: ~80 hours
  * Portuguese: ~40 hours
  * Russian: ~188 hours
  * Singing (all languages): ~190 hours
  * Spanish: ~200 hours
  * Tagalog: ~30 hours
  * Common language: Unknown amount

### Sampling Frequency

* **32 kHz** (Done)
* **40 kHz** (Retraining)

### Models

#### **Base Model**

* **Data:** Total 1921 hours of low-mid quality data.
* **Steps:** 3,890,220
* **Batch:** 40
* **Precision:** FP32
* **Sampling Rate:** 32k

#### **Fine-Tuned Model**

* **Data:** 102 hours of high-quality data.
* **Steps:** 2,854,856
* **Batch:** 20
* **Precision:** FP32
* **Sampling Rate:** 32k

### Hardware Used

* **CPU:** AMD EPYC 9754
* **RAM:** 256 GB
* **GPUs:**
  * 1 x H100
  * 4 x L40s
  * 1 x RTX 4080
  * 1 x RTX 4070 Ti

### Expected Release Date

* July 22nd

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65041c19e88eb2d0d521d46c/NfsOJxAzRbllBDCDjFC5e.png)
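
### Dataset Tally (illustrative)

The per-language figures in the Dataset section are approximate, so a quick tally can help show how the data is balanced. The sketch below copies the listed numbers into a dictionary and prints each category's share. The variable names are illustrative assumptions and not part of any Rigel tooling, and the "Common language" entry is left out because its amount is unknown.

```python
# Approximate hours per category, copied from the Dataset list above.
# "Common language" is omitted because its amount is listed as unknown.
hours = {
    "Arabic": 70,
    "Chinese (Mandarin)": 70,
    "English": 800,
    "French": 42,
    "German": 35,
    "Hindi": 30,
    "Indonesian": 53,
    "Japanese": 140,
    "Korean": 80,
    "Portuguese": 40,
    "Russian": 188,
    "Singing (all languages)": 190,
    "Spanish": 200,
    "Tagalog": 30,
}

STATED_TOTAL = 1921  # total hours stated in the Dataset section

# Sum of the approximate per-category figures.
total_listed = sum(hours.values())

# Fraction of the listed hours contributed by each category.
shares = {name: h / total_listed for name, h in hours.items()}

# Print categories from largest to smallest.
for name, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26}{h:>5} h  {100 * shares[name]:5.1f}%")

print(f"Sum of listed figures: {total_listed} h (stated total: {STATED_TOTAL} h)")
```

Because the listed values are rounded estimates, their sum (about 1968 hours) does not exactly match the stated total of 1921 hours.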