## Rigel Pretrained Model
Base and fine-tuned models.

### Dataset

* **Size:** 1,921 hours of speech and vocals in total.
* **Languages:**
    * Arabic: ~70 hours
    * Chinese (Mandarin): ~70 hours
    * English: ~800 hours
    * French: ~42 hours
    * German: ~35 hours
    * Hindi: ~30 hours
    * Indonesian: ~53 hours
    * Japanese: ~140 hours
    * Korean: ~80 hours
    * Portuguese: ~40 hours
    * Russian: ~188 hours
    * Singing (all languages): ~190 hours
    * Spanish: ~200 hours
    * Tagalog: ~30 hours
    * Common language: Unknown amount

### Sampling Frequency

* **32 kHz** (completed)
* **40 kHz** (being retrained)
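Each checkpoint is trained at a single sampling rate, so audio recorded at another rate (for example 44.1 kHz or 48 kHz) would presumably need to be resampled before use. The sketch below is a minimal, hypothetical helper, `resample_factors`, that computes the integer up/down factors a polyphase resampler such as SciPy's `scipy.signal.resample_poly(x, up, down)` expects. The source rates shown are illustrative assumptions, not requirements stated for this model.

```python
from fractions import Fraction


def resample_factors(src_hz: int, dst_hz: int) -> tuple[int, int]:
    """Return (up, down) such that src_hz * up / down == dst_hz, in lowest terms.

    Hypothetical helper for illustration; not part of any released tooling.
    """
    # Fraction reduces dst/src to lowest terms automatically.
    ratio = Fraction(dst_hz, src_hz)
    return ratio.numerator, ratio.denominator


# Illustrative source rates mapped to the two target rates listed above.
for src in (44_100, 48_000):
    for dst in (32_000, 40_000):
        up, down = resample_factors(src, dst)
        print(f"{src} Hz -> {dst} Hz: up={up}, down={down}")
```

With SciPy installed, the factors could then be passed as `resample_poly(audio, up, down)`; keeping them in lowest terms keeps the intermediate upsampled signal as short as possible.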

### Models

#### **Base Model**

* **Data:** 1,921 hours of low- to mid-quality data.
* **Steps:** 3,890,220
* **Batch size:** 40
* **Precision:** FP32
* **Sampling Rate:** 32 kHz

#### **Fine-Tuned Model**

* **Data:** 102 hours of high-quality data.
* **Steps:** 2,854,856
* **Batch size:** 20
* **Precision:** FP32
* **Sampling Rate:** 32 kHz
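The step and batch figures for the two models can be turned into a rough count of training examples processed. This is a minimal sketch, assuming the listed batch is the global batch per optimizer step (no gradient accumulation) and that one example is one training clip. The card does not give clip length, so the result cannot be converted to hours of audio or epochs.

```python
# Training schedules as listed in the model card.
SCHEDULES = {
    "base": {"steps": 3_890_220, "batch": 40},
    "fine_tuned": {"steps": 2_854_856, "batch": 20},
}


def samples_seen(steps: int, batch: int) -> int:
    """Examples processed, assuming `batch` is the global batch per optimizer step."""
    return steps * batch


for name, cfg in SCHEDULES.items():
    print(f"{name}: {samples_seen(cfg['steps'], cfg['batch']):,} samples")
# base: 155,608,800 samples
# fine_tuned: 57,097,120 samples
```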

### Hardware Used

* **CPU:** AMD EPYC 9754
* **RAM:** 256GB
* **GPUs:**
    * 1 × H100
    * 4 × L40s
    * 1 × RTX 4080
    * 1 × RTX 4070 Ti

### Expected Release Date

* July 22nd

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65041c19e88eb2d0d521d46c/NfsOJxAzRbllBDCDjFC5e.png)