This folder contains the training code for a text-generation model that uses the Transformer architecture as its backbone. I trained a 330-million-parameter model from scratch, starting from randomly initialized weights instead of any existing pretrained weights. The training data was collected from FineWeb, FineWeb-Edu, and some other high-quality datasets (research papers and Wikipedia). I also evaluated the model at specific training steps.
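
To give a feel for how a model reaches a size in the hundreds of millions of parameters, here is a rough sketch that counts the weights of a GPT-2-style decoder. The function name `count_params` and every configuration value passed to it are hypothetical and used only for illustration; they are not the configuration of the model in this folder, which may use a different layout (for example no biases, untied embeddings, or a different MLP width).

```python
def count_params(n_layer, d_model, vocab_size, n_ctx, tie_embeddings=True):
    """Approximate parameter count for a GPT-2-style decoder (hypothetical helper).

    Assumes learned positional embeddings, biases on every linear layer,
    a 4x MLP expansion, and two LayerNorms per block plus a final one.
    """
    # Token embeddings plus learned positional embeddings.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Attention: Q, K, V and output projections (weights + biases).
    attention = 4 * d_model * d_model + 4 * d_model
    # MLP: d_model -> 4*d_model -> d_model (weights + biases).
    mlp = 2 * d_model * (4 * d_model) + 4 * d_model + d_model
    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d_model
    per_block = attention + mlp + layer_norms
    final_norm = 2 * d_model
    # An untied output head adds another vocab_size x d_model matrix.
    head = 0 if tie_embeddings else vocab_size * d_model
    return embeddings + n_layer * per_block + final_norm + head


if __name__ == "__main__":
    # Illustrative configuration only (GPT-2-medium-like), NOT this repo's config.
    total = count_params(n_layer=24, d_model=1024, vocab_size=50257, n_ctx=1024)
    print(f"{total / 1e6:.1f}M parameters")
```

Most of the total comes from the `12 * d_model**2` term per block, so depth and width dominate the count, while the vocabulary size mainly affects the embedding share.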
The evaluation details are shared below.
Evaluation Results (470K Steps)
| Step | Benchmark | Score |
|---|---|---|
| 470K | HellaSwag | 0.4057 |
| 470K | PIQA | 0.6692 |
| 470K | OpenBookQA | 0.3300 |
| 470K | WinoGrande | 0.6717 |
| 470K | Social IQa | 0.1950 |
| 470K | CommonsenseQA | 0.2023 |
| 470K | ARC Easy | 0.5211 |
| 470K | ARC Challenge | 0.2876 |
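
To help read the table, the sketch below compares each score with the accuracy of random guessing on that benchmark. The scores are copied from the table above. The chance levels are an assumption based on the usual number of answer choices per question (for ARC, most questions have four choices, so 0.25 is approximate), and the actual evaluation setup used here may differ. The helper names are hypothetical.

```python
# Scores at 470K steps, copied from the table above.
SCORES = {
    "HellaSwag": 0.4057,
    "PIQA": 0.6692,
    "OpenBookQA": 0.3300,
    "WinoGrande": 0.6717,
    "Social IQa": 0.1950,
    "CommonsenseQA": 0.2023,
    "ARC Easy": 0.5211,
    "ARC Challenge": 0.2876,
}

# Assumed random-guess accuracy: 1 / (typical number of answer choices).
CHANCE = {
    "HellaSwag": 1 / 4,
    "PIQA": 1 / 2,
    "OpenBookQA": 1 / 4,
    "WinoGrande": 1 / 2,
    "Social IQa": 1 / 3,
    "CommonsenseQA": 1 / 5,
    "ARC Easy": 1 / 4,       # approximate: most ARC questions have 4 choices
    "ARC Challenge": 1 / 4,  # approximate: most ARC questions have 4 choices
}


def mean_score(scores):
    """Unweighted average over all benchmarks."""
    return sum(scores.values()) / len(scores)


def margin_over_chance(scores, chance):
    """Score minus the assumed random-guess accuracy, per benchmark."""
    return {name: scores[name] - chance[name] for name in scores}


if __name__ == "__main__":
    margins = margin_over_chance(SCORES, CHANCE)
    for name, margin in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<15} score={SCORES[name]:.4f}  margin={margin:+.4f}")
    print(f"Mean score: {mean_score(SCORES):.4f}")
```

A margin near zero or below it (as for Social IQa and CommonsenseQA under these assumptions) suggests the model is not yet doing better than guessing on that task, which is consistent with the model being only partially pretrained.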

I have also shared some sample text generated by the model (inference) in the screenshot below. I know that this model is not on par with the existing open-source models available in the industry. It is only partially pretrained, and anyone who wants to retrain it or resume training from it is welcome to do so; it's completely open. Thank you ;)