Update README.md
README.md CHANGED
@@ -5,11 +5,11 @@ model-index:
   - name: out
     results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+This is the instruction fine-tuned version of [Tiny Llama](https://github.com/jzhang38/TinyLlama), trained on [@Teknium1's](https://twitter.com/Teknium1) [openhermes](https://huggingface.co/datasets/teknium/openhermes) dataset.
+
+`"The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With some proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs 🚀🚀. The training has started on 2023-09-01."`
+
 <details><summary>See axolotl config</summary>
 
 axolotl version: `0.3.0`
@@ -87,25 +87,9 @@ special_tokens:
 
 </details><br>
 
-# out
-
-This model was trained from scratch on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 1.3425
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
+The loss for the 3T checkpoint explodes for some reason.
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/644bf6ef778ecbfb977e8e84/06bfkeS7cPoHxkeIHe5M7.jpeg)
 
 ### Training hyperparameters
 