pszemraj committed
Commit f5e092e (1 parent: 8c2acde)

Update README.md

Files changed (1): README.md (+10 -17)
README.md CHANGED
@@ -8,31 +8,24 @@ metrics:
 model-index:
 - name: griffin-1024-llama3t-8layer-simplewiki-silu-fineweb-1M_en-med-vN
   results: []
+datasets:
+- BEE-spoke-data/fineweb-1M_en-med
+language:
+- en
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
+# griffin-llama3t-8L-v0.02-fineweb

-# griffin-1024-llama3t-8layer-simplewiki-silu-fineweb-1M_en-med-vN
+Pretraining experiment with griffin/recurrent_gemma arch. This one uses the Llama-3 tokenizer.

-This model is a fine-tuned version of [pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu](https://huggingface.co/pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu) on the BEE-spoke-data/fineweb-1M_en-med dataset.
+## Model description
+
+Further training of [pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu](https://huggingface.co/pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu) on the BEE-spoke-data/fineweb-1M_en-med dataset.
 It achieves the following results on the evaluation set:
 - Loss: 5.6538
 - Accuracy: 0.1881
 - Num Input Tokens Seen: 766509056

-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
 ## Training procedure

 ### Training hyperparameters
@@ -74,4 +67,4 @@ The following hyperparameters were used during training:
 - Transformers 4.40.1
 - Pytorch 2.3.0+cu121
 - Datasets 2.19.0
-- Tokenizers 0.19.1
+- Tokenizers 0.19.1
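
Since the card pins Transformers 4.40.1 (the first release with griffin/recurrent_gemma support), the checkpoint should load through the standard auto classes. A minimal sketch, assuming a repo id inferred from the new README title (the actual id is not shown in this diff), which also shows how the reported eval loss maps to perplexity:

```python
# Minimal usage sketch. The repo id below is an assumption inferred from the
# new README title; it is not confirmed anywhere in this commit.
import math

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "pszemraj/griffin-llama3t-8L-v0.02-fineweb"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)  # Llama-3 tokenizer, per the card
model = AutoModelForCausalLM.from_pretrained(repo_id)  # griffin/recurrent_gemma arch

# Quick smoke test: generate a short continuation.
inputs = tokenizer("The Griffin architecture", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# The reported eval loss is mean cross-entropy in nats, so perplexity = exp(loss).
print(math.exp(5.6538))  # ~285, consistent with an early-stage pretraining run
```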