pszemraj committed
Commit f5e092e (1 parent: 8c2acde)

Update README.md

Files changed (1): README.md (+10 -17)
README.md CHANGED
@@ -8,31 +8,24 @@ metrics:
 model-index:
 - name: griffin-1024-llama3t-8layer-simplewiki-silu-fineweb-1M_en-med-vN
   results: []
+datasets:
+- BEE-spoke-data/fineweb-1M_en-med
+language:
+- en
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
+# griffin-llama3t-8L-v0.02-fineweb

-# griffin-1024-llama3t-8layer-simplewiki-silu-fineweb-1M_en-med-vN
+Pretraining experiment with griffin/recurrent_gemma arch. This one uses the Llama-3 tokenizer.

-This model is a fine-tuned version of [pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu](https://huggingface.co/pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu) on the BEE-spoke-data/fineweb-1M_en-med dataset.
+## Model description
+
+Further training of [pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu](https://huggingface.co/pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu) on the BEE-spoke-data/fineweb-1M_en-med dataset.
 It achieves the following results on the evaluation set:
 - Loss: 5.6538
 - Accuracy: 0.1881
 - Num Input Tokens Seen: 766509056

-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
 ## Training procedure

 ### Training hyperparameters
@@ -74,4 +67,4 @@ The following hyperparameters were used during training:
 - Transformers 4.40.1
 - Pytorch 2.3.0+cu121
 - Datasets 2.19.0
-- Tokenizers 0.19.1
+- Tokenizers 0.19.1
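
Since the card pins Transformers 4.40.1 (the first release with griffin/recurrent_gemma support), the checkpoint should load through the standard auto classes. A minimal sketch, assuming a repo id inferred from the new README title (the actual id is not shown in this diff), which also shows how the reported eval loss maps to perplexity:

```python
# Minimal usage sketch. The repo id below is an assumption inferred from the
# new README title; it is not confirmed anywhere in this commit.
import math

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "pszemraj/griffin-llama3t-8L-v0.02-fineweb"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)  # Llama-3 tokenizer, per the card
model = AutoModelForCausalLM.from_pretrained(repo_id)  # griffin/recurrent_gemma arch

# Quick smoke test: generate a short continuation.
inputs = tokenizer("The Griffin architecture", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# The reported eval loss is mean cross-entropy in nats, so perplexity = exp(loss).
print(math.exp(5.6538))  # ~285, consistent with an early-stage pretraining run
```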