guipenedo (HF staff) committed
Commit 8f9955c · verified · 1 Parent(s): 000156d

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -14,9 +14,9 @@ datasets:
 This model is part of the 🍷 [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) ablations, detailed in this [technical report](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1).
 The model has 1.82B parameters, 2048 context length and uses Llama architecture with RoPE. It was trained on 350B tokens from [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu), tokenized using `gpt2` tokenizer.

- - Paper: 🍷 FineWeb: decanting the web for the finest text data at scale https://hf.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
- - License: Apache-2
- - Languages: English
+ - **Paper**: 🍷 FineWeb: decanting the web for the finest text data at scale https://hf.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
+ - **License**: Apache-2
+ - **Languages**: English

 ## Use

@@ -59,14 +59,14 @@ print([b.name for b in out.branches])

 ## Training
 ### Model
- - Architecture: Llama model
- - Pretraining steps: 167k
- - Pretraining tokens: 350B
- - Precision: bfloat16
+ - **Architecture**: Llama model
+ - **Pretraining steps**: 167k
+ - **Pretraining tokens**: 350B
+ - **Precision**: bfloat16

 ### Hardware
- - GPUs: 64 H100
- - Training time: 72 GPU hours
+ - **GPUs**: 64 H100
+ - **Training time**: 72 wall clock hours

 ### Software
 - [nanotron](https://github.com/huggingface/nanotron/) for training
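For context, the second hunk header above points into the README's "## Use" section (`print([b.name for b in out.branches])`), which lists the repo's intermediate checkpoint branches. Below is a minimal sketch of that usage with `huggingface_hub` and `transformers`; the repo id is an assumption, since it is not shown in this diff.

```python
# Minimal sketch of the usage hinted at by the hunk context above.
# ASSUMPTION: the repo id below is a guess; substitute the actual model repository.
import torch
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HuggingFaceFW/ablation-model-fineweb-edu"  # assumed repo id

# List the available checkpoint branches (the Use section prints each branch name).
out = list_repo_refs(repo_id)
print([b.name for b in out.branches])

# Load the model in bfloat16, matching the precision listed under "### Model".
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
```

An individual training checkpoint can then be loaded by passing one of the listed branch names as `revision=` to `from_pretrained`.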