Update README.md
README.md CHANGED
@@ -14,9 +14,9 @@ datasets:
 This model is part of the 🍷 [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) ablations, detailed in this [technical report](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1).
 The model has 1.82B parameters, a 2048-token context length, and uses the Llama architecture with RoPE. It was trained on 350B tokens from [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu), tokenized with the `gpt2` tokenizer.

-- Paper
-- License
-- Languages
+- **Paper**: 🍷 FineWeb: decanting the web for the finest text data at scale, https://hf.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
+- **License**: Apache-2
+- **Languages**: English

 ## Use
@@ -59,14 +59,14 @@ print([b.name for b in out.branches])
 ## Training
 ### Model
-- Architecture
-- Pretraining steps
-- Pretraining tokens
-- Precision
+- **Architecture**: Llama model
+- **Pretraining steps**: 167k
+- **Pretraining tokens**: 350B
+- **Precision**: bfloat16

 ### Hardware
-- GPUs
-- Training time
+- **GPUs**: 64 H100
+- **Training time**: 72 wall clock hours

 ### Software
 - [nanotron](https://github.com/huggingface/nanotron/) for training
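The card above describes a 1.82B-parameter Llama-architecture model with a 2048-token context, so loading it follows the standard `transformers` causal-LM path. The sketch below is illustrative only: the repository id `HuggingFaceFW/ablation-model-fineweb-edu` is an assumption based on the FineWeb ablation naming, not something this diff states.

```python
# Minimal loading sketch for the 1.82B Llama-style ablation model described above.
# NOTE: the repo id is an assumption; replace it with the actual model repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HuggingFaceFW/ablation-model-fineweb-edu"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(repo_id)  # gpt2 tokenizer per the card
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Generate a short continuation to sanity-check the model.
inputs = tokenizer("The web is full of", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```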
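The second hunk's context line, `print([b.name for b in out.branches])`, indicates that the Use section enumerates the repository's branches, presumably intermediate checkpoints published as revisions. A sketch of that pattern with `huggingface_hub.list_repo_refs`, again with an assumed repo id:

```python
# Sketch of the branch-listing pattern implied by the hunk context line.
# The repo id is an assumption; substitute the actual model repository.
from huggingface_hub import list_repo_refs

out = list_repo_refs("HuggingFaceFW/ablation-model-fineweb-edu")
print([b.name for b in out.branches])  # e.g. intermediate checkpoint branches
```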
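As a rough consistency check on the training figures (350B tokens over 167k steps at a 2048-token sequence length, on 64 H100s for 72 hours), the implied per-step token count and per-GPU throughput can be back-computed; these are derived estimates, not values stated in the card, and they assume every token comes from full-length sequences.

```python
# Back-of-the-envelope check of the training figures above.
# Assumes all training tokens come from full 2048-token sequences.
tokens = 350e9          # pretraining tokens
steps = 167_000         # pretraining steps
seq_len = 2048          # context length
gpus = 64               # H100s
hours = 72              # wall-clock training time

tokens_per_step = tokens / steps                      # ~2.1M tokens per optimizer step
seqs_per_step = tokens_per_step / seq_len             # roughly 1,000 sequences per global batch
tokens_per_gpu_sec = tokens / (gpus * hours * 3600)   # ~21k tokens/s per GPU

print(f"{tokens_per_step:,.0f} tokens/step")
print(f"{seqs_per_step:,.0f} sequences/step")
print(f"{tokens_per_gpu_sec:,.0f} tokens/s per GPU")
```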