guipenedo (HF staff) committed
Commit 8f9955c · verified · 1 Parent(s): 000156d

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -14,9 +14,9 @@ datasets:
 This model is part of the 🍷 [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) ablations, detailed in this [technical report](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1).
 The model has 1.82B parameters, 2048 context length and uses Llama architecture with RoPE. It was trained on 350B tokens from [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu), tokenized using `gpt2` tokenizer.

- - Paper: 🍷 FineWeb: decanting the web for the finest text data at scale https://hf.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
- - License: Apache-2
- - Languages: English
+ - **Paper**: 🍷 FineWeb: decanting the web for the finest text data at scale https://hf.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
+ - **License**: Apache-2
+ - **Languages**: English

 ## Use

@@ -59,14 +59,14 @@ print([b.name for b in out.branches])

 ## Training
 ### Model
- - Architecture: Llama model
- - Pretraining steps: 167k
- - Pretraining tokens: 350B
- - Precision: bfloat16
+ - **Architecture**: Llama model
+ - **Pretraining steps**: 167k
+ - **Pretraining tokens**: 350B
+ - **Precision**: bfloat16

 ### Hardware
- - GPUs: 64 H100
- - Training time: 72 GPU hours
+ - **GPUs**: 64 H100
+ - **Training time**: 72 wall clock hours

 ### Software
 - [nanotron](https://github.com/huggingface/nanotron/) for training
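For context, the second hunk header above points into the README's "## Use" section (`print([b.name for b in out.branches])`), which lists the repo's intermediate checkpoint branches. Below is a minimal sketch of that usage with `huggingface_hub` and `transformers`; the repo id is an assumption, since it is not shown in this diff.

```python
# Minimal sketch of the usage hinted at by the hunk context above.
# ASSUMPTION: the repo id below is a guess; substitute the actual model repository.
import torch
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HuggingFaceFW/ablation-model-fineweb-edu"  # assumed repo id

# List the available checkpoint branches (the Use section prints each branch name).
out = list_repo_refs(repo_id)
print([b.name for b in out.branches])

# Load the model in bfloat16, matching the precision listed under "### Model".
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
```

An individual training checkpoint can then be loaded by passing one of the listed branch names as `revision=` to `from_pretrained`.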