FineWeb-LMs: Training ELECTRA Augmented with Multi-word Selection (TEAMS)

BERT with TensorFlow Model Garden

This repository presents a TEAMS model that was pretrained on the 10BT subsets of FineWeb and FineWeb-Edu.

Pretraining Details

The released TEAMS model is part of my TensorFlow Model Garden LMs project.

The pretraining was done on a v3-32 TPU VM Pod, provided by the amazing TRC program. Detailed cheatsheets are available:

tl;dr: The model was pretrained for 1M steps with a global batch size of 256, a sequence length of 512, and a vocabulary size of 64k.
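
As a rough back-of-the-envelope check, the step count, batch size, and sequence length above give an upper bound on the number of token positions processed during pretraining. This is only a sketch: it assumes every sequence is filled to the full 512 tokens, which is not stated here, and the variable names are illustrative.

```python
# Upper bound on token positions processed during pretraining.
# Assumption (not confirmed by the source): every sequence is filled
# to the full sequence length, i.e. no padding.
steps = 1_000_000        # 1M training steps
global_batch_size = 256  # sequences per step
seq_length = 512         # tokens per sequence

tokens_per_step = global_batch_size * seq_length
total_tokens = steps * tokens_per_step

print(f"tokens per step: {tokens_per_step:,}")
print(f"total token positions: {total_tokens:,} (~{total_tokens / 1e9:.0f}B)")
```

Under that assumption the model processed roughly 131B token positions, so the actual number of non-padding tokens is at most that figure.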

Checkpoint Evaluation with ScandEval

We evaluate the last 5 checkpoints (1M, 951k, 901k, 851k and 801k) with a recent version of ScandEval to check their performance and compare them with popular encoder-only models such as BERT, RoBERTa and ELECTRA:

Model ID	Avg. Score	CoNLL-En	SST5	ScaLA-En	SQuAD
model-garden-lms/teams-base-finewebs-1m	72.64	89.27 ± 0.41 / 88.82 ± 0.41	59.58 ± 0.64 / 62.63 ± 3.0	66.72 ± 0.94 / 83.01 ± 0.45	59.95 ± 0.71 / 71.13 ± 0.58
model-garden-lms/teams-base-finewebs-951k	72.06	89.64 ± 0.52 / 89.18 ± 0.42	60.31 ± 1.03 / 58.82 ± 2.79	65.85 ± 2.01 / 82.47 ± 1.23	59.36 ± 0.77 / 70.82 ± 0.62
model-garden-lms/teams-base-finewebs-901k	72.19	89.31 ± 0.52 / 88.71 ± 0.53	59.86 ± 1.05 / 62.17 ± 2.61	64.89 ± 2.86 / 81.84 ± 1.65	59.74 ± 0.55 / 71.0 ± 0.5
model-garden-lms/teams-base-finewebs-851k	71.41	89.48 ± 0.47 / 88.99 ± 0.52	59.17 ± 1.2 / 60.25 ± 3.25	63.01 ± 2.31 / 80.77 ± 1.38	59.13 ± 0.53 / 70.5 ± 0.49
model-garden-lms/teams-base-finewebs-801k	70.73	89.2 ± 0.43 / 88.8 ± 0.46	59.21 ± 1.5 / 61.41 ± 2.36	58.47 ± 4.1 / 78.24 ± 2.4	59.59 ± 0.66 / 70.9 ± 0.59
google-bert/bert-base-cased	62.26	87.39 ± 0.79 / 87.11 ± 0.66	54.49 ± 1.36 / 53.22 ± 1.15	52.08 ± 2.13 / 74.52 ± 1.31	38.63 ± 2.1 / 50.68 ± 1.87
google/electra-base-discriminator	69.26	87.82 ± 0.69 / 86.83 ± 0.62	62.3 ± 1.12 / 55.93 ± 0.67	62.61 ± 1.21 / 80.85 ± 0.59	52.51 ± 0.86 / 65.2 ± 0.85
FacebookAI/roberta-base	68.96	90.35 ± 0.23 / 90.14 ± 0.2	60.95 ± 1.4 / 57.52 ± 1.97	50.64 ± 1.69 / 74.55 ± 0.9	57.82 ± 1.35 / 69.68 ± 1.02

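Our pretrained TEAMS model shows strong performance across all tasks: every evaluated checkpoint reaches a higher average score (70.73 to 72.64) than BERT (62.26), ELECTRA (69.26) and RoBERTa (68.96), although RoBERTa remains ahead on CoNLL-En. All detailed results can be found in this dataset repository.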
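
The Avg. Score column in the table above is not defined here. A minimal sketch, assuming it is the unweighted mean of the eight reported values per row (an inference from the numbers, not a documented ScandEval formula), checks this for the 1M checkpoint:

```python
from statistics import mean

# Scores copied from the teams-base-finewebs-1m row (uncertainties dropped).
scores_1m = {
    "CoNLL-En": (89.27, 88.82),
    "SST5": (59.58, 62.63),
    "ScaLA-En": (66.72, 83.01),
    "SQuAD": (59.95, 71.13),
}

# Assumption: Avg. Score is the plain mean over all eight values.
all_values = [value for pair in scores_1m.values() for value in pair]
avg_score = round(mean(all_values), 2)

print(avg_score)
```

For this row the result matches the reported 72.64. Because the table values are already rounded, other rows may differ from the reported average in the last digit.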

❤️ Acknowledgements

This repository is the outcome of the last two years of working with TPUs from the awesome TRC program and the TensorFlow Model Garden library.

Made from Bavarian Oberland with ❤️ and 🥨.
