---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- ModernBERT
- fineweb
- filtering
- regression
metrics:
- precision
- recall
- accuracy
model-index:
- name: 8e-5_one_label
  results: []
datasets:
- HuggingFaceFW/fineweb-edu-llama3-annotations
language:
- en
---

One-off run using a [modified version](https://gist.github.com/bclavie/93d3b161d7fb41131bca41a50b6726c5) of the original FineWeb-Edu quality filter regression training code, simply replacing the original model (snowflake-arctic-embed-m, a model fine-tuned from BERT-base) with ModernBERT-base. Without extensive tuning, the model trains considerably faster than BERT-base and gains **+5 Weighted F1**:

# Results

## ModernBERT-base-fineweb-edu-example

**Weighted F1: 0.76**

**Detailed:**

```
Validation Report:
              precision    recall  f1-score   support

           0       0.80      0.55      0.65      5694
           1       0.82      0.86      0.84     26512
           2       0.64      0.71      0.67     10322
           3       0.65      0.60      0.63      3407
           4       0.80      0.37      0.51       807
           5       0.00      0.00      0.00         1

    accuracy                           0.76     46743
   macro avg       0.62      0.51      0.55     46743
weighted avg       0.76      0.76      0.76     46743
```

## Original Classifier ([HuggingFaceFW/fineweb-edu-classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier))

**Weighted F1: 0.71**

**Detailed:**

```
              precision    recall  f1-score   support

           0       0.75      0.49      0.59      5694
           1       0.78      0.84      0.81     26512
           2       0.57      0.61      0.59     10322
           3       0.56      0.50      0.53      3407
           4       0.58      0.35      0.44       807
           5       0.33      0.01      0.02       125

    accuracy                           0.71     46867
   macro avg       0.60      0.47      0.50     46867
weighted avg       0.71      0.71      0.71     46867
```

(For some reason, the currently available annotated dataset is otherwise identical but is missing 124 of the 125 examples rated 5. These are so few that they have no real impact on the weighted metrics.)

# Params

Most parameters are detailed in the script. Key hparams (a hedged sketch of the setup follows the list):

- **Learning rate**: 5e-5
- **Weight decay**: 0.1 (decoupled)
- **Seed**: 1
- **Warmup**: 10% of steps
- **Schedule**: linear decay
- **Max epochs**: 10
- **Best epoch**: 3
- **Precision**: bfloat16
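The sketch below shows how such a run could be wired up with the `transformers` Trainer: ModernBERT-base loaded as a single-output regression model, trained with the key hparams above. This is a hedged reconstruction rather than the linked gist; the dataset column names (`text`, `score`), the 90/10 split, and every setting not in the list above are assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=1 gives a single-logit head; with float labels the Trainer
# uses MSE loss, i.e. the same regression objective as the quality filter.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

# Assumed schema: one text column plus a 0-5 "score" annotation column.
ds = load_dataset("HuggingFaceFW/fineweb-edu-llama3-annotations", split="train")

def preprocess(batch):
    enc = tokenizer(batch["text"], truncation=True)
    enc["labels"] = [float(s) for s in batch["score"]]
    return enc

ds = ds.map(preprocess, batched=True, remove_columns=ds.column_names)
ds = ds.train_test_split(test_size=0.1, seed=1)  # split ratio is an assumption

# Key hparams from the list above; everything else is a Trainer default.
args = TrainingArguments(
    output_dir="modernbert-fineweb-edu",
    learning_rate=5e-5,
    weight_decay=0.1,             # AdamW applies this in decoupled form
    warmup_ratio=0.1,             # 10% of steps
    lr_scheduler_type="linear",
    num_train_epochs=10,
    bf16=True,
    seed=1,
    eval_strategy="epoch",        # `evaluation_strategy` on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,  # epoch 3 scored best in the run reported here
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```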
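And a minimal scoring sketch for the resulting checkpoint; the repo id is a placeholder, and the round-and-threshold step mirrors the original FineWeb-Edu filtering recipe rather than anything this card prescribes:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "path/to/this-checkpoint"  # placeholder: substitute this model's repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Photosynthesis is the process by which plants convert light into chemical energy."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    score = model(**inputs).logits.squeeze(-1).item()

# The head regresses the 0-5 educational-quality annotation; FineWeb-Edu
# style filtering keeps documents whose rounded score is >= 3.
print(f"score: {score:.2f}  keep: {int(round(score)) >= 3}")
```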