Update README.md
README.md (changed)
@@ -39,9 +39,9 @@ It is available in the following sizes:
 
 ## Usage
 
-You can use these models directly with the `transformers` library. Since ModernBERT is a Masked Language Model (MLM), you can use the `fill-mask` pipeline or load it via `AutoModelForMaskedLM`.
+You can use these models directly with the `transformers` library. Since ModernBERT is a Masked Language Model (MLM), you can use the `fill-mask` pipeline or load it via `AutoModelForMaskedLM`. To use ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes.
 
-**⚠️ We strongly suggest using ModernBERT with Flash Attention 2, as it is by far the best performing variant of the model.**
+**⚠️ We strongly suggest using ModernBERT with Flash Attention 2, as it is by far the best performing variant of the model. To do so, install Flash Attention as follows, then use the model as normal:**
 
 ```bash
 pip install flash-attn
@@ -86,8 +86,6 @@ results = pipe(input_text)
 pprint(results)
 ```
 
-To use ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes.
-
 **Note:** ModernBERT does not use token type IDs, unlike some earlier BERT models. Most downstream usage is identical to standard BERT models on the Hugging Face Hub, except you can omit the `token_type_ids` parameter.
 
 ## Evaluation
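
The fine-tuning sentence folded into the first hunk, plus the `token_type_ids` note above, amount to the standard BERT sequence-classification recipe. A minimal sketch under stated assumptions: the checkpoint id and the two-label task are illustrative only, and loading requires a `transformers` release that ships ModernBERT support.

```python
# Sketch of standard BERT-style fine-tuning applied to ModernBERT.
# Assumptions: checkpoint id and num_labels=2 are illustrative only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"  # assumed published checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# As the README note says, ModernBERT takes no token_type_ids, so the
# encoded batch passes through as-is -- no segment ids to build or strip.
batch = tokenizer(
    ["a positive example", "a negative example"],
    padding=True,
    return_tensors="pt",
)
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)
outputs.loss.backward()  # from here, any standard BERT fine-tuning loop applies
```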
@@ -151,4 +149,14 @@ We release the ModernBERT model architectures, model weights, training codebase
 
 If you use ModernBERT in your work, please cite:
 
-
+```
+@misc{modernbert,
+      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
+      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
+      year={2024},
+      eprint={2412.13663},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2412.13663},
+}
+```
|