bclavie committed
Commit 8ae8825
Parent: c72b2bd

Update README.md

Files changed (1): README.md (+13, -5)
README.md CHANGED
@@ -39,9 +39,9 @@ It is available in the following sizes:
 
 ## Usage
 
- You can use these models directly with the `transformers` library. Since ModernBERT is a Masked Language Model (MLM), you can use the `fill-mask` pipeline or load it via `AutoModelForMaskedLM`.
+ You can use these models directly with the `transformers` library. Since ModernBERT is a Masked Language Model (MLM), you can use the `fill-mask` pipeline or load it via `AutoModelForMaskedLM`. To use ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes.
 
- **⚠️ We strongly suggest using ModernBERT with Flash Attention 2, as it is by far the best performing variant of the model, and is a 1:1 match of our research implementation. To do so, install Flash Attention as follows, then use the model as normal:**
+ **⚠️ We strongly suggest using ModernBERT with Flash Attention 2, as it is by far the best performing variant of the model. To do so, install Flash Attention as follows, then use the model as normal:**
 
 ```bash
 pip install flash-attn
@@ -86,8 +86,6 @@ results = pipe(input_text)
 pprint(results)
 ```
 
- To use ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes.
-
 **Note:** ModernBERT does not use token type IDs, unlike some earlier BERT models. Most downstream usage is identical to standard BERT models on the Hugging Face Hub, except you can omit the `token_type_ids` parameter.
 
 ## Evaluation
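
For reference, the usage example elided between these two hunks (the `pipe` and `pprint(results)` context lines) presumably follows the standard `fill-mask` pipeline pattern. A minimal sketch, assuming the `answerdotai/ModernBERT-base` checkpoint and an illustrative input sentence, neither of which appears in this diff:

```python
# Minimal fill-mask sketch; the checkpoint id and input sentence are assumptions.
from pprint import pprint
from transformers import pipeline

pipe = pipeline(
    "fill-mask",
    model="answerdotai/ModernBERT-base",  # assumed checkpoint id
)

input_text = "The capital of France is [MASK]."  # illustrative example
results = pipe(input_text)
pprint(results)  # list of candidate fills with scores
```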
@@ -151,4 +149,14 @@ We release the ModernBERT model architectures, model weights, training codebase
 
 If you use ModernBERT in your work, please cite:
 
- **TODO: Citation**
+ ```
+ @misc{modernbert,
+ title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
+ author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
+ year={2024},
+ eprint={2412.13663},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2412.13663},
+ }
+ ```
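
The Flash Attention 2 recommendation in the first hunk maps onto the `attn_implementation` argument of `from_pretrained` in `transformers`. A sketch, assuming `flash-attn` is installed, a supported CUDA GPU is available, and the same assumed checkpoint id as above:

```python
# Sketch: load ModernBERT with Flash Attention 2 (requires `pip install flash-attn`
# and a supported CUDA GPU; the checkpoint id is an assumption, as above).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,  # flash-attn kernels expect fp16/bf16
).to("cuda")
```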
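
The added sentence about downstream tasks points at standard BERT fine-tuning recipes rather than anything ModernBERT-specific. A hedged sketch of such a recipe for binary sequence classification; the tiny inline dataset and all hyperparameters are illustrative placeholders, not values from this repository:

```python
# Sketch of a standard BERT-style fine-tuning recipe applied to ModernBERT.
# Dataset, num_labels, and hyperparameters are illustrative placeholders.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "answerdotai/ModernBERT-base"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Toy dataset, tokenized the usual way; note there is no token_type_ids handling.
train_dataset = Dataset.from_dict(
    {"text": ["great movie", "terrible movie"], "label": [1, 0]}
).map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="modernbert-finetuned",
        learning_rate=2e-5,
        num_train_epochs=3,
    ),
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # enables padding collation for variable-length inputs
)
trainer.train()
```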
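
On the `token_type_ids` note: unlike `bert-base-uncased`-style tokenizers, the ModernBERT tokenizer output should contain no `token_type_ids` key, so `model(**inputs)` needs no special handling. A quick illustration; the classic BERT checkpoint id is included only for contrast and is not mentioned in this diff:

```python
# Contrast sketch: classic BERT emits token_type_ids; ModernBERT should not.
from transformers import AutoTokenizer

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
modern_tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

print("token_type_ids" in bert_tok("Paris is in [MASK]."))    # True for classic BERT
print("token_type_ids" in modern_tok("Paris is in [MASK]."))  # expected False here
```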