a-mannion committed
Commit 1ce8fc9
Parent: 27896c2

Update README.md

Files changed (1): README.md (+38 -4)
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
 - pytorch
 ---
 
-# Jargon-general-base
+# Jargon-general-biomed
 
 [Jargon](https://hal.science/hal-04535557/file/FB2_domaines_specialises_LREC_COLING24.pdf) is an efficient transformer encoder LM for French, combining the LinFormer attention mechanism with the RoBERTa model architecture.
 
@@ -24,20 +24,24 @@ Jargon is available in several versions with different context sizes and types o
 
 
 
+
 ## Using Jargon models with HuggingFace transformers
 
-You can get started with `jargon-general-base` using the code snippet below:
+You can get started with `jargon-general-biomed` using the code snippet below:
 
 ```python
 from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
 
-tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-general-base", trust_remote_code=True)
-model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-general-base", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-general-biomed", trust_remote_code=True)
+model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-general-biomed", trust_remote_code=True)
 
 jargon_maskfiller = pipeline("fill-mask", model=model, tokenizer=tokenizer)
 output = jargon_maskfiller("Il est allé au <mask> hier")
 ```
 
+You can also use the classes `AutoModel`, `AutoModelForSequenceClassification`, or `AutoModelForTokenClassification` to load Jargon models, depending on the downstream task in question.
+
+
 - **Funded by**
   - GENCI-IDRIS (Grant 2022 A0131013801)
   - French National Research Agency: Pantagruel grant ANR-23-IAS1-0001
@@ -50,8 +54,38 @@ output = jargon_maskfiller("Il est allé au <mask> hier")
 - **Language(s):** French
 - **License:** MIT
 - **Developed by:** Vincent Segonne
+
+
+
+## Citation
+
+If you use this model for your own research work, please cite as follows:
+
+```bibtex
+@inproceedings{segonne:hal-04535557,
+TITLE = {{Jargon: A Suite of Language Models and Evaluation Tasks for French Specialized Domains}},
+AUTHOR = {Segonne, Vincent and Mannion, Aidan and Alonzo Canul, Laura Cristina and Audibert, Alexandre and Liu, Xingyu and Macaire, C{\'e}cile and Pupier, Adrien and Zhou, Yongxin and Aguiar, Mathilde and Herron, Felix and Norr{\'e}, Magali and Amini, Massih-Reza and Bouillon, Pierrette and Eshkol-Taravella, Iris and Esperan{\c c}a-Rodier, Emmanuelle and Fran{\c c}ois, Thomas and Goeuriot, Lorraine and Goulian, J{\'e}r{\^o}me and Lafourcade, Mathieu and Lecouteux, Benjamin and Portet, Fran{\c c}ois and Ringeval, Fabien and Vandeghinste, Vincent and Coavoux, Maximin and Dinarelli, Marco and Schwab, Didier},
+URL = {https://hal.science/hal-04535557},
+BOOKTITLE = {{LREC-COLING 2024 - Joint International Conference on Computational Linguistics, Language Resources and Evaluation}},
+ADDRESS = {Turin, Italy},
+YEAR = {2024},
+MONTH = May,
+KEYWORDS = {Self-supervised learning ; Pretrained language models ; Evaluation benchmark ; Biomedical document processing ; Legal document processing ; Speech transcription},
+PDF = {https://hal.science/hal-04535557/file/FB2_domaines_specialises_LREC_COLING24.pdf},
+HAL_ID = {hal-04535557},
+HAL_VERSION = {v1},
+}
+```
+
+
+
+
+
+
+
 <!-- - **Finetuned from model [optional]:** [More Information Needed] -->
 <!--
 ### Model Sources [optional]
 
+
 <!-- Provide the basic links for the model. -->
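For reference, the `output` value produced by the fill-mask snippet in the diff above is a list of candidate completions. A minimal sketch of inspecting it after running that snippet, assuming the standard `transformers` fill-mask pipeline output format (the print formatting is illustrative only):

```python
# Each prediction is a dict with "score", "token", "token_str", and
# "sequence" keys, sorted by descending score.
for prediction in output:
    print(f'{prediction["token_str"]!r} -> score {prediction["score"]:.3f}')
```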
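The diff also notes that `AutoModel`, `AutoModelForSequenceClassification`, and `AutoModelForTokenClassification` can load Jargon checkpoints. A minimal sketch of loading task-specific heads, assuming you fine-tune them yourself; the `num_labels` values are illustrative, not shipped with the checkpoint:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoTokenizer,
)

tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-general-biomed", trust_remote_code=True)

# Sequence-level head (e.g. document classification); the head is randomly
# initialized and meant to be fine-tuned on labeled data.
seq_model = AutoModelForSequenceClassification.from_pretrained(
    "PantagrueLLM/jargon-general-biomed", trust_remote_code=True, num_labels=2
)

# Token-level head (e.g. NER over biomedical text), likewise to be fine-tuned.
tok_model = AutoModelForTokenClassification.from_pretrained(
    "PantagrueLLM/jargon-general-biomed", trust_remote_code=True, num_labels=5
)
```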