Update README.md
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
 - pytorch
 ---
 
-# Jargon-legal
+# Jargon-general-legal
 
 [Jargon](https://hal.science/hal-04535557/file/FB2_domaines_specialises_LREC_COLING24.pdf) is an efficient transformer encoder LM for French, combining the LinFormer attention mechanism with the RoBERTa model architecture.
 
@@ -25,9 +25,9 @@ Jargon is available in several versions with different context sizes and types o
 |-------------------------------------------------------------------------------------|:-----------------------:|:----------------:|
 | [jargon-general-base](https://huggingface.co/PantagrueLLM/jargon-general-base) | scratch |8.5GB Web Corpus|
 | [jargon-general-biomed](https://huggingface.co/PantagrueLLM/jargon-general-biomed) | jargon-general-base |5.4GB Medical Corpus|
-| [jargon-general-legal](https://huggingface.co/PantagrueLLM/jargon-general-legal) | jargon-general-base |18GB Legal Corpus|
+| [jargon-general-legal](https://huggingface.co/PantagrueLLM/jargon-general-legal) (this model) | jargon-general-base |18GB Legal Corpus|
 | [jargon-multidomain-base](https://huggingface.co/PantagrueLLM/jargon-multidomain-base) | jargon-general-base |Medical+Legal Corpora|
-| [jargon-legal](https://huggingface.co/PantagrueLLM/jargon-legal) (this model) | scratch |18GB Legal Corpus|
+| [jargon-legal](https://huggingface.co/PantagrueLLM/jargon-legal) | scratch |18GB Legal Corpus|
 | [jargon-legal-4096](https://huggingface.co/PantagrueLLM/jargon-legal-4096) | scratch |18GB Legal Corpus|
 | [jargon-biomed](https://huggingface.co/PantagrueLLM/jargon-biomed) | scratch |5.4GB Medical Corpus|
 | [jargon-biomed-4096](https://huggingface.co/PantagrueLLM/jargon-biomed-4096) | scratch |5.4GB Medical Corpus|
@@ -58,13 +58,13 @@ For more info please check out the [paper](https://hal.science/hal-04535557/file
 
 ## Using Jargon models with HuggingFace transformers
 
-You can get started with
+You can get started with this model using the code snippet below:
 
 ```python
 from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
 
-tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-legal", trust_remote_code=True)
-model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-legal", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-general-legal", trust_remote_code=True)
+model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-general-legal", trust_remote_code=True)
 
 jargon_maskfiller = pipeline("fill-mask", model=model, tokenizer=tokenizer)
 output = jargon_maskfiller("Il est allé au <mask> hier")
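For reference, the `fill-mask` pipeline used in the snippet returns a list of scored candidate completions for the `<mask>` token. A minimal sketch of inspecting them, continuing from the `output` variable above (the fields shown are the standard transformers fill-mask output format, not model-specific behavior):

```python
# Each candidate is a dict with "score", "token", "token_str", and "sequence".
for candidate in output:
    print(f"{candidate['token_str']} (score: {candidate['score']:.3f})")
```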