---
license: mit
language:
- fr
library_name: transformers
tags:
- linformer
- legal
- medical
- RoBERTa
- pytorch
---

# Jargon-general-base

[Jargon](https://hal.science/hal-04535557/file/FB2_domaines_specialises_LREC_COLING24.pdf) is an efficient transformer encoder LM for French, combining the Linformer attention mechanism with the RoBERTa model architecture. Jargon is available in several versions with different context sizes and types of pre-training corpora.

## Using Jargon models with HuggingFace transformers

You can get started with `jargon-general-base` using the code snippet below:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-general-base", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-general-base", trust_remote_code=True)

jargon_maskfiller = pipeline("fill-mask", model=model, tokenizer=tokenizer)
output = jargon_maskfiller("Il est allé au <mask> hier")
```

- **Funded by:**
  - GENCI-IDRIS (Grant 2022 A0131013801)
  - French National Research Agency: Pantagruel grant ANR-23-IAS1-0001, MIAI@Grenoble Alpes grant ANR-19-P3IA-0003, PROPICTO grant ANR-20-CE93-0005, Lawbot grant ANR-20-CE38-0013
  - Swiss National Science Foundation (PROPICTO grant N°197864)
- **Language(s):** French
- **License:** MIT
- **Developed by:** Vincent Segonne
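The `fill-mask` pipeline returns a list of candidate completions, each a dict with `score`, `token`, `token_str`, and `sequence` keys (the standard transformers fill-mask output format). A minimal sketch of selecting the top prediction from such an output — the candidate values below are illustrative placeholders, not actual Jargon predictions, so the snippet runs without downloading the model:

```python
# Illustrative fill-mask output in the standard transformers format
# (the scores and tokens here are made up, not real model predictions).
predictions = [
    {"score": 0.42, "token_str": "cinéma", "sequence": "Il est allé au cinéma hier"},
    {"score": 0.21, "token_str": "travail", "sequence": "Il est allé au travail hier"},
    {"score": 0.08, "token_str": "marché", "sequence": "Il est allé au marché hier"},
]

# The pipeline already sorts candidates by descending score,
# but selecting by max() makes the intent explicit.
best = max(predictions, key=lambda p: p["score"])
print(f'{best["token_str"]} (score={best["score"]:.2f})')
```

In practice, `output` from the snippet above has exactly this shape, so the same selection logic applies directly to the real pipeline result.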