metadata

license: mit
language:
  - fr
library_name: transformers
tags:
  - linformer
  - legal
  - medical
  - RoBERTa
  - pytorch

Jargon-general-base

Jargon is an efficient transformer encoder language model for French that combines the Linformer attention mechanism with the RoBERTa model architecture.

Jargon is available in several versions with different context sizes and types of pre-training corpora.

Using Jargon models with Hugging Face Transformers

You can get started with jargon-general-base using the code snippet below:

from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

# trust_remote_code=True allows the custom model code hosted in the repository to be loaded.
tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-general-base", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-general-base", trust_remote_code=True)

# Build a fill-mask pipeline and predict the token hidden behind <mask>.
jargon_maskfiller = pipeline("fill-mask", model=model, tokenizer=tokenizer)
output = jargon_maskfiller("Il est allé au <mask> hier")
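
The snippet above stores the predictions in `output` without displaying them. As a minimal sketch, assuming the input contains a single `<mask>` token (in which case the fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`), the results could be made readable with a small helper. The name `format_predictions` and its `max_items` parameter are hypothetical and not part of the Jargon release or of transformers.

```python
def format_predictions(predictions, max_items=5):
    """Turn fill-mask pipeline results into readable lines.

    Hypothetical helper for illustration. Assumes `predictions` is the
    list of dicts returned for an input with exactly one <mask> token.
    """
    lines = []
    for rank, pred in enumerate(predictions[:max_items], start=1):
        # token_str may carry a leading space from the tokenizer, so strip it.
        token = pred["token_str"].strip()
        lines.append(f"{rank}. {token} (score={pred['score']:.3f}) -> {pred['sequence']}")
    return lines


# Usage with the `output` variable from the snippet above:
# for line in format_predictions(output):
#     print(line)
```

If more or fewer candidates are needed, the pipeline call itself accepts a `top_k` argument, for example `jargon_maskfiller("Il est allé au <mask> hier", top_k=3)`.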

Funded by
- GENCI-IDRIS (Grant 2022 A0131013801)
- French National Research Agency: Pantagruel grant ANR-23-IAS1-0001
- MIAI@Grenoble Alpes ANR-19-P3IA-0003
- PROPICTO ANR-20-CE93-0005
- Lawbot ANR-20-CE38-0013
- Swiss National Science Foundation (grant PROPICTO N°197864)

Language(s): French
License: MIT
Developed by: Vincent Segonne