Update README.md
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
 - pytorch
 ---
 
-# Jargon-legal
+# Jargon-general-legal
 
 [Jargon](https://hal.science/hal-04535557/file/FB2_domaines_specialises_LREC_COLING24.pdf) is an efficient transformer encoder LM for French, combining the LinFormer attention mechanism with the RoBERTa model architecture.
 
@@ -25,9 +25,9 @@ Jargon is available in several versions with different context sizes and types o
 |-------------------------------------------------------------------------------------|:-----------------------:|:----------------:|
 | [jargon-general-base](https://huggingface.co/PantagrueLLM/jargon-general-base) | scratch |8.5GB Web Corpus|
 | [jargon-general-biomed](https://huggingface.co/PantagrueLLM/jargon-general-biomed) | jargon-general-base |5.4GB Medical Corpus|
-| [jargon-general-legal](https://huggingface.co/PantagrueLLM/jargon-general-legal) | jargon-general-base |18GB Legal Corpus|
+| [jargon-general-legal](https://huggingface.co/PantagrueLLM/jargon-general-legal) (this model) | jargon-general-base |18GB Legal Corpus|
 | [jargon-multidomain-base](https://huggingface.co/PantagrueLLM/jargon-multidomain-base) | jargon-general-base |Medical+Legal Corpora|
-| [jargon-legal](https://huggingface.co/PantagrueLLM/jargon-legal) (this model) | scratch |18GB Legal Corpus|
+| [jargon-legal](https://huggingface.co/PantagrueLLM/jargon-legal) | scratch |18GB Legal Corpus|
 | [jargon-legal-4096](https://huggingface.co/PantagrueLLM/jargon-legal-4096) | scratch |18GB Legal Corpus|
 | [jargon-biomed](https://huggingface.co/PantagrueLLM/jargon-biomed) | scratch |5.4GB Medical Corpus|
 | [jargon-biomed-4096](https://huggingface.co/PantagrueLLM/jargon-biomed-4096) | scratch |5.4GB Medical Corpus|
@@ -58,13 +58,13 @@ For more info please check out the [paper](https://hal.science/hal-04535557/file
 
 ## Using Jargon models with HuggingFace transformers
 
-You can get started with
+You can get started with this model using the code snippet below:
 
 ```python
 from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
 
-tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-legal", trust_remote_code=True)
-model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-legal", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("PantagrueLLM/jargon-general-legal", trust_remote_code=True)
+model = AutoModelForMaskedLM.from_pretrained("PantagrueLLM/jargon-general-legal", trust_remote_code=True)
 
 jargon_maskfiller = pipeline("fill-mask", model=model, tokenizer=tokenizer)
 output = jargon_maskfiller("Il est allé au <mask> hier")
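For reference, the `fill-mask` pipeline used in the snippet returns a list of scored candidate completions for the `<mask>` token. A minimal sketch of inspecting them, continuing from the `output` variable above (the fields shown are the standard transformers fill-mask output format, not model-specific behavior):

```python
# Each candidate is a dict with "score", "token", "token_str", and "sequence".
for candidate in output:
    print(f"{candidate['token_str']} (score: {candidate['score']:.3f})")
```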