The most widely used French NER models on HF (Jean-Baptiste/camembert-ner and cmarkea/distilcamembert-base-ner) are trained on a single dataset, WikiNER. This has two drawbacks: the dataset contains leaks, which distort the reported results of these models, and it overspecializes them in one domain (text from Wikipedia). They are also only available in a base version (110M parameters).
That's why I've trained new French NER models on three times more data, in both base and large (336M parameters) versions. They come in 3-entity (PER, ORG, LOC) and 4-entity (PER, ORG, LOC, MISC) variants:
- CATIE-AQ/NERmembert-base-4entities
- CATIE-AQ/NERmembert-large-4entities
- CATIE-AQ/NERmembert-base-3entities
- CATIE-AQ/NERmembert-large-3entities
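To make the choice between the four checkpoints concrete, here is a minimal sketch that maps a size and an entity count to one of the model IDs listed above. The helper names `model_id` and `build_ner` are hypothetical, and the sketch assumes the checkpoints load with the standard `transformers` token-classification pipeline; check the model cards for the recommended usage.

```python
# Model IDs copied from the list above, keyed by (size, number of entity types).
MODELS = {
    ("base", 4): "CATIE-AQ/NERmembert-base-4entities",
    ("large", 4): "CATIE-AQ/NERmembert-large-4entities",
    ("base", 3): "CATIE-AQ/NERmembert-base-3entities",
    ("large", 3): "CATIE-AQ/NERmembert-large-3entities",
}


def model_id(size: str = "base", n_entities: int = 4) -> str:
    """Return the Hub ID for the requested size and entity count (hypothetical helper)."""
    try:
        return MODELS[(size, n_entities)]
    except KeyError:
        raise ValueError(
            f"no model for size={size!r}, n_entities={n_entities!r}"
        ) from None


def build_ner(size: str = "base", n_entities: int = 4):
    """Build a NER pipeline (hypothetical helper).

    Assumption: the checkpoint works with the generic token-classification
    pipeline. transformers is imported lazily so that picking a model ID
    does not require the library or a download.
    """
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id(size, n_entities),
        aggregation_strategy="simple",  # merge sub-word pieces into whole entities
    )
```

Calling `build_ner("large", 3)` downloads the checkpoint on first use. Applying the returned pipeline to a French sentence should yield a list of dictionaries with keys such as `entity_group`, `word`, `score`, `start` and `end`.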
Datasets without leaks are also available:
- CATIE-AQ/frenchNER_4entities
- CATIE-AQ/frenchNER_3entities
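A "leak" here means a sentence that appears in both the training and the evaluation split, which inflates evaluation scores. The sketch below shows one simple way to check for exact-duplicate leaks; it is not necessarily the procedure used to build these datasets. The function names are hypothetical, and the `tokens` column name is an assumption to verify against the dataset card.

```python
from typing import Iterable, Sequence


def leaked_sentences(
    train: Iterable[Sequence[str]], test: Iterable[Sequence[str]]
) -> list[tuple[str, ...]]:
    """Return the test sentences (as token tuples) that also occur in train.

    Only exact token-for-token duplicates are detected; near-duplicates
    would need a fuzzier comparison.
    """
    seen = {tuple(tokens) for tokens in train}
    return [tuple(tokens) for tokens in test if tuple(tokens) in seen]


def load_tokens(dataset_id: str, split: str, column: str = "tokens"):
    """Load one column of one split from the Hub (hypothetical helper).

    Assumption: the dataset exposes a token-list column named `column`.
    datasets is imported lazily so the leak check itself has no dependency.
    """
    from datasets import load_dataset

    return load_dataset(dataset_id, split=split)[column]
```

For example, `leaked_sentences(load_tokens("CATIE-AQ/frenchNER_4entities", "train"), load_tokens("CATIE-AQ/frenchNER_4entities", "test"))` would list any exact duplicates shared by the two splits, assuming splits with those names exist.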