---
tags:
- spacy
- token-classification
language:
- fr
model-index:
- name: fr_on_value
  results:
  - task:
      name: NER
      type: token-classification
    metrics:
    - name: NER Precision
      type: precision
      value: 0.8539823009
    - name: NER Recall
      type: recall
      value: 0.9234449761
    - name: NER F Score
      type: f_score
      value: 0.8873563218
widget:
- text: "On m'a attrapé par la main !"
  example_title: "on quelqu'un"
- text: "En France, on parle français."
  example_title: "on générique"
- text: "On est allé manger des glaces puis on est allé à la plage."
  example_title: "on nous"
license: agpl-3.0
---
## Description
This model was built to detect the different values of the pronoun *on* in French. Its main purpose was to automate annotation on a specific dataset; there is no warranty that it will work on any other dataset.
We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.
Some pronouns can have different meanings depending on their context, and the generic pronoun plays an important role in trauma narratives.
In our study, we distinguish the different values of the pronoun *on*. It can be used as *we*, for example: “On est entré au Bataclan à 20h45” (“We entered the Bataclan at 8:45 pm”).
But it can also be used as a synonym for *someone*: “On m’a marché dessus” (“Someone stepped on me”).
Finally, it can be used generically: “On n’est jamais mieux servi que par soi-même” (“You are never better served than by yourself”).
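
As a quick illustration, here is a minimal usage sketch with spaCy's standard API. It assumes the packaged pipeline has been downloaded and installed so that `spacy.load("fr_on_value")` resolves; the example sentence is taken from the widget above.

```python
import spacy

# Load the pipeline (assumes the fr_on_value package is installed locally).
nlp = spacy.load("fr_on_value")

doc = nlp("On est allé manger des glaces puis on est allé à la plage.")

# Each detected "on" is returned as a named entity with one of the
# three labels: ON_GENERIQUE, ON_NOUS, ON_QUELQU_UN.
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
```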
---
| Feature | Description |
| --- | --- |
| **Name** | `fr_on_value` |
| **Version** | `0.0.1` |
| **spaCy** | `>=3.4.4,<3.5.0` |
| **Default Pipeline** | `transformer`, `ner` |
| **Components** | `transformer`, `ner` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | n/a |
| **License** | agpl-3.0 |
| **Author** | n/a |
### Label Scheme
<details>
<summary>View label scheme (3 labels for 1 component)</summary>

| Component | Labels |
| --- | --- |
| **`ner`** | `ON_GENERIQUE`, `ON_NOUS`, `ON_QUELQU_UN` |

</details>
### Accuracy
| Type | Score |
| --- | --- |
| `ENTS_F` | 88.74 |
| `ENTS_P` | 85.40 |
| `ENTS_R` | 92.34 |
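
These scores can be reproduced with spaCy's built-in scorer. A minimal sketch, assuming a gold-annotated held-out corpus saved in spaCy's binary format (the file name `valid.spacy` is only a placeholder):

```python
import spacy
from spacy.tokens import DocBin
from spacy.training import Example

nlp = spacy.load("fr_on_value")

# Load gold-annotated evaluation documents (placeholder path).
doc_bin = DocBin().from_disk("valid.spacy")
gold_docs = list(doc_bin.get_docs(nlp.vocab))

# Pair each unprocessed text with its gold reference; nlp.evaluate
# runs the pipeline on the predicted side and returns a score dict.
examples = [Example(nlp.make_doc(doc.text), doc) for doc in gold_docs]
scores = nlp.evaluate(examples)

print(scores["ents_p"], scores["ents_r"], scores["ents_f"])
```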
### Training
We constructed our dataset by manually labeling the documents using Doccano, an open-source tool for collaborative human annotation.
The models were trained on 200-word sequences; 70% of the data was used for training, 20% for testing and hyperparameter fine-tuning, and 10% for evaluating the model's performance.
To ensure a correct performance evaluation, the evaluation sequences were taken from documents that were not used during training.
The table below gives the number of annotated entities per label and split; a sketch of the chunking and split procedure follows the table.
| Label | Train | Test | Valid |
| --- | --- | --- | --- |
| `ON_GENERIQUE` | 189 | 57 | 49 |
| `ON_NOUS` | 1006 | 320 | 229 |
| `ON_QUELQU_UN` | 90 | 42 | 19 |
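
For illustration, here is a minimal sketch of the document-level 70/20/10 split and 200-word chunking described above. All names (`chunk_words`, `split_documents`) are hypothetical; the actual preprocessing lives in the training repository linked above.

```python
import random

def chunk_words(text: str, size: int = 200) -> list[str]:
    """Split a document into consecutive sequences of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def split_documents(documents: list[str], seed: int = 0):
    """Split at the document level (70/20/10) so that evaluation
    sequences never come from documents seen during training."""
    docs = documents[:]
    random.Random(seed).shuffle(docs)
    n = len(docs)
    train_docs = docs[: int(0.7 * n)]
    test_docs = docs[int(0.7 * n): int(0.9 * n)]
    valid_docs = docs[int(0.9 * n):]

    def to_seqs(ds):
        return [seq for d in ds for seq in chunk_words(d)]

    return to_seqs(train_docs), to_seqs(test_docs), to_seqs(valid_docs)
```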