binbin83
/

fr_sensations_and_body

Token Classification

	---
	tags:
	- spacy
	- token-classification
	language:
	- fr
	model-index:
	- name: fr_sensations_and_body
	  results:
	  - task:
	      name: NER
	      type: token-classification
	    metrics:
	    - name: NER Precision
	      type: precision
	      value: 0.8537117904
	    - name: NER Recall
	      type: recall
	      value: 0.8555798687
	    - name: NER F Score
	      type: f_score
	      value: 0.8546448087

	widget:
	- text: "Il y avait du sang partout, les bras et les jambes n'étaient plus aux bons endroits."
	  example_title: "corps"
	- text: "J'étais un peu fatiguée."
	  example_title: "Sensations physiques"
	- text: "Il y avait comme un silence assourdissant. Et là j'ai vu la beauté du lever de soleil."
	  example_title: "Perceptions"
	---

	This model was built to detect the lexical fields of the body, physical sensations, and perception.
	Its main purpose was to automate annotation on a specific dataset.
	There is no warranty that it will work on any other dataset.
	We fine-tuned the camembert-base model using this code: https://github.com/psycholinguistics2125/train_NER.

	| Feature | Description |
	| --- | --- |
	| Name | `fr_sensations_and_body` |
	| Version | `0.0.1` |
	| spaCy | `>=3.4.4,<3.5.0` |
	| Default Pipeline | `transformer`, `ner` |
	| Components | `transformer`, `ner` |
	| Vectors | 0 keys, 0 unique vectors (0 dimensions) |
	| Sources | n/a |
	| License | n/a |
	| Author | [n/a]() |
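
A minimal usage sketch, assuming the pipeline has been installed as the Python package `fr_sensations_and_body` under a compatible spaCy version (`>=3.4.4,<3.5.0`, as listed in the table). The helper names `load_pipeline` and `extract_entities` are illustrative and are not part of the model package.

```python
def load_pipeline(name="fr_sensations_and_body"):
    """Load the pipeline by name (assumes the package is already installed)."""
    import spacy  # imported lazily so extract_entities works without spaCy
    return spacy.load(name)


def extract_entities(nlp, text):
    """Return (text, label, start_char, end_char) for each entity in `text`."""
    doc = nlp(text)
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


# Example usage (requires the installed model):
# nlp = load_pipeline()
# for entity in extract_entities(nlp, "J'étais un peu fatiguée."):
#     print(entity)
```

Each returned label should be one of the four labels in the pipeline's label scheme.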

	### Label Scheme

	<details>

	<summary>View label scheme (4 labels for 1 component)</summary>

	| Component | Labels |
	| --- | --- |
	| `ner` | `CORPS`, `MOTS_PERCEPTIONS_SENSORIELLES`, `SENSATIONS_PHYSIQUES`, `VERB_PERCEPTIONS_SENSORIELLES` |

	</details>

	### Accuracy

	| Type | Score (%) |
	| --- | --- |
	| `ENTS_F` | 85.46 |
	| `ENTS_P` | 85.37 |
	| `ENTS_R` | 85.56 |
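
The F-score is the harmonic mean of precision and recall. As a quick consistency check, the sketch below recomputes it from the precision and recall values given in the model card metadata. The function name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 0.8537117904  # NER Precision from the metadata
recall = 0.8555798687     # NER Recall from the metadata
print(round(100 * f_score(precision, recall), 2))  # 85.46
```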

	### Training

	We built the dataset by manually labeling the documents with Doccano, an open-source tool for collaborative human annotation.
	The models were trained on sequences of 200 words. 70% of the data was used for training, 20% for testing and tuning hyperparameters, and 10% for evaluating the model's performance.
	To keep the evaluation fair, the evaluation sequences were taken from documents that were not used during training.
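
A minimal sketch of a document-level split with 200-word sequences. It is not the authors' training code (see the linked repository for that) and rests on several assumptions: documents are plain strings keyed by a hypothetical id, words are split on whitespace, and the 70/20/10 ratios are applied to whole documents rather than to individual sequences.

```python
import random


def chunk_words(text, size=200):
    """Split a text into consecutive sequences of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_by_document(docs, ratios=(0.7, 0.2, 0.1), seed=0):
    """Assign whole documents to train/test/valid, then chunk each one.

    Splitting on document ids first guarantees that no evaluation
    sequence comes from a document seen during training.
    """
    ids = sorted(docs)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * ratios[0])
    n_test = round(len(ids) * ratios[1])
    groups = {
        "train": ids[:n_train],
        "test": ids[n_train:n_train + n_test],
        "valid": ids[n_train + n_test:],  # remaining documents
    }
    return {
        name: [(doc_id, seq) for doc_id in doc_ids for seq in chunk_words(docs[doc_id])]
        for name, doc_ids in groups.items()
    }


# Toy corpus: 10 hypothetical documents of 450 words each.
docs = {f"doc{i}": " ".join(["mot"] * 450) for i in range(10)}
splits = split_by_document(docs)
print({name: len(seqs) for name, seqs in splits.items()})
```

Because the assignment happens per document, the share of sequences in each split only approximates the target ratios when documents differ in length.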


	| label | train | test | valid |
	| --- | --- | --- | --- |
	| `CORPS` | 523 | 152 | 106 |
	| `MOTS_PERCEPTIONS_SENSORIELLES` | 250 | 108 | 82 |
	| `SENSATIONS_PHYSIQUES` | 91 | 38 | 31 |
	| `VERB_PERCEPTIONS_SENSORIELLES` | 617 | 162 | 137 |
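
The per-split counts can be totaled to see how the annotated entities are distributed. The sketch below only copies the numbers from the table. The shares are computed over entity mentions, so they may differ from the split ratios applied to the sequences.

```python
# Entity counts per label, in the column order of the table.
counts = {
    "CORPS": (523, 152, 106),
    "MOTS_PERCEPTIONS_SENSORIELLES": (250, 108, 82),
    "SENSATIONS_PHYSIQUES": (91, 38, 31),
    "VERB_PERCEPTIONS_SENSORIELLES": (617, 162, 137),
}
split_names = ("train", "test", "valid")

# Sum each column across labels, then express it as a share of all entities.
totals = {s: sum(row[i] for row in counts.values()) for i, s in enumerate(split_names)}
grand_total = sum(totals.values())
shares = {s: round(100 * n / grand_total, 1) for s, n in totals.items()}

print(totals)  # {'train': 1481, 'test': 460, 'valid': 356}
print(shares)  # percentage of all annotated entities per split
```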