Update README.md

README.md CHANGED

@@ -7,6 +7,8 @@ datasets:
 - universal_dependencies
 metrics:
 - accuracy
+- precision
+- recall
 model-index:
 - name: mdeberta-v3-ud-thai-pud-upos
   results:
@@ -23,6 +25,9 @@ model-index:
     - name: Accuracy
       type: accuracy
       value: 0.9934846474601972
+language:
+- th
+library_name: transformers
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -43,17 +48,21 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+This model is trained on the UD Thai PUD corpus with `Universal Part-of-Speech (UPOS)` tags to help with POS tagging in the Thai language.
 
-## Intended uses & limitations
+## Example
+```python
+from transformers import AutoModelForTokenClassification, AutoTokenizer, TokenClassificationPipeline
 
-More information needed
+model = AutoModelForTokenClassification.from_pretrained("Pavarissy/mdeberta-v3-ud-thai-pud-upos")
+tokenizer = AutoTokenizer.from_pretrained("Pavarissy/mdeberta-v3-ud-thai-pud-upos")
 
-## Training and evaluation data
+pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer, grouped_entities=True)
+outputs = pipeline("ประเทศไทย อยู่ใน ทวีป เอเชีย")
+print(outputs)
+# [{'entity_group': 'PROPN', 'score': 0.9946701, 'word': 'ประเทศไทย', 'start': 0, 'end': 9}, {'entity_group': 'VERB', 'score': 0.85809743, 'word': 'อยู่ใน', 'start': 9, 'end': 16}, {'entity_group': 'NOUN', 'score': 0.99632, 'word': 'ทวีป', 'start': 16, 'end': 21}, {'entity_group': 'PROPN', 'score': 0.9961184, 'word': 'เอเชีย', 'start': 21, 'end': 28}]
 
-More information needed
-
-## Training procedure
+```
 
 ### Training hyperparameters
 
@@ -87,4 +96,4 @@ The following hyperparameters were used during training:
 - Transformers 4.34.1
 - Pytorch 2.1.0+cu118
 - Datasets 2.14.6
-- Tokenizers 0.14.1
+- Tokenizers 0.14.1