Update README.md

README.md CHANGED

@@ -7,6 +7,8 @@ datasets:
 - universal_dependencies
 metrics:
 - accuracy
+- precision
+- recall
 model-index:
 - name: mdeberta-v3-ud-thai-pud-upos
   results:
@@ -23,6 +25,9 @@ model-index:
     - name: Accuracy
       type: accuracy
       value: 0.9934846474601972
+language:
+- th
+library_name: transformers
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -43,17 +48,21 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+This model is trained on the UD Thai PUD corpus with `Universal Part-of-Speech (UPOS)` tags to help with POS tagging in the Thai language.
 
-## Intended uses & limitations
+## Example
+```python
+from transformers import AutoModelForTokenClassification, AutoTokenizer, TokenClassificationPipeline
 
-More information needed
+model = AutoModelForTokenClassification.from_pretrained("Pavarissy/mdeberta-v3-ud-thai-pud-upos")
+tokenizer = AutoTokenizer.from_pretrained("Pavarissy/mdeberta-v3-ud-thai-pud-upos")
 
-## Training and evaluation data
+pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer, grouped_entities=True)
+outputs = pipeline("ประเทศไทย อยู่ใน ทวีป เอเชีย")
+print(outputs)
+# [{'entity_group': 'PROPN', 'score': 0.9946701, 'word': 'ประเทศไทย', 'start': 0, 'end': 9}, {'entity_group': 'VERB', 'score': 0.85809743, 'word': 'อยู่ใน', 'start': 9, 'end': 16}, {'entity_group': 'NOUN', 'score': 0.99632, 'word': 'ทวีป', 'start': 16, 'end': 21}, {'entity_group': 'PROPN', 'score': 0.9961184, 'word': 'เอเชีย', 'start': 21, 'end': 28}]
 
-More information needed
-
-## Training procedure
+```
 
 ### Training hyperparameters
 
@@ -87,4 +96,4 @@ The following hyperparameters were used during training:
 - Transformers 4.34.1
 - Pytorch 2.1.0+cu118
 - Datasets 2.14.6
-- Tokenizers 0.14.1
+- Tokenizers 0.14.1