A multi-task NLP framework for the Thai language built from scratch with PyTorch.
It uses a shared Transformer encoder backbone with three task-specific heads:
| Task | Head | Metric |
|---|---|---|
| Named Entity Recognition | Token classification (7 labels) | Entity-level F1 |
| Sentiment Analysis | Sentence classification (3 labels) | Macro-F1 |
| Question Answering | Extractive span prediction | EM / F1 |
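The metrics in the table can be sketched in plain Python. This is a minimal illustration, not the toolkit's evaluation code. Entity-level F1 assumes BIO-style NER tags (the 7 label names are not listed here) and counts an entity as correct only if its type and boundaries both match. Macro-F1 averages per-class F1 over the sentiment labels. The QA EM/F1 uses character-level overlap after removing whitespace. That choice is an assumption made because Thai text is not space-delimited; the original SQuAD script compares whitespace tokens instead.

```python
from collections import Counter


def bio_entities(tags):
    """Extract (type, start, end) spans from BIO tags; end is exclusive."""
    entities, etype, start = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # I- tag continuing the current entity
        if etype is not None:
            entities.append((etype, start, i))
        # B- tags and stray I- tags open a new entity; "O" closes it.
        etype, start = (tag[2:], i) if tag != "O" else (None, None)
    return entities


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level F1 over sentences of BIO tags."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(bio_entities(gold)), set(bio_entities(pred))
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1, using F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


def qa_em_f1(gold, pred):
    """Exact match and character-level F1 for one gold/predicted answer pair."""
    g, p = "".join(gold.split()), "".join(pred.split())  # drop all whitespace
    em = float(g == p)
    if not g or not p:
        return em, em  # empty answers only score when both are empty
    common = sum((Counter(g) & Counter(p)).values())
    if common == 0:
        return em, 0.0
    precision, recall = common / len(p), common / len(g)
    return em, 2 * precision * recall / (precision + recall)


print(bio_entities(["B-PER", "I-PER", "O", "B-LOC"]))  # [('PER', 0, 2), ('LOC', 3, 4)]
```

Because entity-level F1 scores whole spans, a prediction that gets only part of an entity right earns no credit for it. Token-level accuracy would be more lenient.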
```python
# Clone the repository first (it provides the `inference` package):
# git clone https://github.com/puttibenz/thai-nlp-toolkit.git
from inference.pipeline import ThaiNLPPipeline

pipeline = ThaiNLPPipeline(model_dir="path/to/downloaded/model", device="auto")

# NER ("Somchai works in Bangkok")
result = pipeline.predict("สมชายทำงานที่กรุงเทพ", task="ner")

# Sentiment Analysis ("The food is really delicious")
result = pipeline.predict("อาหารอร่อยมากครับ", task="sentiment")

# Question Answering
result = pipeline.predict(
    "กรุงเทพมหานครเป็นเมืองหลวงของประเทศไทย",  # context: "Bangkok is the capital of Thailand"
    task="qa",
    question="เมืองหลวงของประเทศไทยคืออะไร",  # "What is the capital of Thailand?"
)
```
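The `task="qa"` call performs extractive span prediction. A common way to decode such a head picks the highest-scoring valid (start, end) token pair and slices the context by character offsets. The sketch below assumes the head emits one start logit and one end logit per context token. `best_span`, `span_text`, `max_answer_len`, and the `(char_start, char_end)` offsets format are hypothetical illustrations, not the toolkit's API.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) token pair, inclusive, with the highest start + end score.

    Only spans with start <= end and length <= max_answer_len are considered.
    """
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def span_text(context, offsets, start, end):
    """Slice the answer out of the context using per-token (char_start, char_end) offsets."""
    return context[offsets[start][0]:offsets[end][1]]


# Toy example with hypothetical logits and offsets for "Bangkok is the capital".
context = "Bangkok is the capital"
offsets = [(0, 7), (8, 10), (11, 14), (15, 22)]
start, end = best_span([0.1, 0.2, 2.0, 0.3], [0.0, 0.5, 0.4, 3.0])
print(span_text(context, offsets, start, end))  # the capital
```

A full pipeline would normally mask out question and special tokens before this search. The quadratic loop is acceptable because `max_answer_len` bounds the inner range.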
| Dataset | Task | Source |
|---|---|---|
| ThaiNER v2.2 | NER | pythainlp/thainer-corpus-v2.2 |
| Wisesight Sentiment | Sentiment | pythainlp/wisesight_sentiment |
| iApp Thai Wiki QA | QA | iapp_wiki_qa_squad |
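The QA source `iapp_wiki_qa_squad` follows a SQuAD-style layout. Each answer is stored as text plus a character offset into the context (check the dataset card for the exact field names). Training an extractive span head requires mapping that character span onto token positions. The sketch below assumes the tokenizer can report per-token `(char_start, char_end)` offsets with exclusive ends. Both `char_to_token_span` and the offsets format are hypothetical, not part of the toolkit.

```python
def char_to_token_span(offsets, answer_start, answer_text):
    """Map a character-level answer (start offset + text) to inclusive token indices.

    offsets: list of (char_start, char_end) per token, end exclusive.
    Returns None if the answer's boundaries do not fall inside any token.
    """
    answer_end = answer_start + len(answer_text)
    start_tok = end_tok = None
    for i, (s, e) in enumerate(offsets):
        if start_tok is None and s <= answer_start < e:
            start_tok = i  # first token containing the answer's first character
        if s < answer_end <= e:
            end_tok = i  # token containing the answer's last character
            break
    if start_tok is None or end_tok is None:
        return None
    return start_tok, end_tok


offsets = [(0, 7), (8, 10), (11, 14), (15, 22)]  # tokens of "Bangkok is the capital"
print(char_to_token_span(offsets, 11, "the capital"))  # (2, 3)
```

Examples that return `None` (for instance, an answer starting on whitespace) are usually dropped or re-aligned before training.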
```text
thai-nlp-toolkit/
├── checkpoint.pt               # Model weights
├── config.yaml                 # Model architecture config
└── tokenizer/
    ├── thai_bpe.model          # SentencePiece BPE model
    └── tokenizer_config.json   # Tokenizer config
```
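`ThaiNLPPipeline` is pointed at a directory, so a quick pre-flight check that the files listed above are present can prevent a confusing load error. Below is a minimal standard-library sketch. It assumes `model_dir` is the root of this layout. `resolve_model_files` and `REQUIRED_FILES` are hypothetical helpers, not part of the toolkit.

```python
from pathlib import Path

# Expected files relative to model_dir, taken from the layout above.
REQUIRED_FILES = (
    "checkpoint.pt",
    "config.yaml",
    "tokenizer/thai_bpe.model",
    "tokenizer/tokenizer_config.json",
)


def resolve_model_files(model_dir):
    """Return {relative_name: Path} for each required file, or raise if any is missing."""
    root = Path(model_dir)
    paths = {name: root / name for name in REQUIRED_FILES}
    missing = [name for name, path in paths.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    return paths
```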
GitHub: puttibenz/thai-nlp-toolkit
License: MIT