Sindhi Sentiment Analysis Model
A text classification model that detects positive, negative, and neutral sentiment in Sindhi language text. This is one of the first publicly available sentiment analysis models for the Sindhi language on Hugging Face.
Model Description
This model was trained on a custom Sindhi sentiment dataset collected from Sindhi newspaper corpora. It classifies Sindhi text into three sentiment categories:
- Positive
- Negative
- Neutral
Model Details
| Property | Details |
|---|---|
| Language | Sindhi (sd) |
| Script | Arabic (Sindhi alphabet, conventionally written in Naskh) |
| Task | Sentiment Analysis / Text Classification |
| Labels | Positive, Negative, Neutral |
| License | MIT |
| Developer | Ali Nawaz |
| Institution | Shaikh Ayaz University |
Training Data
Trained on the Sindhi Sentiment Analysis Dataset, a set of 1,898 Sindhi sentences collected from Sindhi newspaper corpora using a semi-supervised pipeline, with manual verification.
| Column | Description |
|---|---|
| Sindhi Text | Original Sindhi sentence |
| English Translation | English translation |
| Sentiment | Label: Positive / Negative / Neutral |
| Source | Newspaper/corpus source |
| Verified | Manual verification status |
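As a rough illustration of working with this column layout, the sketch below counts sentiment labels per class. It assumes the dataset is available as CSV with exactly the header names in the table; the sample rows, the source names, and the `Yes`/`No` values for `Verified` are hypothetical placeholders, not real dataset content.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the column layout described above. The rows and
# the "Yes"/"No" encoding of the Verified column are assumptions.
SAMPLE_CSV = """\
Sindhi Text,English Translation,Sentiment,Source,Verified
(Sindhi sentence 1),(translation 1),Positive,(newspaper A),Yes
(Sindhi sentence 2),(translation 2),Negative,(newspaper B),Yes
(Sindhi sentence 3),(translation 3),Neutral,(newspaper A),No
(Sindhi sentence 4),(translation 4),Positive,(newspaper C),Yes
"""


def label_counts(csv_text, verified_only=True):
    """Count rows per sentiment label, optionally skipping unverified rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        if verified_only and row["Verified"].strip().lower() != "yes":
            continue
        counts[row["Sentiment"].strip()] += 1
    return counts


print(label_counts(SAMPLE_CSV))
print(label_counts(SAMPLE_CSV, verified_only=False))
```

Checking the class balance this way is a quick sanity test before training or evaluation, since a dataset of this size can easily be skewed toward one label.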
How to Use
from transformers import pipeline
# Load the model into a text-classification pipeline
classifier = pipeline("text-classification", model="alinawazmahar/sindhi-sentiment")
# Sindhi for "This book is very good"
result = classifier("هي ڪتاب تمام سٺو آهي")
print(result)
# Example output: [{'label': 'Positive', 'score': 0.95}]
Or load manually:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("alinawazmahar/sindhi-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("alinawazmahar/sindhi-sentiment")
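Once the tokenizer and model are loaded manually, a prediction still has to be derived from the raw logits. The following is a minimal sketch of one way to do that, assuming the checkpoint is a PyTorch model whose config provides an `id2label` mapping; the helper names `softmax`, `top_label`, and `predict` are illustrative and not part of the model repository.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def predict(text, model_id="alinawazmahar/sindhi-sentiment"):
    """Classify one Sindhi sentence; downloads the model on first call."""
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # Assumption: the config's id2label maps class indices to label names.
    return top_label(logits, model.config.id2label)
```

Calling `predict(...)` with a Sindhi sentence returns a `(label, probability)` pair. For repeated calls, loading the tokenizer and model once outside the function would avoid reloading them each time.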
Live Demo
Try the model interactively on the Hugging Face Space:
alinawazmahar/sindhi-sentiment (Space)
Intended Use
- Sentiment analysis of Sindhi news articles
- Social media monitoring in Sindhi
- NLP research on low-resource South Asian languages
- Educational and academic research
Limitations
- Trained on newspaper text; may perform differently on informal/social media Sindhi
- Dataset size is relatively small (1,898 sentences)
- Roman Sindhi (Latin script) is not supported; the model handles Arabic script only
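Because Latin-script input falls outside what the model was trained on, it can help to screen text before classification. The sketch below is one possible guard, not part of the model: it measures the share of letters that fall in Unicode Arabic-script ranges. The function names and the 0.8 threshold are assumptions chosen for illustration.

```python
# Unicode ranges covering Arabic-script letters, including the extended
# letters used in Sindhi (for example U+06AA and U+067A).
ARABIC_RANGES = (
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
)


def is_arabic_char(ch):
    """True if the character's code point lies in an Arabic-script range."""
    code = ord(ch)
    return any(low <= code <= high for low, high in ARABIC_RANGES)


def arabic_script_ratio(text):
    """Fraction of alphabetic characters that are Arabic-script letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for ch in letters if is_arabic_char(ch)) / len(letters)


def looks_like_arabic_script(text, threshold=0.8):
    """Heuristic check; the 0.8 threshold is an arbitrary assumption."""
    return arabic_script_ratio(text) >= threshold


print(looks_like_arabic_script("ڪتاب"))    # Arabic-script Sindhi word
print(looks_like_arabic_script("kitaab"))  # Roman Sindhi spelling
```

A check like this only identifies the script; it cannot tell Sindhi apart from other Arabic-script languages such as Urdu or Arabic.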
Citation
If you use this model or dataset in your research, please cite:
@misc{alinawaz2025sindhi,
author = {Ali Nawaz},
title = {Sindhi Sentiment Analysis Model},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/alinawazmahar/sindhi-sentiment},
institution = {Shaikh Ayaz University}
}
Acknowledgements
Dataset collected from Sindhi newspaper corpora. Developed as part of NLP research at Shaikh Ayaz University.