
SaraikiNLP: The Python NLP Framework for Saraiki Language Research

Table of Contents
- Introduction
- Why Saraiki?
- 🎁 Resources
- Intended Audience
- Features
- Installation
- Documentation
- Contributing
- License

Introduction
SaraikiNLP is the first Python framework dedicated to Saraiki language research. It provides basic NLP functions for researchers, students, and anyone else working on Saraiki NLP.

Why Saraiki?
About the Saraiki Language
Saraiki (skr) is an Indo-Aryan language spoken by an estimated 25–30 million people, primarily in Pakistan. It is spoken across Southern Punjab, in cities such as Multan, Bahawalpur, Dera Ghazi Khan, and Rahim Yar Khan, and extends into parts of Sindh, Balochistan, and Khyber Pakhtunkhwa. Despite its rich linguistic and cultural heritage, Saraiki remains underrepresented in NLP and computational research.
Why Should One Work on Saraiki NLP?
Despite having millions of speakers, Saraiki still lacks digital resources, computational tools, and NLP support. Major challenges include:
- Limited Text Datasets – Few machine-readable corpora are available.
- Minimal NLP Research – There is little work on tokenization, lemmatization, named entity recognition (NER), or speech recognition.
- No Official Status – Unlike Urdu or Punjabi, Saraiki lacks mainstream software support in AI and ML applications.
The Goal of SaraikiNLP
The SaraikiNLP project aims to:
- Develop Core NLP Tools – Implement normalization, tokenization, lemmatization, stemming, and NER models.
- Promote Saraiki Research – Increase awareness of and support for research on this low-resource language.

Resources 🎁
Link | Description |
---|---|
Saraiki Alphabets | Saraiki alphabets. |
Saraiki Counting | Saraiki counting. |
Saraiki Months | Saraiki month names with Urdu and English counterparts. |
Saraiki Week Days | Saraiki day names with counterparts. |
Current Saraiki Research | Current research being done on Saraiki language. |

Intended Audience
- Science/Research: For researchers and linguists experimenting with Saraiki language data.
- Developers: For developers who want to build Saraiki applications with less effort.
- Education: For students who need resources and tools for Saraiki language research.

Features
- ✔️ Normalization
- ✔️ Preprocessing
- 🚧 Tokenization
- 🚧 Stemming
- 🚧 Lemmatization

Installation
Install SaraikiNLP with pip:
pip install SaraikiNLP
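
After installing, one way to confirm that the package is visible to the current interpreter is to query its metadata with the standard library. The helper name `installed_version` below is hypothetical; the only thing taken from this README is the distribution name used in the pip command.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "SaraikiNLP" is the distribution name from the pip command above.
    print(installed_version("SaraikiNLP") or "SaraikiNLP is not installed")
```

If this reports that the package is not installed, the pip command was probably run in a different Python environment than the one executing the check.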

Documentation
Link | Description | Free Notebook (Colab) |
---|---|---|
Normalization | Our functions and usage examples for normalization. | ▶️ Start Now |
Preprocessing | Functions SaraikiNLP provides for preprocessing. | ▶️ Start Now |
🚧 Tokenization | Functions for tokenization. | ⌛ Coming Soon |
🚧 Stemming | Functions for stemming. | ⌛ Coming Soon |
🚧 Lemmatization | Functions for lemmatization. | ⌛ Coming Soon |

Contributing
We welcome contributions from everyone! If you'd like to help improve SaraikiNLP, notice a mistake, or have a suggestion, feel free to:
- Open an issue
- Submit a pull request
Your contributions are highly appreciated and make SaraikiNLP better for everyone. Thank you for your interest! :-)

License
Code released under the MIT License.