SaraikiNLP: The Python NLP Framework for Saraiki Language Research

Saraiki NLP logo with stylized script and the letters 'NLP' below, framed with traditional decorative borders.

Table of Contents

  1. Introduction
  2. Why Saraiki?
  3. 🎁 Resources
  4. Intended Audience
  5. Features
  6. Installation
  7. Documentation
  8. Contributing
  9. License

Introduction

SaraikiNLP is the world's first Python framework for Saraiki language research, intended as a foundation for further work. It provides basic NLP functions for researchers, students, and anyone else working on Saraiki NLP.


Why Saraiki?

About the Saraiki Language

Saraiki (skr) is an Indo-Aryan language spoken by an estimated 25–30 million people, primarily in Pakistan. It is spoken across Southern Punjab, in cities such as Multan, Bahawalpur, Dera Ghazi Khan, and Rahim Yar Khan, and extends into parts of Sindh, Balochistan, and Khyber Pakhtunkhwa. Despite its rich linguistic and cultural heritage, Saraiki remains underrepresented in NLP and computational research.

Why Should One Work on Saraiki NLP?

Despite having millions of speakers, Saraiki still lacks digital resources, computational tools, and NLP support. Major challenges include:

  • Limited Text Datasets – Few machine-readable corpora are available.
  • Minimal NLP Research – There is little work on tokenization, lemmatization, named entity recognition (NER), or speech recognition.
  • No Official Status – Unlike Urdu or Punjabi, Saraiki lacks mainstream software support in AI and ML applications.

The Goal of SaraikiNLP

The SaraikiNLP project aims to:

  • Develop Core NLP Tools – Implement normalization, tokenization, lemmatization, stemming, and NER models.
  • Promote Saraiki Research – Increase awareness of and support for research in this low-resource language.

Resources 🎁

Link                     | Description
Saraiki Alphabets        | Saraiki alphabets.
Saraiki Counting         | Saraiki counting.
Saraiki Months           | Saraiki month names with Urdu and English counterparts.
Saraiki Week Days        | Saraiki day names with counterparts.
Current Saraiki Research | Current research being done on Saraiki language.

Intended Audience

  • Science/Research: Researchers and linguists experimenting with Saraiki language data.
  • Developers: Developers who want to build Saraiki applications with less effort.
  • Education: Students interested in Saraiki language research who need resources and tools.

Features

  • ✔️ Normalization
  • ✔️ Preprocessing
  • 🚧 Tokenization
  • 🚧 Stemming
  • 🚧 Lemmatization
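
For readers new to the term, the sketch below illustrates the kind of cleanup that normalization usually involves for Arabic-script text such as Saraiki. It is not SaraikiNLP's API: the function name normalize_text and the specific rules (Unicode NFC composition, removing tatweel and a few invisible characters, collapsing whitespace) are assumptions chosen for illustration, and only the Python standard library is used. The library's actual functions are covered in the Documentation section.

```python
import re
import unicodedata

# Hypothetical rule set for illustration only; not taken from SaraikiNLP.
TATWEEL = "\u0640"                 # kashida, a purely visual stretching character
INVISIBLE = ("\u200b", "\ufeff")   # zero-width space and byte-order mark


def normalize_text(text):
    """Return a cleaned copy of Arabic-script text (illustrative sketch)."""
    # Compose base letters and combining marks into single code points
    # where Unicode defines a canonical composition.
    text = unicodedata.normalize("NFC", text)
    # Drop characters that change appearance or are invisible.
    text = text.replace(TATWEEL, "")
    for ch in INVISIBLE:
        text = text.replace(ch, "")
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


sample = "سرائیکی\u0640  زبان"
print(normalize_text(sample))
```

Composing with NFC first means that visually identical strings typed in different ways compare equal afterwards, which matters for any later counting or lookup step.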

Installation

Install SaraikiNLP with pip:

pip install SaraikiNLP
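
To confirm the install from Python, one option is to query the package metadata. The sketch below uses only the standard library's importlib.metadata. The helper name installed_version is made up for illustration and is not part of SaraikiNLP, and the code assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "SaraikiNLP" is the name from the pip command; adjust if your install differs.
version = installed_version("SaraikiNLP")
if version is None:
    print("SaraikiNLP is not installed in this environment.")
else:
    print(f"SaraikiNLP {version} is installed.")
```

Returning None instead of raising keeps the check usable in setup scripts that need to decide whether to run the install step.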

Documentation

Link             | Description                                            | Free Notebook (Colab)
Normalization    | Our functions and usage examples for normalization.    | ▶️ Start Now
Preprocessing    | Functions SaraikiNLP provides for preprocessing.       | ▶️ Start Now
🚧 Tokenization  | Functions for tokenization.                            | ⌛ Coming Soon
🚧 Stemming      | Functions for stemming.                                | ⌛ Coming Soon
🚧 Lemmatization | Functions for lemmatization.                           | ⌛ Coming Soon

Contributing

We welcome contributions from everyone! If you'd like to help improve SaraikiNLP, or if you notice any mistakes or have suggestions for improvement, feel free to:

Your contributions are highly appreciated and make SaraikiNLP better for everyone. Thank you for your interest! :-)


Copyright and license

Code released under the MIT License.
