---
language:
  - az
  - fa
pipeline_tag: fill-mask
metrics:
  - accuracy
  - f1
  - perplexity
  - sacrebleu
tags:
  - text-classification
  - token-classification
  - translation
  - feature-extraction
---

Iranian Azerbaijani NLP Models

GitHub Repository: iranian-azerbaijani-nlp

Overview

This model card describes the natural language processing (NLP) models developed as part of a paper accepted to Findings of AACL-IJCNLP 2023 (see Citation below). The models support NLP tasks for Iranian Azerbaijani (ISO 639-3 code: azb). The models included in this repository are:

  1. AzerBERT

    • Type: BERT-based transformer language model
    • Description: AzerBERT is a pre-trained language model tailored to Iranian Azerbaijani. It can be fine-tuned for downstream NLP tasks such as text classification and named entity recognition.
    • Model Link: AzerBERT Model
  2. Language Model-based Embedding (FastText)

    • Type: FastText-based word embedding model
    • Description: This model provides FastText word embeddings for Iranian Azerbaijani words and phrases.
    • Model Link: FastText Embedding Model
  3. Text Classification Model (Fine-tuned with AzerBERT)

    • Type: Fine-tuned BERT-based text classification model
    • Description: This model is AzerBERT fine-tuned for text classification. It assigns text to one of four categories: literature, sports, history, and geography.
    • Model Link: Text Classification Model
  4. POS Tagger (Fine-tuned with AzerBERT)

    • Type: Fine-tuned BERT-based Part-of-Speech (POS) tagging model
    • Description: This model is AzerBERT fine-tuned for part-of-speech tagging of Iranian Azerbaijani text. It annotates text with 11 POS tags, a common preprocessing step for many downstream NLP applications.
    • Model Link: POS Tagger Model
  5. Translation Models (Persian to Azerbaijani and Vice Versa)

    • Type: Machine translation models
    • Description: These models translate between Persian (fa) and Iranian Azerbaijani (azb) in both directions (azb2fa and fa2azb), supporting cross-language communication.
    • Model Link: Translation Models
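Because AzerBERT is tagged for the fill-mask task, one way to try it is the Hugging Face `transformers` fill-mask pipeline. The sketch below assumes the checkpoint is published in a `transformers`-compatible format. `MODEL_ID` is a hypothetical placeholder for the repository behind the "AzerBERT Model" link above, and `predict_masked` and `top_predictions` are illustrative helper names, not part of this project.

```python
# Sketch: masked-token prediction with AzerBERT through the transformers fill-mask pipeline.
# ASSUMPTION: the checkpoint loads with transformers; MODEL_ID below is a hypothetical placeholder.

MODEL_ID = "path/to/AzerBERT"  # hypothetical: replace with the actual AzerBERT repository ID


def top_predictions(predictions, k=5):
    """Return the k highest-scoring (token_str, score) pairs from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], p["score"]) for p in ranked[:k]]


def predict_masked(template, model_id=MODEL_ID, k=5):
    """Fill the "{mask}" slot in `template` and return the top-k candidate tokens."""
    # Imported inside the function so top_predictions works without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    # Insert the tokenizer's own mask token instead of hard-coding "[MASK]".
    text = template.format(mask=unmasker.tokenizer.mask_token)
    return top_predictions(unmasker(text, top_k=k), k=k)
```

Call it as `predict_masked("... {mask} ...")` with an Iranian Azerbaijani sentence. The template should contain exactly one `{mask}` placeholder, because the pipeline returns a nested list (one list per mask) when the input contains several masks.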

Model Training Data

Details of the data used to pre-train and fine-tune these models, including data sources and preprocessing steps, are given in the associated research paper.

Model Performance Summary

The following table summarizes the models' performance on each task. For text classification, results for TF-IDF + SVM and FastText + SVM classifiers are reported alongside the fine-tuned BERT model for comparison.

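| Task                           | Model                  | Evaluation Metric | Performance |
|--------------------------------|------------------------|-------------------|-------------|
| Language model-based embedding | FastText               | MRR               | 0.46        |
| Language model                 | BERT                   | Perplexity        | 48.05       |
| Text classification            | TF-IDF + SVM           | Accuracy          | 0.79        |
| Text classification            | TF-IDF + SVM           | F1-score          | 0.78        |
| Text classification            | FastText + SVM         | Accuracy          | 0.86        |
| Text classification            | FastText + SVM         | F1-score          | 0.86        |
| Text classification            | BERT                   | Accuracy          | 0.89        |
| Text classification            | BERT                   | F1-score          | 0.89        |
| Token classification           | BERT POS-tagger        | Accuracy          | 0.86        |
| Token classification           | BERT POS-tagger        | Macro F1-score    | 0.67        |
| Machine translation            | Text Translation azb2fa | SacreBLEU        | 10.34       |
| Machine translation            | Text Translation fa2azb | SacreBLEU        | 8.07        |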
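The table mixes several metrics: lower perplexity is better, while MRR, accuracy, F1-score, and SacreBLEU are higher-is-better. The sketch below gives generic textbook definitions of MRR, perplexity, accuracy, and macro F1 in plain Python. It is not the paper's evaluation code; how the embedding queries were ranked for MRR, how perplexity was computed for the BERT model, and which averaging the text-classification F1-score uses are described in the paper and are not assumed here. The function names are illustrative.

```python
# Sketch: textbook definitions of the metrics in the performance table.
# ASSUMPTION: these are generic formulas, not the paper's evaluation scripts.
import math
from collections import Counter


def mean_reciprocal_rank(ranked_lists, gold_items):
    """Average of 1/rank of the gold item in each ranked list (contributes 0 if absent)."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_items):
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)  # ranks are 1-based
    return total / len(gold_items)


def perplexity(token_log_probs):
    """exp of the mean negative natural-log likelihood per token; lower is better."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def accuracy(gold, pred):
    """Fraction of positions where the predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1, so rare labels weigh as much as frequent ones."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    labels = set(gold) | set(pred)
    # Per-label F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for any observed label.
    scores = [2 * tp[label] / (2 * tp[label] + fp[label] + fn[label]) for label in labels]
    return sum(scores) / len(scores)
```

The gap between accuracy and macro F1 can be large. With toy tags `gold = ["N", "N", "N", "N", "V", "ADJ"]` and a tagger that predicts `"N"` everywhere, `accuracy` returns 4/6 (about 0.67), but `macro_f1` returns 0.8/3 (about 0.27), because the missed `V` and `ADJ` tags each contribute an F1 of 0. This is one possible reason the POS tagger's accuracy (0.86) exceeds its macro F1-score (0.67): a few poorly predicted rare tags lower the macro average even when most tokens are tagged correctly.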

Acknowledgments

Please acknowledge the authors and cite the associated research paper when using these models in your work. Proper attribution helps recognize the effort and contributions of the researchers involved in model development.

Citation

If you use these models in your research or applications, please cite the following paper:

@inproceedings{azbpipeline,
    title = "The Language Model, Resources, and Computational Pipelines for the Under-Resourced Iranian Azerbaijani",
    author = "Nouri, Marzia and
              Amani, Mahsa and
              Zohrabi, Reihaneh and
              Asgari, Ehsaneddin",
    booktitle = "Findings of the Association for Computational Linguistics: AACL-IJCNLP 2023",
    month = nov,
    year = "2023",
    address = "",
    publisher = "Association for Computational Linguistics",
    url = "",
    pages = "",
    abstract = "",
}

Contact Information

For questions or issues related to these models, contact inquiries[AT]language.ml, marziehnouri1999[AT]gmail.com, or mahsa.ama1391[AT]gmail.com.