metadata

library_name: transformers
license: cc-by-nc-4.0
language:
  - az
pipeline_tag: token-classification
tags:
  - NER
  - Named Entity Recognition
widget:
  - text: >-
      İyunun 11-i saat 20:55 radələrində Oğuz rayonu Tayıflı, Şirvanlı, Xalxal
      kəndlərinə diametri 10 mm olan dolu düşüb.
datasets:
  - LocalDoc/azerbaijani-ner-dataset

Azerbaijani Named Entity Recognition (NER) Model

This repository contains the code and model for Named Entity Recognition (NER) in the Azerbaijani language. The model is built on the XLM-RoBERTa architecture and fine-tuned on a custom dataset (the metadata above lists LocalDoc/azerbaijani-ner-dataset).

Model Description

The model recognizes the following entity types:

LABEL_0: O: Outside any named entity
LABEL_1: PERSON: Names of individuals
LABEL_2: LOCATION: Geographical locations, both man-made and natural
LABEL_3: ORGANISATION: Names of companies, institutions
LABEL_4: DATE: Dates or periods
LABEL_5: TIME: Times of the day
LABEL_6: MONEY: Monetary values
LABEL_7: PERCENTAGE: Percentage values
LABEL_8: FACILITY: Buildings, airports, etc.
LABEL_9: PRODUCT: Products and goods
LABEL_10: EVENT: Events and occurrences
LABEL_11: ART: Artworks, titles of books, songs
LABEL_12: LAW: Legal documents
LABEL_13: LANGUAGE: Languages
LABEL_14: GPE: Countries, cities, states
LABEL_15: NORP: Nationalities or religious or political groups
LABEL_16: ORDINAL: Ordinal numbers
LABEL_17: CARDINAL: Cardinal numbers
LABEL_18: DISEASE: Diseases and medical conditions
LABEL_19: CONTACT: Contact information, e.g., phone numbers, emails
LABEL_20: ADAGE: Proverbs, sayings
LABEL_21: QUANTITY: Measurements and quantities
LABEL_22: MISCELLANEOUS: Miscellaneous entities
LABEL_23: POSITION: Professional or social positions
LABEL_24: PROJECT: Names of projects or programs
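The list above can be turned into lookup tables in a few lines. The following is a minimal sketch, not part of the original repository: the names `ENTITY_TYPES`, `ID2LABEL`, `LABEL2ID`, and `decode_label` are hypothetical helpers introduced here for illustration, and the only assumption is that raw labels have the form `LABEL_<index>` as listed above.

```python
# Entity names in index order, copied from the list above.
ENTITY_TYPES = [
    "O", "PERSON", "LOCATION", "ORGANISATION", "DATE", "TIME", "MONEY",
    "PERCENTAGE", "FACILITY", "PRODUCT", "EVENT", "ART", "LAW", "LANGUAGE",
    "GPE", "NORP", "ORDINAL", "CARDINAL", "DISEASE", "CONTACT", "ADAGE",
    "QUANTITY", "MISCELLANEOUS", "POSITION", "PROJECT",
]

# Index -> name and name -> index lookup tables.
ID2LABEL = dict(enumerate(ENTITY_TYPES))
LABEL2ID = {name: index for index, name in ID2LABEL.items()}


def decode_label(raw_label: str) -> str:
    """Turn a raw label such as 'LABEL_14' into its entity name ('GPE')."""
    prefix, _, index = raw_label.rpartition("_")
    if prefix != "LABEL" or not index.isdecimal():
        raise ValueError(f"Unexpected label format: {raw_label!r}")
    label_id = int(index)
    if label_id not in ID2LABEL:
        raise ValueError(f"Unknown label index: {label_id}")
    return ID2LABEL[label_id]


print(decode_label("LABEL_14"))  # GPE
print(LABEL2ID["PROJECT"])       # 24
```

Building both tables from one ordered list keeps the index and the name from drifting apart. Assigning a table like `ID2LABEL` to the `id2label` field of a loaded model's config may let the pipeline emit readable names directly, but that is an untested assumption for this model.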

Installation

To use the model, you need to install the required libraries. You can do this using pip:

pip install transformers
pip install datasets

from transformers import pipeline, XLMRobertaTokenizerFast, XLMRobertaForTokenClassification

# Load the model and tokenizer
tokenizer = XLMRobertaTokenizerFast.from_pretrained("LocalDoc/ner_azerbaijan")
model = XLMRobertaForTokenClassification.from_pretrained("LocalDoc/ner_azerbaijan")

# Create NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Example text
example = "Komitədən bildirilib ki, sovet dövründə Azərbaycanda cəmi 17 məscid fəaliyyət göstərirdisə, dövlət müstəqilliyinin bərpasından sonra ölkədə 814 məscid tikilib."

# Perform NER
ner_results = nlp(example)

# Mapping of label indices to their descriptions
label_mapping = {
    0: "O",
    1: "PERSON",
    2: "LOCATION",
    3: "ORGANISATION",
    4: "DATE",
    5: "TIME",
    6: "MONEY",
    7: "PERCENTAGE",
    8: "FACILITY",
    9: "PRODUCT",
    10: "EVENT",
    11: "ART",
    12: "LAW",
    13: "LANGUAGE",
    14: "GPE",
    15: "NORP",
    16: "ORDINAL",
    17: "CARDINAL",
    18: "DISEASE",
    19: "CONTACT",
    20: "ADAGE",
    21: "QUANTITY",
    22: "MISCELLANEOUS",
    23: "POSITION",
    24: "PROJECT"
}

# Print results with mapped entity types
for result in ner_results:
    entity_group = result['entity_group']
    entity_description = label_mapping[int(entity_group.split('_')[-1])]
    print({
        'entity_group': entity_description,
        'score': result['score'],
        'word': result['word'],
        'start': result['start'],
        'end': result['end']
    })
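Since the model reports generic `LABEL_n` names, spans tagged `LABEL_0` (that is, `O`) may show up in the results alongside real entities. The sketch below shows one possible way to drop those spans, discard low-confidence predictions, and group the remaining words by entity type. The function `summarize_entities`, its `min_score` parameter, and the sample records are hypothetical: the records are hand-written in the shape of the pipeline output, with invented scores, and are not actual model predictions.

```python
from collections import defaultdict


def summarize_entities(results, label_mapping, min_score=0.5):
    """Group entity words by type, skipping 'O' spans and weak predictions."""
    grouped = defaultdict(list)
    for item in results:
        # 'LABEL_14' -> 14 -> 'GPE'
        name = label_mapping[int(item["entity_group"].split("_")[-1])]
        if name == "O" or item["score"] < min_score:
            continue
        grouped[name].append(item["word"])
    return dict(grouped)


# Hand-written records for illustration only (start/end omitted for brevity).
sample_results = [
    {"entity_group": "LABEL_0", "score": 0.99, "word": "Komitədən bildirilib ki"},
    {"entity_group": "LABEL_14", "score": 0.97, "word": "Azərbaycanda"},
    {"entity_group": "LABEL_17", "score": 0.42, "word": "17"},
    {"entity_group": "LABEL_17", "score": 0.91, "word": "814"},
]

# Only the indices used by the sample records are needed here.
sample_mapping = {0: "O", 14: "GPE", 17: "CARDINAL"}

summary = summarize_entities(sample_results, sample_mapping)
print(summary)  # {'GPE': ['Azərbaycanda'], 'CARDINAL': ['814']}
```

The 0.5 threshold is an arbitrary starting point; a suitable value depends on how much precision the downstream task needs.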

License

This model is licensed under the CC BY-NC-ND 4.0 license, which sets the following terms:

Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
Non-Commercial: You may not use the material for commercial purposes.
No Derivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.

For more information, please refer to the CC BY-NC-ND 4.0 license.

Contact

For more information, questions, or issues, please contact LocalDoc at v.resad.89@gmail.com.