Model Card: bert-base-multilingual-cased-finetuned-albanian-ner (Fine-Tuned with WikiANN)

Overview

Model Name: bert-base-multilingual-cased-finetuned-albanian-ner
Model Type: Named Entity Recognition (NER)
Language: Multilingual with focus on Albanian (Shqip)
Fine-Tuned with: WikiANN dataset

Description

bert-base-multilingual-cased-finetuned-albanian-ner is a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model that has been fine-tuned for Named Entity Recognition (NER) in the Albanian language (Shqip). Fine-tuning used the WikiANN dataset, which provides annotated named entities for many languages, including Albanian.

Named Entity Recognition is the task of identifying and classifying named entities in text, such as persons, organizations, locations, dates, and more. This model extracts the entity types listed under Labels from Albanian text.

Intended Use

The model is designed for Named Entity Recognition (NER) applications in Albanian text. It identifies and classifies the following categories of named entities:

Persons (PER): Names of individuals, tagged B-PER at the first token and I-PER on the tokens that follow.
Organizations (ORG): Names of organizations, tagged B-ORG at the first token and I-ORG on the tokens that follow.
Locations (LOC): Names of locations, tagged B-LOC at the first token and I-LOC on the tokens that follow.
Miscellaneous (MISC): Miscellaneous entities or categories within text.

Labels

Label	Description
MISC	Miscellaneous entities or categories.
B-PER	Beginning of a person's name.
I-PER	Inside of a person's name.
B-ORG	Beginning of an organization name.
I-ORG	Inside of an organization name.
B-LOC	Beginning of a location name.
I-LOC	Inside of a location name.
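
To illustrate how the B- and I- prefixes in the table combine into entity spans, here is a minimal sketch in plain Python. The function name `bio_to_spans` and the sample tokens and tags are hypothetical and written by hand, not taken from the model's output. As a simplifying assumption, any tag without a B- or I- prefix (for example the conventional `O` tag) is treated as outside an entity.

```python
def bio_to_spans(tokens, tags):
    """Group tokens into (entity_type, text) spans using B-/I- prefixes."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the span being built, if any, and reset the state.
        nonlocal current_type, current_tokens
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B" and entity_type:
            # A B- tag always starts a new span.
            flush()
            current_type, current_tokens = entity_type, [token]
        elif prefix == "I" and entity_type and entity_type == current_type:
            # An I- tag continues the open span of the same type.
            current_tokens.append(token)
        else:
            # Unprefixed tags and orphan I- tags end the open span
            # and are not added to any span (assumption of this sketch).
            flush()
    flush()
    return spans


# Hypothetical example: tokens and tags written by hand for illustration.
tokens = ["Ali", "Podrimja", "lindi", "në", "Gjakovë"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

When the pipeline is created with an aggregation strategy, as in the Usage section, this grouping is already done by the library; the sketch only shows what the label scheme means.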

Usage

import pandas as pd
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Kushtrim/bert-base-multilingual-cased-finetuned-albanian-ner")
model = AutoModelForTokenClassification.from_pretrained("Kushtrim/bert-base-multilingual-cased-finetuned-albanian-ner")

# aggregation_strategy='first' merges word pieces into whole words and labels each word by its first token
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy='first')
text = """ Unë, biri yt, Kosovë t'i njoh dëshirat e heshtura,  t'i njoh ëndrrat, erërat e fjetura me shekuj,  t'i njoh vuatjet, gëzimet, vdekjet,  t'i njoh lindjet e bardha, caqet e tuka të kulluara;  ta di gjakun që të vlon në gji,  dallgën kur të rrahë netëve t'pagjumta  e të shpërthej do si vullkan:-  më mirë se kushdo tjetër të njoh, Kosovë.  Unë biri yt. - Poezi nga Ali Podrimja """

# Run NER and show the detected entities as a table
results = ner(text)
pd.DataFrame.from_records(results)
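
With an aggregation strategy set, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. The following sketch shows one possible way to post-process such a list by dropping low-confidence entries and grouping the remaining words by entity type. The helper `group_entities`, the 0.5 threshold, and the sample records are hypothetical; the records are hand-written in the shape the pipeline returns and are not actual output of this model.

```python
from collections import defaultdict


def group_entities(results, min_score=0.5):
    """Collect entity words per entity type, keeping only confident predictions."""
    grouped = defaultdict(list)
    for item in results:
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


# Hypothetical records for illustration; scores and offsets are made up.
sample_results = [
    {"entity_group": "LOC", "score": 0.98, "word": "Kosovë", "start": 15, "end": 21},
    {"entity_group": "ORG", "score": 0.30, "word": "Poezi", "start": 30, "end": 35},
    {"entity_group": "PER", "score": 0.95, "word": "Ali Podrimja", "start": 40, "end": 52},
]
entities = group_entities(sample_results, min_score=0.5)
print(entities)
```

In practice, `results` from the usage code above could be passed in place of `sample_results`; a suitable threshold depends on the application.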

@misc {kushtrim_visoka_2022,
    author       = { Kushtrim Visoka },
    title        = { bert-base-multilingual-cased-finetuned-albanian-ner (Revision 609fca2) },
    year         = 2022,
    url          = { https://huggingface.co/Kushtrim/bert-base-multilingual-cased-finetuned-albanian-ner },
    doi          = { 10.57967/hf/0006 },
    publisher    = { Hugging Face }
}
