Model Card: norbert3-large-ner (Fine-Tuned with WikiANN & norne)

Overview

  • Model Name: Kushtrim/norbert3-large-ner
  • Model Type: Named Entity Recognition (NER)
  • Language: Multilingual with focus on Norwegian (Norsk)
  • Fine-Tuned with: WikiANN & norne datasets

Description

Kushtrim/norbert3-large-ner is a BERT (Bidirectional Encoder Representations from Transformers) model fine-tuned from ltg/norbert3-large[^1] for Named Entity Recognition (NER) in Norwegian (Norsk). It was fine-tuned on the WikiANN and norne datasets. WikiANN contains annotated named entities in many languages, including Norwegian, and norne is a Norwegian NER dataset.

Named Entity Recognition is the task of identifying and classifying named entities in text, such as persons, organizations, and locations. This model performs that task on Norwegian text, using the entity categories listed under Labels.

Intended Use

The Kushtrim/norbert3-large-ner model, fine-tuned with the WikiANN & norne datasets, is designed for Named Entity Recognition (NER) applications in Norwegian text. It is particularly well-suited for identifying and classifying various types of named entities within Norwegian language content, including the following categories:

  • Persons (PER): individuals' names, distinguishing the beginning of a name from its continuation.
  • Organizations (ORG): organization names, distinguishing the beginning of a name from its continuation.
  • Locations (LOC): location names, distinguishing the beginning of a name from its continuation.
  • Miscellaneous (MISC): named entities that do not fit any of the other categories.
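The distinction between the beginning and the inside of a name is commonly expressed with BIO tags: `B-` marks the first token of an entity, `I-` marks tokens that continue it, and `O` marks tokens outside any entity. The minimal sketch below groups such a tag sequence into entity spans. The tag strings and the example sentence are illustrative assumptions, and `bio_to_spans` is a hypothetical helper; the exact label names the model emits can be checked in `model.config.id2label`. When an `aggregation_strategy` is set, the pipeline under Usage already performs this grouping.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans.

    Assumes tags of the form "B-XXX", "I-XXX" or "O". An I- tag that does not
    continue an open entity of the same type starts a new entity.
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open entity (if any) and record it as a span.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)  # continue the open entity
        else:
            flush()  # a B- tag, or a stray I- tag, starts a new entity
            current_type, current_tokens = entity_type, [token]
    flush()
    return spans


# Hand-written tags for illustration (not model output)
tokens = ["Jonas", "Gahr", "Støre", "besøkte", "Universitetet", "i", "Oslo", "."]
tags = ["B-PER", "I-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "O"]
print(bio_to_spans(tokens, tags))
# [('PER', 'Jonas Gahr Støre'), ('ORG', 'Universitetet i Oslo')]
```

Treating a stray `I-` tag as the start of a new entity is one common convention for repairing inconsistent tag sequences; other decoders discard such tokens instead.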

Labels

| Label | Description |
|---|---|
| Person (PER) | Real or fictional characters and animals. |
| Organization (ORG) | Any collection of people, such as firms, institutions, organizations, music groups, sports teams, unions, political parties, etc. |
| Location (LOC) | Geographical places, buildings and facilities. |
| Geo-political entity (GPE) | Geographical regions defined by political and/or social groups. A GPE entity subsumes and does not distinguish between a nation, its region, its government, or its people. |
| Product (PROD) | Artificially produced entities are regarded as products. This may include more abstract entities, such as speeches, radio shows, programming languages, contracts, laws and ideas. |
| Event (EVT) | Festivals, cultural events, sports events, weather phenomena, wars, etc. Events are bounded in time and space. |
| Derived (DRV) | Words (and possibly phrases) that are derived from a name but are not names in themselves. They typically contain a full name and are capitalized, but are not proper nouns. Examples (fictive) are "Brann-treneren" ("the Brann coach") or "Oslo-mannen" ("the man from Oslo"). |
| Miscellaneous (MISC) | Names that do not belong in the other categories, for example animal species and names of medical conditions. Entities that are manufactured or produced are of type Product, whereas things that occur naturally or spontaneously are of type Miscellaneous. |

Source of label information: norne
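When reading predictions, it can help to expand the short label codes into the category names from the table above. The sketch below is a hypothetical helper (`LABEL_NAMES` and `describe_label` are not part of the model or of transformers). It strips an optional `B-`/`I-` prefix before the lookup and falls back to the raw code for any label not in the table, since the exact label set the model emits is best confirmed from `model.config.id2label`.

```python
from typing import Dict

# Hypothetical mapping from the label codes in the table above to readable names.
LABEL_NAMES: Dict[str, str] = {
    "PER": "Person",
    "ORG": "Organization",
    "LOC": "Location",
    "GPE": "Geo-political entity",
    "PROD": "Product",
    "EVT": "Event",
    "DRV": "Derived",
    "MISC": "Miscellaneous",
}


def describe_label(label: str) -> str:
    """Return the category name for a label such as "B-PER", "I-ORG" or "LOC"."""
    # Strip a BIO prefix if present ("B-PER" -> "PER").
    if label[:2] in ("B-", "I-"):
        label = label[2:]
    # Fall back to the raw code for labels that are not in the table.
    return LABEL_NAMES.get(label, label)


print(describe_label("B-PER"))  # Person
print(describe_label("EVT"))    # Event
print(describe_label("O"))      # O (outside any entity, so not in the table)
```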

Usage

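```python
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
import pandas as pd

# trust_remote_code=True is required because the model repository contains custom code
tokenizer = AutoTokenizer.from_pretrained("Kushtrim/norbert3-large-ner", trust_remote_code=True)
model = AutoModelForTokenClassification.from_pretrained("Kushtrim/norbert3-large-ner", trust_remote_code=True)

# aggregation_strategy="first" merges sub-word tokens into whole words,
# using the label predicted for the first token of each word
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")

text = "Sett inn tekst her"  # Norwegian for "Insert text here"; replace with your own text

results = ner(text)

# One row per entity, with entity_group, score, word, start and end columns
df = pd.DataFrame.from_records(results)
print(df)
```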
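With an aggregation strategy set, each item in `results` is a dict with `entity_group`, `score`, `word`, `start` and `end` keys. The sketch below shows one way to post-process that list without pandas: it keeps predictions at or above a confidence threshold and collects the words per entity type. `group_entities` and the 0.5 threshold are illustrative choices, not part of this model card, and the sample results are hand-written rather than produced by the model.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(results: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Collect entity words per entity_group, keeping only confident predictions.

    Expects the list of dicts returned by an aggregated "ner" pipeline,
    where each dict has "entity_group", "score" and "word" keys.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written results in the pipeline's output format (not real model output)
text = "Jonas Gahr Støre besøkte Universitetet i Oslo på Blindern."
results = [
    {"entity_group": "PER", "score": 0.99, "word": "Jonas Gahr Støre", "start": 0, "end": 16},
    {"entity_group": "ORG", "score": 0.97, "word": "Universitetet i Oslo", "start": 25, "end": 45},
    {"entity_group": "LOC", "score": 0.31, "word": "Blindern", "start": 49, "end": 57},
]
print(group_entities(results, min_score=0.5))
# {'PER': ['Jonas Gahr Støre'], 'ORG': ['Universitetet i Oslo']}
```

Raising `min_score` trades recall for precision; a suitable value depends on the application and is not specified by the model card.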

[^1]: Samuel, D., Kutuzov, A., Touileb, S., Velldal, E., Øvrelid, L., Rønningstad, E., Sigdel, E., & Palatkina, A. (2023). NorBench -- A Benchmark for Norwegian Language Models. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pp. 618-633. University of Tartu Library.

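Model size: 361M parameters (Safetensors; tensor types I64 and F32)

The serverless Inference API does not yet support model repositories that contain custom code, so the model has to be loaded locally with trust_remote_code=True, as shown under Usage.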
