---
library_name: transformers
tags:
- Persian
- Named Entity Recognition
- NER
- Albert
---

# Model Card for Behpoyan-NER

Behpoyan-NER is a fine-tuned ALBERT model for Named Entity Recognition (NER) in Persian. It is based on the `HooshvareLab/albert-fa-zwnj-base-v2-ner` model and identifies ten entity types: Date (DAT), Event (EVE), Facility (FAC), Location (LOC), Money (MON), Organization (ORG), Percent (PCT), Person (PER), Product (PRO), and Time (TIM).
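
As a quick reference, the ten entity types can be expanded into a BIO tag set of the kind the usage example in this card relies on (it checks for `B-` and `I-` prefixes). The sketch below is illustrative only: `ENTITY_TYPES` and `build_bio_labels` are hypothetical names, and the label order is an assumption. The authoritative mapping is the `id2label` field in the model's configuration.

```python
# Entity types listed in this model card, keyed by their short tag.
ENTITY_TYPES = {
    "DAT": "Date",
    "EVE": "Event",
    "FAC": "Facility",
    "LOC": "Location",
    "MON": "Money",
    "ORG": "Organization",
    "PCT": "Percent",
    "PER": "Person",
    "PRO": "Product",
    "TIM": "Time",
}


def build_bio_labels(entity_types):
    """Return "O" plus a B-/I- pair per entity type (order is an assumption)."""
    labels = ["O"]
    for tag in entity_types:
        labels.append(f"B-{tag}")  # first token of an entity
        labels.append(f"I-{tag}")  # continuation token of the same entity
    return labels


labels = build_bio_labels(ENTITY_TYPES)
print(len(labels))   # 21
print(labels[:5])    # ['O', 'B-DAT', 'I-DAT', 'B-EVE', 'I-EVE']
```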

## Model Details

### Model Description

Behpoyan-NER is designed to recognize named entities in Persian text, improving upon the capabilities of its base model, `HooshvareLab/albert-fa-zwnj-base-v2-ner`. It was fine-tuned on a combination of the ARMAN, PEYMA, and WikiANN datasets, which are widely used for Persian NER.

- **Developed by:** Behpoyan  
- **Model type:** ALBERT for token classification  
- **Language(s) (NLP):** Persian (fa)  
- **License:** MIT  

### Model Sources

- **Repository:** [Behpoyan/Behpoyan-NER](https://huggingface.co/Behpoyan/Behpoyan-NER)  
- **Base Model Repository:** [HooshvareLab/albert-fa-zwnj-base-v2-ner](https://huggingface.co/HooshvareLab/albert-fa-zwnj-base-v2-ner)  


### Direct Use

This model can be used directly for Named Entity Recognition on Persian text, for example in text analysis, information extraction, and other Persian-language NLP applications.

### Downstream Use

The model can be fine-tuned further for domain-specific NER tasks or combined with other models for complex NLP pipelines.

### Out-of-Scope Use

The model is not designed for languages other than Persian or for tasks other than token classification. Misuse for generating biased or harmful content is discouraged.

### Recommendations

While the model performs well for general-purpose NER in Persian, users should validate its performance on their own datasets. Be cautious of biases in the training data, which may particularly affect the recognition of less-represented entity types.
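
One way to act on this recommendation is to score the model's predictions against a small hand-labeled sample from your own domain. The sketch below computes entity-level precision, recall, and F1 with exact span matching. It is a minimal illustration, not the evaluation procedure used by the authors: the `entity_prf` helper and the `(start, end, type)` tuple format are assumptions made for this example, and the sample spans are made up.

```python
def entity_prf(gold, predicted):
    """Entity-level precision, recall, and F1 using exact (start, end, type) matches."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical character spans for one sentence: (start, end, entity type).
gold_spans = [(3, 11, "DAT"), (18, 25, "ORG"), (50, 58, "ORG"), (120, 125, "LOC")]
# The third prediction has a wrong boundary, so it does not count as a match.
pred_spans = [(3, 11, "DAT"), (18, 25, "ORG"), (50, 54, "ORG"), (120, 125, "LOC")]

precision, recall, f1 = entity_prf(gold_spans, pred_spans)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.75 recall=0.75 f1=0.75
```

Predicted spans could be built from the `start`, `end`, and `entity` fields of the merged results produced by the usage example in this card. Computing the same scores separately per entity type is a simple way to spot weaker performance on less-represented types.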

## How to Get Started with the Model

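The example below loads the model, runs the `ner` pipeline on a Persian passage, and merges the token-level predictions into whole entities: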

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("Behpouyan/Behpouyan-NER")
model = AutoModelForTokenClassification.from_pretrained("Behpouyan/Behpouyan-NER")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)

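# Input example (Persian). English gloss:
# "In 1401, the company Alibaba announced that, in cooperation with Bank Mellat,
# it will start a large project to develop e-commerce infrastructure in Iran.
# This project is carried out in Tehran and Isfahan and is expected to be
# completed by the end of 1402."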
example = '''
"در سال ۱۴۰۱، شرکت علی‌بابا اعلام کرد که با همکاری بانک ملت، یک پروژه بزرگ برای توسعه زیرساخت‌های تجارت الکترونیک در ایران آغاز خواهد کرد. 
این پروژه در تهران و اصفهان اجرا می‌شود و پیش‌بینی می‌شود تا پایان سال ۱۴۰۲ تکمیل شود."
'''
# Get NER results
ner_results = nlp(example)

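# Function to merge token-level predictions into whole entities
def merge_entities(entities):
    merged_results = []
    current_entity = None

    for entity in entities:
        label = entity['entity']
        # Remove the "B-" / "I-" prefix to get the entity type
        entity_type = label[2:] if label[:2] in ("B-", "I-") else label
        # SentencePiece marks word starts with "▁"; turn it into a space
        piece = entity['word'].replace("▁", " ")

        # An "I-" token continues the current entity only if it has the same
        # type and directly follows it (offsets may be None with slow tokenizers)
        continues = (
            current_entity is not None
            and label.startswith("I-")
            and entity_type == current_entity['entity']
            and (entity['start'] is None or entity['start'] - current_entity['end'] <= 1)
        )

        if continues:
            current_entity['word'] += piece
            current_entity['score'] = min(current_entity['score'], entity['score'])  # Use the lowest score
            current_entity['end'] = entity['end']
        else:
            # Start a new entity
            if current_entity:
                merged_results.append(current_entity)
            current_entity = {
                "word": piece,
                "entity": entity_type,
                "score": entity['score'],
                "start": entity['start'],
                "end": entity['end'],
            }

    # Add the last entity if any
    if current_entity:
        merged_results.append(current_entity)

    # Trim the space left by the leading "▁" marker
    for merged in merged_results:
        merged['word'] = merged['word'].strip()

    return merged_results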

# Merge the entities
merged_results = merge_entities(ner_results)

# Display the merged results
print("Named Entity Recognition Results:")
for entity in merged_results:
    print(f"- Entity: {entity['word']}")
    print(f"  Type: {entity['entity']}")
    print(f"  Score: {entity['score']:.2f}")
    print(f"  Start: {entity['start']}, End: {entity['end']}")
    print("-" * 40)