---
library_name: transformers
tags:
- Persian
- Named Entity Recognition
- NER
- Albert
---
# Model Card for Behpoyan-NER
Behpoyan-NER is a fine-tuned ALBERT model for named entity recognition (NER) in Persian. It is based on the `HooshvareLab/albert-fa-zwnj-base-v2-ner` model and identifies ten entity types: Date (DAT), Event (EVE), Facility (FAC), Location (LOC), Money (MON), Organization (ORG), Percent (PCT), Person (PER), Product (PRO), and Time (TIM).
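Token-classification models usually predict one label per token, with entity types encoded in a tagging scheme. The sketch below assumes a BIO scheme (`B-` for the first token of an entity, `I-` for continuation tokens, `O` for everything else) and builds the resulting label inventory for the ten entity types above. Both the scheme and the label order are assumptions; the model's actual mapping is stored in `model.config.id2label` and should be checked there.

```python
# Hypothetical BIO label inventory for the ten entity types.
# The authoritative mapping is model.config.id2label; order and naming may differ.
ENTITY_TYPES = {
    "DAT": "Date",
    "EVE": "Event",
    "FAC": "Facility",
    "LOC": "Location",
    "MON": "Money",
    "ORG": "Organization",
    "PCT": "Percent",
    "PER": "Person",
    "PRO": "Product",
    "TIM": "Time",
}


def build_bio_labels(entity_types):
    """Return ["O", "B-X", "I-X", ...] for every entity abbreviation X."""
    labels = ["O"]
    for tag in entity_types:  # dicts preserve insertion order
        labels.extend([f"B-{tag}", f"I-{tag}"])
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

print(len(LABELS), LABELS[:3])
```

Under this assumption there are 21 labels: one `O` plus a `B-`/`I-` pair for each entity type.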
## Model Details
### Model Description
Behpoyan-NER is designed to recognize named entities in Persian text and is intended to improve on its base model, `HooshvareLab/albert-fa-zwnj-base-v2-ner`. It was fine-tuned on a combination of the ARMAN, PEYMA, and WikiANN datasets, which are widely used for Persian NER.
- **Developed by:** Behpoyan
- **Model type:** ALBERT for token classification (`AlbertForTokenClassification`)
- **Language(s) (NLP):** Persian (fa)
- **License:** MIT
### Model Sources
- **Repository:** [Behpoyan/Behpoyan-NER](https://huggingface.co/Behpoyan/Behpoyan-NER)
- **Base Model Repository:** [HooshvareLab/albert-fa-zwnj-base-v2-ner](https://huggingface.co/HooshvareLab/albert-fa-zwnj-base-v2-ner)
### Direct Use
This model can be used directly for named entity recognition in Persian text, for example in text analysis, information extraction, and other Persian-language NLP applications.
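For information extraction, the pipeline output typically needs to be reorganized by entity type. A minimal sketch, assuming the pipeline was created with `aggregation_strategy="simple"` so that each result carries `entity_group`, `score`, and `word` keys. The helper name `group_entities` and the `min_score` threshold are illustrative, and the sample input is hand-written, not actual model output.

```python
from collections import defaultdict


def group_entities(ner_results, min_score=0.0):
    """Collect entity strings by type from aggregated NER pipeline output.

    Each item is expected to have "entity_group", "word" and "score" keys;
    items scoring below min_score are skipped.
    """
    grouped = defaultdict(list)
    for ent in ner_results:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written illustrative input (not real model output; start/end keys omitted).
sample = [
    {"entity_group": "ORG", "score": 0.98, "word": "بانک ملت"},
    {"entity_group": "LOC", "score": 0.97, "word": "تهران"},
    {"entity_group": "LOC", "score": 0.42, "word": "اصفهان"},
]

print(group_entities(sample))
print(group_entities(sample, min_score=0.5))
```

Raising `min_score` trades recall for precision; a suitable threshold depends on the target data.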
### Downstream Use
The model can be fine-tuned further for domain-specific NER tasks or combined with other models for complex NLP pipelines.
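A common step when fine-tuning further is aligning word-level labels with the subword tokens produced by the tokenizer. The sketch below labels only the first subword of each word and marks special tokens and continuation subwords with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. It assumes a fast tokenizer, whose `BatchEncoding.word_ids()` method gives the source word index of each token; `align_labels` is a hypothetical helper. If the new domain uses a different label set, passing `num_labels`, `id2label`, `label2id`, and `ignore_mismatched_sizes=True` to `AutoModelForTokenClassification.from_pretrained` re-initializes the classification head.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_ids, word_label_ids):
    """Map word-level label ids onto subword tokens (first-subword labeling).

    word_ids: one entry per token, the index of its source word or None for
    special tokens, as returned by a fast tokenizer's BatchEncoding.word_ids().
    word_label_ids: one label id per source word.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)  # special tokens such as [CLS]/[SEP] or padding
        elif wid != previous:
            aligned.append(word_label_ids[wid])  # first subword of a new word
        else:
            aligned.append(IGNORE_INDEX)  # continuation subword of the same word
        previous = wid
    return aligned


# Hypothetical example: three words split into six subwords plus two special tokens.
word_ids = [None, 0, 0, 1, 2, 2, 2, None]
word_label_ids = [1, 0, 5]
print(align_labels(word_ids, word_label_ids))

# Usage with a real tokenizer (downloads the model, so not run here):
# enc = tokenizer(words, is_split_into_words=True, truncation=True)
# labels = align_labels(enc.word_ids(), word_label_ids)
```

Labeling only the first subword keeps the loss at one prediction per word, which matches how word-level NER annotations are usually scored.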
### Out-of-Scope Use
The model is not designed for languages other than Persian or for tasks other than token classification, and it should not be used to produce biased or harmful content.
### Recommendations
The model performs well on general-purpose Persian NER, but users should validate its performance on their own datasets. Biases in the training data may affect results, particularly for entity types that are under-represented in the training data.
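One way to validate the model on your own data is exact-match, entity-level precision, recall, and F1, computed overall and per entity type so that weaker results on under-represented types become visible. A minimal sketch, assuming gold and predicted entities are available as `(entity_type, start, end)` character-offset tuples; the function names and sample spans are hypothetical. Libraries such as seqeval compute comparable entity-level metrics from BIO tag sequences.

```python
def entity_prf(gold, predicted):
    """Exact-match entity-level precision, recall and F1.

    gold, predicted: iterables of (entity_type, start, end) tuples; a prediction
    counts as correct only if its type and both offsets match a gold entity.
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def per_type_prf(gold, predicted):
    """Apply entity_prf separately to each type seen in gold or predictions."""
    types = {t for t, _, _ in gold} | {t for t, _, _ in predicted}
    return {
        t: entity_prf([g for g in gold if g[0] == t], [p for p in predicted if p[0] == t])
        for t in sorted(types)
    }


# Hypothetical spans for one annotated document.
gold = [("ORG", 18, 26), ("ORG", 50, 58), ("LOC", 120, 125), ("LOC", 128, 134)]
predicted = [("ORG", 50, 58), ("LOC", 120, 125), ("LOC", 128, 134), ("PER", 18, 26)]

print(entity_prf(gold, predicted))
for etype, scores in per_type_prf(gold, predicted).items():
    print(etype, scores)
```

Exact matching is strict: a prediction with the right type but slightly shifted boundaries counts as both a false positive and a false negative.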
## How to Get Started with the Model
Here’s how you can use the model:
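```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "Behpoyan/Behpoyan-NER"
# Load the tokenizer and the fine-tuned token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# aggregation_strategy="simple" merges subword and B-/I- token predictions into whole entities
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

example = """در سال ۱۴۰۱، شرکت علی‌بابا اعلام کرد که با همکاری بانک ملت، یک پروژه بزرگ برای توسعه زیرساخت‌های تجارت الکترونیک در ایران آغاز خواهد کرد.
این پروژه در تهران و اصفهان اجرا می‌شود و پیش‌بینی می‌شود تا پایان سال ۱۴۰۲ تکمیل شود."""

ner_results = nlp(example)
# With aggregation enabled, each result has entity_group, score, word, start and end keys
for entity in ner_results:
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```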