---
library_name: transformers
tags:
- Persian
- Named Entity Recognition
- NER
- Albert
---

# Model Card for Behpoyan-NER

Behpoyan-NER is a fine-tuned Albert model for Named Entity Recognition (NER) in Persian. It is based on the `HooshvareLab/albert-fa-zwnj-base-v2-ner` model and identifies ten entity types: Date (DAT), Event (EVE), Facility (FAC), Location (LOC), Money (MON), Organization (ORG), Percent (PCT), Person (PER), Product (PRO), and Time (TIM).

## Model Details

### Model Description

Behpoyan-NER is designed to recognize named entities in Persian text, improving on its base model, `HooshvareLab/albert-fa-zwnj-base-v2-ner`. It was fine-tuned on a combination of the ARMAN, PEYMA, and WikiANN datasets, which are widely used for Persian NER.

- **Developed by:** Behpoyan
- **Model type:** Albert for Token Classification
- **Language(s) (NLP):** Persian (fa)
- **License:** MIT

### Model Sources

- **Repository:** [Behpoyan/Behpoyan-NER](https://huggingface.co/Behpoyan/Behpoyan-NER)
- **Base Model Repository:** [HooshvareLab/albert-fa-zwnj-base-v2-ner](https://huggingface.co/HooshvareLab/albert-fa-zwnj-base-v2-ner)

### Direct Use

This model can be used directly for Named Entity Recognition in Persian text. Example applications include text analysis, information extraction, and other Persian-language NLP applications.

### Downstream Use

The model can be fine-tuned further for domain-specific NER tasks or combined with other models in more complex NLP pipelines.

### Out-of-Scope Use

The model is not designed for languages other than Persian or for tasks other than token classification. Misuse for generating biased or harmful content is discouraged.

### Recommendations

While the model performs well for general-purpose NER in Persian, users should validate its performance on their own datasets. Be cautious of biases in the training data, especially when identifying less-represented entity types.
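One way to run the validation recommended above is to score predicted entities against a hand-labeled sample. The following is a minimal sketch, assuming gold and predicted entities are available as `(label, start, end)` tuples with character offsets. `span_prf` is a hypothetical helper name, and the spans are made-up illustration data, not model output. It computes exact-match precision, recall, and F1.

```python
def span_prf(gold, pred):
    """Exact-match precision, recall, and F1 over (label, start, end) spans."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustration data only (assumed, not produced by the model).
gold = [("ORG", 0, 8), ("LOC", 12, 17), ("DAT", 20, 24)]
pred = [("ORG", 0, 8), ("LOC", 12, 16)]  # the LOC span ends one character early

p, r, f = span_prf(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Exact matching is strict: the `LOC` prediction above counts as an error because its end offset differs by one character. A more lenient overlap-based score may suit some applications better.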
## How to Get Started with the Model

Here’s how you can use the model:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("Behpouyan/Behpouyan-NER")
model = AutoModelForTokenClassification.from_pretrained("Behpouyan/Behpouyan-NER")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

# Input example
example = '''
"در سال ۱۴۰۱، شرکت علی‌بابا اعلام کرد که با همکاری بانک ملت، یک پروژه بزرگ برای توسعه زیرساخت‌های تجارت الکترونیک در ایران آغاز خواهد کرد. این پروژه در تهران و اصفهان اجرا می‌شود و پیش‌بینی می‌شود تا پایان سال ۱۴۰۲ تکمیل شود."
'''

# Get NER results (one dict per tagged token)
ner_results = nlp(example)


# Function to merge subword entities
def merge_entities(entities):
    merged_results = []
    current_entity = None

    for entity in entities:
        label = entity['entity'][2:]  # Remove the "B-" or "I-" prefix
        starts_new = (
            entity['entity'].startswith("B-")
            or current_entity is None
            or label != current_entity['entity']  # An I- tag of a different type
        )
        if starts_new:
            # Start a new entity
            if current_entity:
                merged_results.append(current_entity)
            current_entity = {
                "word": entity['word'].strip(),
                "entity": label,
                "score": entity['score'],
                "start": entity['start'],
                "end": entity['end'],
            }
        elif entity['entity'].startswith("I-"):
            # Continue the current entity
            current_entity['word'] += entity['word'].strip()
            current_entity['score'] = min(current_entity['score'], entity['score'])  # Use the lowest score
            current_entity['end'] = entity['end']

    # Add the last entity if any
    if current_entity:
        merged_results.append(current_entity)

    return merged_results


# Merge the entities
merged_results = merge_entities(ner_results)

# Display the merged results
print("Named Entity Recognition Results:")
for entity in merged_results:
    print(f"- Entity: {entity['word']}")
    print(f"  Type: {entity['entity']}")
    print(f"  Score: {entity['score']:.2f}")
    print(f"  Start: {entity['start']}, End: {entity['end']}")
    print("-" * 40)
```
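Because `merge_entities` concatenates token strings directly, a multi-word entity can lose the spaces between its words, and tokenizer markers may remain in the merged `word`. A minimal sketch of one alternative, assuming each token dict carries the `start` and `end` character offsets used above: merge only the offsets, then slice the original text. The token list here is hand-written for illustration and is not actual model output. `make_token` and `merge_spans` are hypothetical helper names.

```python
text = "بانک ملت در تهران است"


def make_token(word, label, score):
    # Hypothetical helper: builds a dict shaped like one pipeline token,
    # locating `word` in `text` to fill in the character offsets.
    start = text.index(word)
    return {"word": word, "entity": label, "score": score,
            "start": start, "end": start + len(word)}


# Illustration data only (assumed tags and scores).
tokens = [
    make_token("بانک", "B-ORG", 0.98),
    make_token("ملت", "I-ORG", 0.95),
    make_token("تهران", "B-LOC", 0.99),
]


def merge_spans(tokens):
    """Merge B-/I- tagged tokens into spans, tracking offsets instead of words."""
    spans = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        if not label:
            continue  # skip untagged tokens such as "O"
        if prefix == "I" and spans and spans[-1]["entity"] == label:
            # Extend the previous span and keep its lowest token score.
            spans[-1]["end"] = tok["end"]
            spans[-1]["score"] = min(spans[-1]["score"], tok["score"])
        else:
            spans.append({"entity": label, "score": tok["score"],
                          "start": tok["start"], "end": tok["end"]})
    return spans


spans = merge_spans(tokens)
# Recover each entity's surface form from the original text.
surface_forms = [(s["entity"], text[s["start"]:s["end"]]) for s in spans]
for label, surface in surface_forms:
    print(label, surface)
```

Slicing by offsets returns the entity exactly as it appears in the input, including spaces and zero-width non-joiners, regardless of how the tokenizer split it.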