---
language:
- az
license: mit
tags:
- token-classification
- ner
- azerbaijani
- fastapi
- transformers
- xlm-roberta
pipeline_tag: token-classification
datasets:
- LocalDoc/azerbaijani-ner-dataset
---
# Named Entity Recognition for Azerbaijani Language
A state-of-the-art Named Entity Recognition (NER) system specifically designed for the Azerbaijani language, featuring multiple fine-tuned transformer models and a production-ready FastAPI deployment with an intuitive web interface.
## Live Demo

Try the live demo: [Named Entity Recognition Demo](https://named-entity-recognition.fly.dev)

Note: The server runs on a free tier and may take 1-2 minutes to initialize if inactive. Please be patient during startup.
## System Architecture

```mermaid
graph TD
    A[User Input] --> B[FastAPI Server]
    B --> C[XLM-RoBERTa Model]
    C --> D[Token Classification]
    D --> E[Entity Aggregation]
    E --> F[Label Mapping]
    F --> G[JSON Response]
    G --> H[Frontend Visualization]

    subgraph "Model Pipeline"
        C --> C1[Tokenization]
        C1 --> C2[Transformer Encoding]
        C2 --> C3[Classification Head]
        C3 --> D
    end

    subgraph "Entity Categories"
        I[Person]
        J[Location]
        K[Organization]
        L[Date/Time]
        M[Government]
        N[25 Total Categories]
    end

    F --> I
    F --> J
    F --> K
    F --> L
    F --> M
    F --> N
```
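The "Entity Aggregation" and "Label Mapping" stages above can be sketched as a small helper that merges IOB2-tagged tokens into per-category entity lists. This is a minimal sketch, not the deployed code; the tag and category names are illustrative.

```python
def group_entities(tokens, tags):
    """Merge IOB2-tagged tokens into a {category: [entity, ...]} dict."""
    entities = {}
    current_words, current_type = [], None

    def flush():
        if current_words and current_type:
            entities.setdefault(current_type, []).append(" ".join(current_words))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):                      # a new entity begins
            flush()
            current_words, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(token)               # entity continues
        else:                                         # "O" or a stray I- tag
            flush()
            current_words, current_type = [], None
    flush()                                           # close any open entity
    return entities

# group_entities(["İlham", "Əliyev", "Salyanda", "olub"],
#                ["B-Person", "I-Person", "B-Location", "O"])
# -> {"Person": ["İlham Əliyev"], "Location": ["Salyanda"]}
```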
## Model Training Pipeline

```mermaid
flowchart LR
    A[Azerbaijani NER Dataset] --> B[Data Preprocessing]
    B --> C[Tokenization]
    C --> D[Label Alignment]

    subgraph "Model Training"
        E[mBERT] --> F[Fine-tuning]
        G[XLM-RoBERTa] --> F
        H[XLM-RoBERTa Large] --> F
        I[Azeri-Turkish BERT] --> F
        F --> J[Model Evaluation]
    end

    D --> E
    D --> G
    D --> H
    D --> I

    J --> K[Best Model Selection]
    K --> L[Hugging Face Hub]
    L --> M[Production Deployment]

    subgraph "Performance Metrics"
        N[Precision: 76.44%]
        O[Recall: 74.05%]
        P[F1-Score: 75.22%]
    end

    J --> N
    J --> O
    J --> P
```
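The precision, recall, and F1 numbers produced by the evaluation stage are entity-level scores. A minimal sketch of how such scores are computed, assuming entities are represented as `(type, start, end)` tuples (a library such as seqeval implements the full IOB2-aware version):

```python
def entity_scores(gold, pred):
    """Entity-level precision/recall/F1 over (type, start, end) spans."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)                 # exact span+type matches
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# One correct entity out of one predicted and two gold:
# entity_scores([("Person", 0, 2), ("Location", 3, 4)], [("Person", 0, 2)])
# -> precision 1.0, recall 0.5, F1 ≈ 0.667
```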
## Data Flow Architecture

```mermaid
sequenceDiagram
    participant U as User
    participant F as Frontend
    participant API as FastAPI
    participant M as XLM-RoBERTa
    participant HF as Hugging Face

    U->>F: Enter Azerbaijani text
    F->>API: POST /predict/
    API->>M: Process text
    M->>M: Tokenize input
    M->>M: Generate predictions
    M->>API: Return entity predictions
    API->>API: Apply label mapping
    API->>API: Group entities by type
    API->>F: JSON response with entities
    F->>U: Display highlighted entities

    Note over M,HF: Model loaded from<br/>IsmatS/xlm-roberta-az-ner
```
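The "Generate predictions" and "Apply label mapping" steps in the sequence above reduce to an argmax over each token's logits, followed by a lookup in the model's `id2label` mapping. A minimal sketch with illustrative labels and plain-Python logit rows:

```python
def logits_to_tags(logits, id2label):
    """Map per-token logit rows to IOB2 tags via the model's id2label dict."""
    tags = []
    for token_scores in logits:
        best = max(range(len(token_scores)), key=lambda i: token_scores[i])
        tags.append(id2label[best])
    return tags

# id2label = {0: "O", 1: "B-Person", 2: "I-Person"}
# logits_to_tags([[0.1, 2.0, 0.3], [0.2, 0.1, 1.5], [3.0, 0.0, 0.1]], id2label)
# -> ["B-Person", "I-Person", "O"]
```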
## Project Structure

```
.
├── Dockerfile                       # Docker image configuration
├── README.md                        # Project documentation
├── fly.toml                         # Fly.io deployment configuration
├── main.py                          # FastAPI application entry point
├── models/                          # Model-related files
│   ├── NER_from_scratch.ipynb       # Custom NER implementation notebook
│   ├── README.md                    # Models documentation
│   ├── XLM-RoBERTa.ipynb            # XLM-RoBERTa training notebook
│   ├── azeri-turkish-bert-ner.ipynb # Azeri-Turkish BERT training
│   ├── mBERT.ipynb                  # mBERT training notebook
│   ├── push_to_HF.py                # Hugging Face upload script
│   ├── train-00000-of-00001.parquet # Training data
│   └── xlm_roberta_large.ipynb      # XLM-RoBERTa Large training
├── requirements.txt                 # Python dependencies
├── static/                          # Frontend assets
│   ├── app.js                       # Frontend logic
│   └── style.css                    # UI styling
└── templates/                       # HTML templates
    └── index.html                   # Main UI template
```
## Models & Dataset

### Available Models

| Model | Parameters | F1-Score | Hugging Face | Status |
|---|---|---|---|---|
| mBERT Azerbaijani NER | 180M | 67.70% | ✓ | Released |
| XLM-RoBERTa Azerbaijani NER | 270M | 75.22% | ✓ | Production |
| XLM-RoBERTa Large Azerbaijani NER | 550M | 75.48% | ✓ | Released |
| Azerbaijani-Turkish BERT Base NER | 110M | 73.55% | ✓ | Released |
### Supported Entity Types (25 Categories)

| Category | Category | Category |
|---|---|---|
| Person | Government | Law |
| Location | Date | Language |
| Organization | Time | Position |
| Facility | Money | Nationality |
| Product | Percentage | Disease |
| Event | Contact | Quantity |
| Art | Project | Cardinal |
| Proverb | Ordinal | Miscellaneous |
| Other | | |
### Dataset Information

- **Source**: [Azerbaijani NER Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset)
- **Size**: High-quality annotated Azerbaijani text corpus
- **Language**: Azerbaijani (az)
- **Annotation**: IOB2 format with 25 entity categories
- **Training Infrastructure**: A100 GPU on Google Colab Pro+
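Because the dataset is annotated at the word level in IOB2 format while the models tokenize into subwords, training requires a label-alignment step: the first subword of each word keeps that word's label, and continuation subwords are masked with `-100` so the loss ignores them. A minimal sketch, where `word_ids` mimics the output of a Hugging Face tokenizer's `word_ids()` and the values are illustrative:

```python
IGNORE_INDEX = -100  # PyTorch's cross-entropy loss skips this label id

def align_labels(word_labels, word_ids):
    """Spread word-level labels onto subword tokens, masking extras."""
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:                 # special tokens like <s>, </s>
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:           # first subword of a word
            aligned.append(word_labels[word_id])
        else:                               # continuation subword
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned

# align_labels([1, 2, 0], [None, 0, 0, 1, 2, None])
# -> [-100, 1, -100, 2, 0, -100]
```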
### Model Performance Comparison
| Model | F1-Score |
|---|---|
| mBERT | 67.70% |
| XLM-RoBERTa Base | 75.22% |
| XLM-RoBERTa Large | 75.48% |
| Azeri-Turkish-BERT | 73.55% |
### Detailed Performance Metrics

#### mBERT Performance
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.2952 | 0.2657 | 0.7154 | 0.6229 | 0.6659 | 0.9191 |
| 2 | 0.2486 | 0.2521 | 0.7210 | 0.6380 | 0.6770 | 0.9214 |
| 3 | 0.2068 | 0.2534 | 0.7049 | 0.6507 | 0.6767 | 0.9209 |
#### XLM-RoBERTa Base Performance
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | 0.3231 | 0.2755 | 0.7758 | 0.6949 | 0.7331 |
| 3 | 0.2486 | 0.2525 | 0.7515 | 0.7412 | 0.7463 |
| 5 | 0.2238 | 0.2522 | 0.7644 | 0.7405 | 0.7522 |
| 7 | 0.2097 | 0.2507 | 0.7607 | 0.7394 | 0.7499 |
#### XLM-RoBERTa Large Performance
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | 0.4075 | 0.2538 | 0.7689 | 0.7214 | 0.7444 |
| 3 | 0.2144 | 0.2488 | 0.7509 | 0.7489 | 0.7499 |
| 6 | 0.1526 | 0.2881 | 0.7831 | 0.7284 | 0.7548 |
| 9 | 0.1194 | 0.3316 | 0.7393 | 0.7495 | 0.7444 |
#### Azeri-Turkish-BERT Performance
| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | 0.4331 | 0.3067 | 0.7390 | 0.6933 | 0.7154 |
| 3 | 0.2506 | 0.2751 | 0.7583 | 0.7094 | 0.7330 |
| 6 | 0.1992 | 0.2861 | 0.7551 | 0.7170 | 0.7355 |
| 9 | 0.1717 | 0.3138 | 0.7431 | 0.7255 | 0.7342 |
## Key Features

- **High Accuracy**: 75.22% F1-score on Azerbaijani NER with the production model (75.48% with XLM-RoBERTa Large)
- **25 Entity Categories**: Comprehensive coverage including Person, Location, Organization, Government, and more
- **Production Ready**: Deployed on Fly.io with a FastAPI backend
- **Interactive UI**: Real-time entity highlighting with confidence scores
- **Multiple Models**: Four different transformer models to choose from
- **Confidence Scoring**: Each prediction includes confidence metrics
- **Multilingual Foundation**: Built on XLM-RoBERTa for cross-lingual understanding
- **Responsive Design**: Works seamlessly across desktop and mobile devices
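The per-prediction confidence score is typically the softmax probability of the winning label for that token. A minimal, numerically stable sketch (illustrative, not the deployed implementation):

```python
import math

def confidence(token_logits):
    """Softmax probability of the highest-scoring label for one token."""
    peak = max(token_logits)                   # subtract max for stability
    exps = [math.exp(x - peak) for x in token_logits]
    return max(exps) / sum(exps)

# A flat logit row means maximum uncertainty:
# confidence([0.0, 0.0])  -> 0.5
# confidence([10.0, 0.0]) -> ~0.99995 (near-certain)
```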
## Technology Stack

```mermaid
graph LR
    subgraph "Frontend"
        A[HTML5] --> B[CSS3]
        B --> C[JavaScript]
    end

    subgraph "Backend"
        D[FastAPI] --> E[Python 3.8+]
        E --> F[Uvicorn]
    end

    subgraph "ML Stack"
        G[Transformers] --> H[PyTorch]
        H --> I[Hugging Face]
    end

    subgraph "Deployment"
        J[Docker] --> K[Fly.io]
        K --> L[Production]
    end

    C --> D
    F --> G
    I --> J
```
## Setup Instructions

### Local Development

1. **Clone the repository**

```bash
git clone https://huggingface.co/IsmatS/Named_Entity_Recognition
cd Named_Entity_Recognition
```

2. **Set up the Python environment**

```bash
# Create a virtual environment
python -m venv .venv

# Activate it
# On Unix/macOS:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

3. **Run the application**

```bash
uvicorn main:app --host 0.0.0.0 --port 8080
```
### Fly.io Deployment

1. **Install the Fly CLI**

```bash
# On Unix/macOS
curl -L https://fly.io/install.sh | sh
```

2. **Configure the deployment**

```bash
# Log in to Fly.io
fly auth login

# Initialize the app
fly launch

# Configure memory (minimum 2GB recommended)
fly scale memory 2048
```

3. **Deploy the application**

```bash
fly deploy

# Monitor the deployment
fly logs
```
## Usage

### Quick Start

1. Access the application:
   - Local: http://localhost:8080
   - Production: https://named-entity-recognition.fly.dev
2. Enter Azerbaijani text in the input field
3. Click "Submit" to process the text
4. View the results, with entities highlighted by category and confidence scores shown
### Example Usage

```python
import requests

# Example API request
response = requests.post(
    "https://named-entity-recognition.fly.dev/predict/",
    data={"text": "2014-cü ildə Azərbaycan Respublikasının prezidenti İlham Əliyev Salyanda olub."}
)
print(response.json())
# Output: {
#     "entities": {
#         "Date": ["2014"],
#         "Government": ["Azərbaycan"],
#         "Organization": ["Respublikasının"],
#         "Position": ["prezidenti"],
#         "Person": ["İlham Əliyev"],
#         "Location": ["Salyanda"]
#     }
# }
```
## Model Capabilities

- **Person Names**: İlham Əliyev, Heydər Əliyev, Nizami Gəncəvi
- **Locations**: Bakı, Salyanda, Azərbaycan, Gəncə
- **Organizations**: Respublika, Universitet, Şirkət
- **Dates & Times**: 2014-cü il, sentyabr ayı, səhər saatları
- **Government Entities**: prezident, nazir, məclis
- And 20+ more categories...
## Contributing

We welcome contributions! Here's how you can help:

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
### Development Areas

- Model improvements and fine-tuning
- UI/UX enhancements
- Performance optimizations
- Additional test cases
- Documentation improvements
## License
This project is open source and available under the MIT License.
## Acknowledgments

- Hugging Face team for the transformer models and infrastructure
- Google Colab for providing A100 GPU access
- Fly.io for hosting the production deployment
- The Azerbaijani NLP community for dataset contributions