---
title: NLP Intelligence
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

NLP Intelligence: Social Monitoring Web Application

Hexagonal (Ports & Adapters) architecture for Mongolian social media content analysis.
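To illustrate the ports-and-adapters idea: the domain core declares the interfaces (ports) it needs, and adapters such as the FastAPI layer or a model wrapper implement them, so the core never imports a web framework. The sketch below is hypothetical; `SentimentPort`, `analyze_posts`, and `KeywordSentimentAdapter` are illustrative names, not code from nlp_core/.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SentimentResult:
    label: str    # e.g. "positive", "negative", "neutral"
    score: float  # confidence in [0, 1]


class SentimentPort(Protocol):
    """Port: what the domain core needs, with no framework or model details."""

    def predict(self, text: str) -> SentimentResult: ...


def analyze_posts(posts: list[str], sentiment: SentimentPort) -> dict[str, int]:
    """Domain logic: count sentiment labels using whichever adapter is injected."""
    counts: dict[str, int] = {}
    for post in posts:
        label = sentiment.predict(post).label
        counts[label] = counts.get(label, 0) + 1
    return counts


class KeywordSentimentAdapter:
    """Toy adapter (hypothetical); a real adapter would wrap a trained model."""

    def __init__(self, positive: set[str], negative: set[str]) -> None:
        self.positive = positive
        self.negative = negative

    def predict(self, text: str) -> SentimentResult:
        words = set(text.lower().split())
        if words & self.positive:
            return SentimentResult("positive", 1.0)
        if words & self.negative:
            return SentimentResult("negative", 1.0)
        return SentimentResult("neutral", 0.5)


adapter = KeywordSentimentAdapter({"сайн"}, {"муу"})
print(analyze_posts(["сайн байна", "муу", "өнөөдөр"], adapter))
# {'positive': 1, 'negative': 1, 'neutral': 1}
```

Replacing the toy keyword adapter with one backed by a trained model requires no change to `analyze_posts`, which is the benefit of keeping framework and model code behind ports.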

Repository Structure

```text
NLP-intelligence/
├── nlp_core/              # Domain core: NER, sentiment, topic modeling, preprocessing (pure Python)
├── adapters/
│   ├── api/               # FastAPI REST adapter (routers, schemas, services)
│   ├── ner_mongolian/     # Fine-tuned NER model config/tokenizer (weights on HF Hub)
│   └── sumbee/            # Future Sumbee.mn integration
├── frontend/              # Next.js dashboard & admin panel
├── Data/                  # Training data & reference datasets (NOT used at runtime)
│   ├── data/              # CoNLL-format training/validation/test files (v1 pipeline)
│   ├── datav2/            # JSONL character-offset training data + scripts (v2 pipeline)
│   └── NER-dataset/       # Reference data (locations.json, abbreviations, names)
├── eval/                  # Model evaluation scripts
├── Dockerfile             # Multi-stage production build
├── nginx.conf             # Reverse proxy config (port 7860)
├── start.sh               # Docker entrypoint
└── requirements.txt
```

Production code (nlp_core/, adapters/api/, frontend/) is included in the Docker image. ML development material (Data/, eval/) is excluded from it. See Data/README.md for details.
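The v1 (CoNLL) and v2 (JSONL character-offset) training formats encode the same annotations. As a sketch of how one v2 record could map to v1 BIO lines, assuming a record shape of `{"text": ..., "entities": [[start, end, label], ...]}`, whitespace tokenization, and space-separated `token TAG` lines (all assumptions; the actual schema and scripts in Data/datav2/ may differ):

```python
import json
import re


def jsonl_record_to_conll(line: str) -> str:
    """Convert one JSONL record with character-offset spans into CoNLL BIO lines.

    Assumed (hypothetical) record shape:
        {"text": "...", "entities": [[start, end, "LABEL"], ...]}
    Tokens are whitespace-delimited; a token is tagged if it overlaps a span.
    CoNLL files separate sentences with a blank line, so join records with "\n\n".
    """
    record = json.loads(line)
    text = record["text"]
    spans = sorted(record.get("entities", []))
    rows = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.start(), match.end()
        tag = "O"
        for start, end, label in spans:
            if tok_start < end and tok_end > start:  # token overlaps the entity span
                # First token of the entity gets B-, later tokens get I-.
                tag = ("B-" if tok_start <= start else "I-") + label
                break
        rows.append(f"{match.group()} {tag}")
    return "\n".join(rows)


record = json.dumps({"text": "Монгол Улс", "entities": [[0, 10, "LOC"]]}, ensure_ascii=False)
print(jsonl_record_to_conll(record))
# Монгол B-LOC
# Улс I-LOC
```

Character offsets survive tokenizer changes, while CoNLL fixes one tokenization; converting in this direction is lossless only when entity boundaries fall on whitespace.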

Model

The NER model is hosted on the Hugging Face Hub as Nomio4640/ner-mongolian. It is downloaded automatically during the Docker build and, if no local copy is cached, at runtime. Model weights are not stored in git.
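A minimal sketch of the "use the local copy if present, otherwise download" behavior, assuming the huggingface_hub package and a hypothetical local directory models/ner-mongolian (the project's actual cache path and loading code may differ). `snapshot_download` is the standard huggingface_hub function for fetching a whole model repository.

```python
from pathlib import Path

MODEL_REPO = "Nomio4640/ner-mongolian"


def resolve_model_dir(local_dir: str = "models/ner-mongolian", repo_id: str = MODEL_REPO) -> Path:
    """Return a directory containing the NER model, downloading it only if needed.

    `local_dir` is a hypothetical path; a directory counts as cached when it
    already holds a config.json file.
    """
    path = Path(local_dir)
    if (path / "config.json").is_file():
        return path  # cached copy, e.g. one fetched during the Docker build
    # Imported lazily so a cached run does not need huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=str(path)))
```

If the checkpoint is a standard transformers token-classification model (an assumption), the returned directory can be passed to `AutoTokenizer.from_pretrained` and `AutoModelForTokenClassification.from_pretrained`.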

To version a new model after training, tag the commit, replacing 0.XX with the model's evaluated F1 score:

```bash
git tag model-v1.0 -m "F1: 0.XX, trained on train_final.conll"
```
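The F1 in the tag message is presumably an entity-level score produced by eval/. As an illustration only (not necessarily how the eval/ scripts compute it), micro-averaged entity F1 over exact `(start, end, label)` matches uses precision P = TP/|pred|, recall R = TP/|gold|, and F1 = 2PR/(P + R):

```python
def entity_f1(gold: set[tuple[int, int, str]], pred: set[tuple[int, int, str]]) -> float:
    """Micro F1 over exact-match entity spans (start, end, label)."""
    tp = len(gold & pred)  # true positives: spans with identical boundaries and label
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {(0, 11, "LOC"), (18, 21, "PER")}
pred = {(0, 11, "LOC"), (12, 17, "ORG")}
print(f"F1: {entity_f1(gold, pred):.2f}")  # F1: 0.50
```

Exact-match scoring gives no credit for partially correct boundaries, so it is a conservative number to record in a tag.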

Quick Start

Local Development

```bash
# Backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cd adapters/api
# PYTHONPATH points at the repository root so that nlp_core/ is importable
PYTHONPATH=../../ uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

API docs: http://localhost:8000/docs

```bash
# Frontend (run from the repository root, in a second terminal)
cd frontend
npm install
npm run dev
```

Dashboard: http://localhost:3000

Docker

```bash
docker build -t nlp-intelligence .
docker run -p 7860:7860 nlp-intelligence
```

App: http://localhost:7860

Usage

  1. Open the dashboard at http://localhost:3000 (or http://localhost:7860 when running the Docker image)
  2. Upload a CSV file that has a `text` or `Text` column
  3. View the NER, sentiment, and network analysis results
  4. Go to /admin to manage the knowledge base, labels, and stopwords
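Step 2 accepts either a `text` or a `Text` column. A sketch of that lookup with the standard csv module, assuming that lowercase `text` wins when both exist and that empty cells are skipped (the actual upload handler in adapters/api/ may behave differently):

```python
import csv
import io


def read_texts(csv_content: str) -> list[str]:
    """Return the non-empty values of the `text` (or `Text`) column."""
    reader = csv.DictReader(io.StringIO(csv_content))
    fieldnames = reader.fieldnames or []
    # Prefer "text", fall back to "Text"; None if neither header exists.
    column = next((name for name in ("text", "Text") if name in fieldnames), None)
    if column is None:
        raise ValueError(f"CSV must have a 'text' or 'Text' column, found: {fieldnames}")
    # Short rows yield None for missing cells, hence the `or ""` guard.
    return [row[column].strip() for row in reader if (row[column] or "").strip()]


print(read_texts("id,Text\n1,Сайн байна уу\n2,\n3,Баяртай\n"))
# ['Сайн байна уу', 'Баяртай']
```

CSV files exported from Excel often start with a UTF-8 byte-order mark, which would turn the header into `\ufefftext`; decoding uploaded bytes with `data.decode("utf-8-sig")` strips it before the column lookup.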

API Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /api/upload | Upload a CSV file for analysis |
| POST | /api/analyze | Analyze a single text |
| POST | /api/analyze/batch | Analyze a batch of texts |
| POST | /api/network | Get network graph data |
| POST | /api/insights | Get analysis insights |
| GET, POST | /api/admin/knowledge | Knowledge base CRUD |
| GET, POST | /api/admin/labels | Custom label mapping |
| GET, POST, DELETE | /api/admin/stopwords | Stopword management |
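The endpoints can also be called without the dashboard. A standard-library sketch for POST /api/analyze against the local backend; the JSON body `{"text": ...}` is an assumption, so check the request schema at http://localhost:8000/docs before relying on it:

```python
import json
import urllib.request


def build_analyze_request(text: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /api/analyze request (payload shape is assumed; see /docs)."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str, base_url: str = "http://localhost:8000") -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(build_analyze_request(text, base_url), timeout=30) as resp:
        return json.load(resp)
```

Against the Docker image, pass `base_url="http://localhost:7860"`, assuming nginx forwards /api to the backend.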