metadata
title: Hopcroft Skill Classification
emoji: π§
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
api_docs_url: /docs
Hopcroft Skill Classification
Multi-label skill classification for GitHub issues and pull requests: automatically identify the technical skills required to resolve software issues using machine learning.
Overview
Hopcroft is an ML-enabled system that classifies GitHub issues into 217 technical skill categories, enabling automated developer assignment and optimized resource allocation. It is built following professional MLOps and software engineering standards.
Key Features
- Multi-label Classification: Predict multiple skills per issue
- REST API: FastAPI with Swagger documentation
- Web Interface: Streamlit GUI for interactive predictions
- Monitoring: Prometheus/Grafana dashboards with drift detection
- CI/CD: GitHub Actions with Docker deployment
- Experiment Tracking: MLflow on DagsHub
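In a multi-label setting, each skill category gets its own yes/no decision, so one issue can receive zero, one, or several skills. The sketch below illustrates only that decision step, assuming the model yields one score per label. The function name, label names, scores, and the 0.5 threshold are hypothetical and not taken from the project.

```python
from typing import Dict, List


def select_skills(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose score meets the threshold, highest score first.

    Unlike single-label classification, no argmax is taken: each label is
    accepted or rejected independently of the others.
    """
    chosen = [(label, score) for label, score in scores.items() if score >= threshold]
    chosen.sort(key=lambda item: (-item[1], item[0]))  # ties broken alphabetically
    return [label for label, _ in chosen]


# Hypothetical per-label scores for one issue (names are made up for illustration).
example_scores = {"Security": 0.91, "Web": 0.64, "Databases": 0.12}
predicted = select_skills(example_scores)
```

Raising the threshold trades recall for precision: fewer skills are suggested per issue, but those that remain carry higher scores.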
Architecture
graph TB
subgraph "Data Layer"
A[(SkillScope DB)] --> B[Feature Engineering]
B --> C[TF-IDF / Embeddings]
end
subgraph "ML Pipeline"
C --> D[Model Training]
D --> E[(MLflow Tracking)]
D --> F[Random Forest Model]
end
subgraph "Serving Layer"
F --> G[FastAPI Service]
G --> H[predict endpoint]
G --> I[predictions endpoint]
G --> J[health endpoint]
end
subgraph "Frontend"
G --> K[Streamlit GUI]
end
subgraph "Monitoring"
G --> L[Prometheus]
L --> M[Grafana]
N[Drift Detection] --> L
end
subgraph "Deployment"
O[GitHub Actions] --> P[Docker Build]
P --> Q[HF Spaces]
end
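The diagram shows a drift detection component feeding Prometheus, but this README does not say which statistical test it uses. As an illustration only, one common choice is the two-sample Kolmogorov-Smirnov statistic, which compares a numeric feature (for example, issue text length) between a reference sample and recent traffic. The function name and the choice of test are assumptions, not the project's implementation.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest gap between the empirical CDFs of the two samples:
    0.0 means the samples look identical, 1.0 means they do not overlap.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    max_gap = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for value in set(ref_sorted) | set(cur_sorted):
        ref_cdf = bisect_right(ref_sorted, value) / len(ref_sorted)
        cur_cdf = bisect_right(cur_sorted, value) / len(cur_sorted)
        max_gap = max(max_gap, abs(ref_cdf - cur_cdf))
    return max_gap
```

A value near 0 suggests production inputs still resemble the training data. The level at which to raise an alert is a project decision that this sketch does not try to answer.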
Documentation
| Document | Description |
|---|---|
| Milestone Summaries | All 6 project phases documented |
| User Guide | Setup, API, GUI, testing, monitoring |
| Design Choices | Technical decisions & rationale |
| ML Canvas | Requirements engineering framework |
| Testing & Validation | QA strategy & results |
| Model Card | Model details & performance |
| Dataset Card | Dataset details & preprocessing |
Quick Start
Docker (Recommended)
# Clone and configure
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
cp .env.example .env
# Edit .env with your DagsHub credentials
# Start services
docker compose -f docker/docker-compose.yml up -d --build
Access (Local):
- API Docs: http://localhost:8080/docs
- GUI: http://localhost:8501
- Health: http://localhost:8080/health
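Containers can take a while to start after `docker compose up`, so scripts that call the API may want to wait for the health endpoint first. The following is a minimal sketch using only the standard library. The helper names, retry count, and delay are illustrative, and it assumes `/health` answers with HTTP 200 once the service is ready.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:  # server answered with an error status
        return err.code
    except (urllib.error.URLError, OSError):  # connection refused, timeout, DNS failure
        return 0


def wait_until_healthy(
    url: str,
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no need to sleep after the final attempt
    return False


# Usage (requires the services to be starting or running):
# wait_until_healthy("http://localhost:8080/health")
```

The `probe` and `sleep` parameters exist so the retry logic can be exercised without a running server.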
Local Development
# Setup environment
python -m venv venv && source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt && pip install -e .
# Start API
make api-dev
# Start GUI (new terminal)
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py
Project Structure
├── hopcroft_skill_classification_tool_competition/
│   ├── main.py               # FastAPI application
│   ├── streamlit_app.py      # Streamlit GUI
│   ├── features.py           # Feature engineering
│   ├── modeling/             # Training & prediction
│   └── config.py             # Configuration
├── data/                     # DVC-tracked datasets
├── models/                   # DVC-tracked models
├── tests/                    # Pytest test suites
├── monitoring/               # Prometheus, Grafana, Locust
├── docker/                   # Docker configurations
├── docs/                     # Documentation
└── .github/workflows/        # CI/CD pipelines
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /predict | Classify single issue |
| POST | /predict/batch | Batch classification |
| GET | /predictions | List recent predictions |
| GET | /predictions/{id} | Get by MLflow run ID |
| GET | /health | Health check |
| GET | /metrics | Prometheus metrics |
Example:
curl -X POST "http://localhost:8080/predict" \
-H "Content-Type: application/json" \
-d '{"issue_text": "Fix OAuth2 authentication bug"}'
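The same request can be built from Python with the standard library. This sketch mirrors the curl call above (same URL, header, and `issue_text` field). The helper name `build_predict_request` is hypothetical, and the response schema is not documented here, so the reply is simply decoded as JSON.

```python
import json
import urllib.request


def build_predict_request(
    issue_text: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build a POST request for the /predict endpoint with a JSON body."""
    payload = json.dumps({"issue_text": issue_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request("Fix OAuth2 authentication bug")

# To actually send it (requires the API to be running):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.loads(response.read()))
```

Passing the Live Deployment host as `base_url` would target the hosted API instead of the local one.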
Live Deployment
- API: https://dacrow13-hopcroft-skill-classification.hf.space/docs
- GUI: https://dacrow13-hopcroft-skill-classification.hf.space
- MLflow: https://dagshub.com/se4ai2526-uniba/Hopcroft/experiments
- Prometheus: https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/
- Grafana: https://dacrow13-hopcroft-skill-classification.hf.space/grafana/
- Betterstack: Alerting configured (see Alert System Evidence)
Development
# Run tests
make test-all # All tests
make test-behavioral # ML behavioral tests
make validate-deepchecks # Data validation
# Lint & format
make lint # Check code style
make format # Auto-fix issues
# Training
make train-baseline-tfidf # Train baseline model
License
This project was developed as part of the SE4AI 2025-26 course at the University of Bari.