title: Hopcroft Skill Classification
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
api_docs_url: /docs

Hopcroft Skill Classification

Multi-label skill classification for GitHub issues and pull requests: automatically identify the technical skills required to resolve software issues using machine learning.

Overview

Hopcroft is an ML-enabled system that classifies GitHub issues into 217 technical skill categories, enabling automated developer assignment and optimized resource allocation. It is built following professional MLOps and software engineering standards.

Key Features

🎯 Multi-label Classification: Predict multiple skills per issue
🚀 REST API: FastAPI with Swagger documentation
🖥️ Web Interface: Streamlit GUI for interactive predictions
📊 Monitoring: Prometheus/Grafana dashboards with drift detection
🔄 CI/CD: GitHub Actions with Docker deployment
📈 Experiment Tracking: MLflow on DagsHub
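
To make "multiple skills per issue" concrete, here is a minimal sketch of how per-label scores can be turned into a set of predicted skills. It assumes the model emits one score per skill and that a fixed threshold is applied. The function name `select_skills`, the threshold of 0.5, and the skill names in the example are all hypothetical and not taken from the project.

```python
from typing import Dict, List


def select_skills(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose score meets the threshold, highest score first.

    Assumption: one independent score per skill label. An issue can
    therefore receive zero, one, or many labels.
    """
    chosen = [(label, score) for label, score in scores.items() if score >= threshold]
    # Sort by descending score, then by name so ties are deterministic.
    chosen.sort(key=lambda pair: (-pair[1], pair[0]))
    return [label for label, _ in chosen]


if __name__ == "__main__":
    # Hypothetical scores for a single issue.
    example = {"Authentication": 0.91, "Networking": 0.62, "UI": 0.08}
    print(select_skills(example))
```

Unlike single-label classification, nothing forces the scores to sum to one, so lowering the threshold trades precision for recall across all skills at once.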

Architecture

graph TB
    subgraph "Data Layer"
        A[(SkillScope DB)] --> B[Feature Engineering]
        B --> C[TF-IDF / Embeddings]
    end
    
    subgraph "ML Pipeline"
        C --> D[Model Training]
        D --> E[(MLflow Tracking)]
        D --> F[Random Forest Model]
    end
    
    subgraph "Serving Layer"
        F --> G[FastAPI Service]
        G --> H[predict endpoint]
        G --> I[predictions endpoint]
        G --> J[health endpoint]
    end
    
    subgraph "Frontend"
        G --> K[Streamlit GUI]
    end
    
    subgraph "Monitoring"
        G --> L[Prometheus]
        L --> M[Grafana]
        N[Drift Detection] --> L
    end
    
    subgraph "Deployment"
        O[GitHub Actions] --> P[Docker Build]
        P --> Q[HF Spaces]
    end
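
The diagram shows a drift detection component feeding Prometheus, but this README does not say which statistical test it uses. As an illustration only, the sketch below computes a two-sample Kolmogorov-Smirnov statistic between a reference sample and a recent sample of some numeric feature (for example, issue text length). The choice of test, the function names, and the 0.2 alert threshold are assumptions, not the project's implementation.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    # Both empirical CDFs are step functions, so the maximum gap
    # is reached at one of the observed values.
    for x in set(ref) | set(cur):
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def has_drifted(reference: Sequence[float], current: Sequence[float],
                threshold: float = 0.2) -> bool:
    """Flag drift when the KS statistic exceeds a hypothetical threshold."""
    return ks_statistic(reference, current) > threshold


if __name__ == "__main__":
    print(has_drifted([1, 2, 3, 4], [3, 4, 5, 6]))
```

A value like this could be exported as a gauge so that a dashboard plots it over time, although how the project wires drift results into Prometheus is not described here.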

Documentation

| Document | Description |
|----------|-------------|
| 📋 Milestone Summaries | All 6 project phases documented |
| 📖 User Guide | Setup, API, GUI, testing, monitoring |
| 🏗️ Design Choices | Technical decisions & rationale |
| 🎯 ML Canvas | Requirements engineering framework |
| ✅ Testing & Validation | QA strategy & results |
| 📊 Model Card | Model details & performance |
| 📊 Dataset Card | Dataset details & preprocessing |

Quick Start

Docker (Recommended)

# Clone and configure
git clone https://github.com/se4ai2526-uniba/Hopcroft.git
cd Hopcroft
cp .env.example .env
# Edit .env with your DagsHub credentials

# Start services
docker compose -f docker/docker-compose.yml up -d --build

Access (Local):

🌐 API Docs: http://localhost:8080/docs
🖥️ GUI: http://localhost:8501
❤️ Health: http://localhost:8080/health
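
After `docker compose up`, the containers may take a moment before the API answers. The sketch below polls the health URL listed above until it returns HTTP 200. The helper names, the retry count, and the delay are assumptions. It only checks the status code, since the response body of `/health` is not documented here.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def wait_until_healthy(url: str, attempts: int = 30, delay: float = 1.0,
                       probe: Callable[[str], int] = http_status,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `url` until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy("http://localhost:8080/health")` returns `True` once the service is up. The `probe` and `sleep` parameters exist so the loop can be exercised without a running server.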

Local Development

# Setup environment
python -m venv venv && source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt && pip install -e .

# Start API
make api-dev

# Start GUI (new terminal)
streamlit run hopcroft_skill_classification_tool_competition/streamlit_app.py

Project Structure

├── hopcroft_skill_classification_tool_competition/
│   ├── main.py              # FastAPI application
│   ├── streamlit_app.py     # Streamlit GUI
│   ├── features.py          # Feature engineering
│   ├── modeling/            # Training & prediction
│   └── config.py            # Configuration
├── data/                    # DVC-tracked datasets
├── models/                  # DVC-tracked models
├── tests/                   # Pytest test suites
├── monitoring/              # Prometheus, Grafana, Locust
├── docker/                  # Docker configurations
├── docs/                    # Documentation
└── .github/workflows/       # CI/CD pipelines

API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/predict` | Classify single issue |
| `POST` | `/predict/batch` | Batch classification |
| `GET` | `/predictions` | List recent predictions |
| `GET` | `/predictions/{id}` | Get by MLflow run ID |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |

Example:

curl -X POST "http://localhost:8080/predict" \
  -H "Content-Type: application/json" \
  -d '{"issue_text": "Fix OAuth2 authentication bug"}'
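
The same request can be sent from Python using only the standard library. This is a minimal sketch: the `issue_text` field and the `/predict` path come from the curl example above, while the function names are hypothetical and the response is assumed to be JSON (the response schema is not described in this README).

```python
import json
import urllib.request


def build_predict_request(issue_text: str,
                          base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build the POST /predict request without sending it."""
    body = json.dumps({"issue_text": issue_text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(issue_text: str, base_url: str = "http://localhost:8080",
            timeout: float = 10.0):
    """Send the request and decode the body, assuming a JSON response."""
    request = build_predict_request(issue_text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the local stack running, `predict("Fix OAuth2 authentication bug")` returns the decoded response. Passing a different `base_url` targets another deployment.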

Live Deployment

API: https://dacrow13-hopcroft-skill-classification.hf.space/docs
GUI: https://dacrow13-hopcroft-skill-classification.hf.space
MLflow: https://dagshub.com/se4ai2526-uniba/Hopcroft/experiments
Prometheus: https://dacrow13-hopcroft-skill-classification.hf.space/prometheus/
Grafana: https://dacrow13-hopcroft-skill-classification.hf.space/grafana/
Betterstack: Alerting configured (see Alert System Evidence)

Development

# Run tests
make test-all              # All tests
make test-behavioral       # ML behavioral tests
make validate-deepchecks   # Data validation

# Lint & format
make lint                  # Check code style
make format                # Auto-fix issues

# Training
make train-baseline-tfidf  # Train baseline model

License

This project was developed as part of the SE4AI 2025-26 course at the University of Bari.