Researcher / README.md
amarck's picture
Add HF Spaces support, preference seeding, archive search, tests
430d0f8
metadata
title: Research Intelligence
emoji: πŸ“‘
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860

Research Intelligence

A self-hosted research triage system that monitors academic papers (AI/ML and Security) and trending GitHub projects, scores them with AI, and learns your preferences over time.

This HuggingFace Space is a demo. Data is ephemeral and resets when the container restarts. For production use, deploy locally with Docker Compose and persistent storage β€” see instructions below.

Features

  • Paper monitoring β€” Fetches new papers from arXiv and HuggingFace daily/weekly
  • AI scoring β€” Scores each paper on configurable axes (novelty, code availability, practical impact)
  • Preference learning β€” Rate papers with thumbs up/down; the system learns what you care about and re-ranks accordingly
  • GitHub tracking β€” Monitors trending repositories across curated collections
  • Event tracking β€” Conference deadlines, releases, and RSS news feeds
  • Weekly reports β€” Auto-generated markdown summaries of top papers
  • Dark-theme dashboard β€” Fast, responsive web UI built with HTMX

Deployment

Docker Compose (recommended for production)

This is the intended deployment method. Your data persists across restarts via a local volume mount.

git clone https://github.com/yourname/researcher.git
cd researcher
cp .env.example .env
# Edit .env and add your Anthropic API key

docker compose up --build

Visit http://localhost:9090 β€” the setup wizard will guide you through configuration.

Security note: The app has no built-in authentication. Run it on a private network or behind a reverse proxy with auth. Do not expose it to the public internet.

Local (without Docker)

git clone https://github.com/yourname/researcher.git
cd researcher
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your Anthropic API key

python -m uvicorn src.web.app:app --host 0.0.0.0 --port 8888

Visit http://localhost:8888 and follow the setup wizard.

HuggingFace Spaces (demo / preview only)

This repo can run on HuggingFace Spaces as a Docker Space for quick demos, but data is ephemeral β€” the database and config reset on every container restart (free Spaces sleep after 48h of inactivity).

To try it:

  1. Duplicate this Space or create a new Docker Space pointing to this repo
  2. In Settings > Secrets, add ANTHROPIC_API_KEY
  3. The app starts automatically β€” follow the setup wizard

For anything beyond a quick test, use Docker Compose locally with persistent storage.

Setup Wizard

On first launch (before config.yaml exists), you'll be guided through:

  1. API Key β€” Enter your Anthropic API key (validated with a test call)
  2. Domains β€” Enable/disable AI/ML and Security monitoring, adjust scoring weights
  3. GitHub β€” Toggle GitHub project tracking
  4. Schedule β€” Set pipeline frequency (daily, weekly, or manual-only)

After setup, you can optionally pick seed papers to bootstrap your preference profile.

Configuration

All settings live in config.yaml (generated by the setup wizard). You can also edit it directly:

domains:
  aiml:
    enabled: true
    scoring_axes:
      - name: "Code & Weights"
        weight: 0.30
      - name: "Novelty"
        weight: 0.35
      - name: "Practical Applicability"
        weight: 0.35
  security:
    enabled: true
    scoring_axes:
      - name: "Has Code/PoC"
        weight: 0.25
      - name: "Novel Attack Surface"
        weight: 0.40
      - name: "Real-World Impact"
        weight: 0.35

schedule:
  cron: "0 22 * * 0"  # Weekly on Sunday at 22:00 UTC

Architecture

Component Technology
Web server FastAPI + Jinja2 + HTMX
Database SQLite
Scoring Anthropic API
Scheduling APScheduler
Container Docker

Key Files

File Purpose
src/config.py YAML config loader with defaults
src/db.py SQLite schema and queries
src/scoring.py API batch scorer
src/preferences.py Preference learning from user signals
src/pipelines/aiml.py AI/ML paper fetcher (HF + arXiv)
src/pipelines/security.py Security paper fetcher (arXiv cs.CR)
src/pipelines/github.py GitHub trending projects
src/pipelines/events.py Conferences, releases, RSS
src/web/app.py Web routes and middleware
src/scheduler.py Cron-based pipeline scheduler

Running Pipelines Manually

From the dashboard, click the pipeline buttons. Or via API:

curl -X POST http://localhost:9090/run/aiml
curl -X POST http://localhost:9090/run/security
curl -X POST http://localhost:9090/run/github
curl -X POST http://localhost:9090/run/events

Requirements

  • Python 3.12+
  • Anthropic API key (for paper scoring)
  • Optional: GitHub token (for higher API rate limits)

License

MIT