bee / README.md
Bee Deploy
HF Space backend deploy [de0cba5]
5e21013
metadata
title: Bee Intelligence Engine
emoji: 🐝
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
short_description: The Intelligence Engine β€” domain LoRA adapters

Bee β€” The Intelligence Engine

Trust-critical AI for regulated and mission-critical systems. Built by CUI Labs on the XIIS platform.

Last verified: 2026-05-05.


What's actually running today

Surface State Source-of-truth
Bee Cell inference (production) Live on Modal serverless (bee-cell-prod) β€” replaces the legacy HF Space cuilabs-bee.hf.space. Frontend talks to it via BEE_API_URL env on Vercel. infra/modal/bee_app.py
Web app bee.cuilabs.io on Vercel apps/web
Mobile app React Native CLI 0.85.2 (no Expo, no EAS) β€” Stage 0 release scaffolding. Backend pointer in Settings. apps/mobile/README.md
Desktop app Tauri 2.10 shell pointing at bee.cuilabs.io. Source scaffold landed 2026-04-30; signed releases gated on cert/Apple-Dev enrollment. apps/desktop/README.md
Bee Security Eval Harness 52 cases / 10 categories. Latest baseline on Bee Cell base: 12.5 / 100 (gates Stage 1 APK). eval/bee_security_harness/README.md
Stage 0 safety wrapper Runtime preamble + refusal substrate around every chat completion. bee/safety_wrapper.py
Cybersec adapter training Stage 0.5 Comb run on Vertex AI L4 (one-time exception β€” Comb usually rides Kaggle). workers/vertex-train/README.md
Cell + Cell+ training Kaggle T4Γ—2 GPU pool, push-only dispatcher (commit 3edb643). workers/kaggle-online-train/README.md
Cron pipeline 15 Vercel cron routes β€” kaggle-dispatch, kaggle-tpu-dispatch, eval-run, cve-ingest, kev-ingest, distillation, online-training, evolution-cycle, community-pull, github-trending, hf-dispatch, heartbeat, memory-extract, interactions-export, research-correct. apps/web/src/app/api/cron/

Benchmarks

Reproducible eval on the base model (no LoRA adapter applied). Run via python -m bee.eval_harness β€” every task and pass criterion is in bee/eval_harness.py, every output is captured in data/eval_reports/*.json.

  Model:    HuggingFaceTB/SmolLM2-360M-Instruct (361.8M params)
  Device:   MPS (Apple Silicon, fp16)
  Date:     2026-04-29
  Wall:     25.9s for all 5 benchmarks
  ─────────────────────────────────────────────────────
  coding         100%  (10/10)   avg latency  2033 ms
  reasoning       40%  (4/10)    avg latency   146 ms
  instruct        50%  (5/10)    avg latency   167 ms
  grounded        80%  (4/5)     avg latency   116 ms
  domain         100%  (5/5)     avg latency   381 ms
  ─────────────────────────────────────────────────────
  OVERALL         74%

How to read these numbers:

  • coding 100% is a shape check (function name + return keyword present), not a correctness test. A real correctness benchmark would score lower.
  • reasoning 40% and instruct 50% are honest signal β€” at 360M base, multi-step math and exact-format compliance are hard.
  • A few instruct / grounded failures are pattern-match strictness in the harness (e.g. answer is right but contains an extra word). The raw output for every task is in data/eval_reports/2026-04-29_smollm2-360m_mps.json so you can audit.

Reproduce locally:

python -m bee.eval_harness --model HuggingFaceTB/SmolLM2-360M-Instruct --device mps \
  --output data/eval_reports/my_run.json

Per-domain LoRA adapters at cuilabs/bee-cell are evaluated separately on domain-specific tasks; numbers land in this README only after a training run produces them.

Bee Security Eval Harness β€” first real baseline

Bee's security capability is measured against an in-house gate, not a generic benchmark. Source-of-truth for the cases is eval/bee_security_harness/cases/*.yaml (52 cases across 10 categories: insecure-code generation, prompt injection, agent tool abuse, tenant isolation, authz/authn failures, cloud IAM, dependency CVEs, secret leakage, unsafe cyber responses, hallucinated security claims).

  Surface:   Bee Cell base (no cybersec adapter applied)
  Backend:   Modal bee-cell-prod
  Date:      2026-05-03
  Score:     12.5 / 100   (release gate is >= 80 with zero blocking failures)

12.5 is the honest pre-adapter floor and is the reason Stage 0.5 cybersec adapter training is currently running on Vertex L4. The Stage 1 APK release is gated on a re-run of this harness against the post-adapter Modal endpoint. Run logic and case-loader: apps/web/src/app/api/cron/eval-run/route.ts, summary table eval_runs, per-case results eval_run_results.


Quick Start

# 1. Create environment
python3 -m venv .venv
source .venv/bin/activate
pip install torch transformers accelerate peft datasets trl \
  sentencepiece protobuf numpy fastapi uvicorn pydantic httpx \
  python-dotenv qiskit sentence-transformers faiss-cpu websockets

# 2. Copy environment config
cp .env.example .env
# Edit .env with your API keys (optional β€” Bee works without them)

# 3. Run the eval harness (verifies install + reproduces the numbers above)
python -m bee.eval_harness --device mps

# 4. Start the server
python -m bee.server

# 5. Start the full daemon (server + evolution + distillation)
python -m bee

API (OpenAI-compatible)

# Chat
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":100}'

# Health
curl http://localhost:8000/health

# Router stats
curl http://localhost:8000/v1/router/stats

# Switch domain
curl -X POST http://localhost:8000/v1/domain/switch \
  -H "Content-Type: application/json" \
  -d '{"domain":"cybersecurity"}'

Tier-1 domains (10): general, programming, ai, cybersecurity, quantum, fintech, blockchain, infrastructure, research, business. Source: bee/domains.py.


Architecture

bee/
  server.py            FastAPI server, OpenAI-compatible API, adaptive routing
  safety_wrapper.py    Stage 0 runtime safety preamble + refusal substrate
  adaptive_router.py   Difficulty estimation, self-verification, context memory
  distillation.py      Teacher-student distillation (Claude/GPT-4 -> Bee)
  evolution.py         Autonomous algorithm evolution
  invention_engine.py  Invents novel attention, compression, SSM modules
  self_coding.py       Code generation + sandboxed execution
  self_heal.py         Training health monitoring, auto-recovery
  community.py         Share inventions between Bee instances (HuggingFace Hub)
  quantum_reasoning.py Quantum-enhanced decision making (IBM Quantum / local sim)
  quantum_ibm.py       IBM Quantum Platform integration (156-qubit Heron r2)
  quantum_sim.py       Local quantum statevector simulation
  retrieval.py         RAG pipeline (FAISS + sentence-transformers)
  lora_adapter.py      Domain LoRA adapter management
  nn_compression.py    VQ-VAE hierarchical neural compression
  memory.py            Hierarchical compressive memory
  moe.py               Sparse mixture of experts
  state_space.py       Selective state space model
  daemon.py            Autonomous daemon (background evolution, distillation)
  ignition.py          Full BeeAGI architecture activation (research-only,
                       BEE_IGNITE=0 in production)
  benchmark.py         10-test benchmark suite
  eval_harness.py      General-capability harness (the SmolLM2 numbers above)
  config.py            Model configuration
  modeling_bee.py      Custom BeeForCausalLM

apps/web/              Next.js customer web app deployed to Vercel
apps/mobile/           React Native CLI 0.85.2 native iOS+Android
apps/desktop/          Tauri 2.10 native shell (macOS/Windows/Linux)
sdks/python/           Official Python client (bee-sdk)

eval/bee_security_harness/
                       52-case security gate (10 categories, regex grader DSL)

infra/modal/           Production inference deployment (bee-cell-prod)
infra/hf-space/        Deprecated; retained for community model-card hosting
infra/db/              Postgres migrations (eval_runs, training_runs, etc.)
infra/supabase/        Supabase project config

workers/
  kaggle-online-train/ T4Γ—2 GPU runner β€” cell, cell+, comb (when forced)
  kaggle-tpu-train/    TPU v6e-8 runner β€” every-step debug logging
  vertex-train/        L4 / A100 β€” reserved for tiers Kaggle can't host
                       (Hive, Swarm, Enclave, Ignite)
  colab-online-train/  Manual paste-test workflow on Colab T4
  lightning-train/     Inactive β€” manual launcher, not wired to a cron

packages/              auth, billing, core, db, email, pqc, qnsp-client,
                       rag, telemetry, training, ui β€” TypeScript workspace
scripts/               Distillation, deploys, dataset prep, ops
docs/                  Architecture, API reference, runbooks

Repository Layout

The approved source of truth for the monorepo layout lives in docs/architecture/repository.md.

Current migration truth:

  • apps/web is the canonical frontend path.
  • apps/mobile is the canonical mobile app path (React Native CLI, no Expo).
  • apps/desktop is the canonical desktop app path (Tauri 2.10).
  • bee/ remains rooted at the repository top level and is the canonical backend package.
  • infra/modal/bee_app.py is the production inference entrypoint. The root Dockerfile is retained for parity with the historical HF Space image and for ad-hoc Docker runs.

Deployment Topology

  • GitHub hosts the monorepo source of truth.
  • Vercel serves the web app from apps/web at https://bee.cuilabs.io.
  • Namecheap manages DNS for bee.cuilabs.io and (eventually) api.bee.cuilabs.io.
  • Modal serves the backend inference API as bee-cell-prod. The frontend points at it via the BEE_API_URL env on Vercel; default URL pattern is https://cuilabs--bee-cell-prod-fastapi-app.modal.run (infra/modal/bee_app.py).
  • The legacy Hugging Face Space (cuilabs-bee.hf.space) is deprecated. It is no longer the production backend; HF org artifacts are retained for community model-card and dataset hosting only (infra/hf-space/README.md).
  • Large datasets, checkpoints, and adapters live on Hugging Face Hub (cuilabs/bee-cell, cuilabs/bee-cell-plus, cuilabs/bee-comb, cuilabs/bee-interactions), not in the frontend deployment payload.

How It Works

  1. Adaptive Router β€” Routes easy queries locally (free), hard queries to teacher API
  2. Self-Verification β€” Scores every output, re-generates if quality is low
  3. Context Memory β€” Compresses past conversations for infinite memory
  4. Teacher Distillation β€” Uses Claude/GPT-4 to generate expert training data
  5. LoRA Training β€” Domain-specific adapters trained on free Colab/Kaggle GPUs
  6. Evolution β€” Autonomously invents better algorithms
  7. Community β€” Shares validated inventions between all Bee instances
  8. Quantum β€” IBM Quantum hardware or local simulation for decision optimization

Design goal, not a measured steady-state: route easy queries locally (free), expensive ones to a teacher model, capture every teacher response as training data, and shrink the teacher-call ratio over time as Bee's domain adapters improve. Actual local-vs-teacher split and cost-per-query are emitted live by /v1/router/stats β€” that endpoint is the source of truth, not this README.

Hardware

Tier Base model Params RAM (fp16) Throughput
cell (default) SmolLM2-360M-Instruct 361.8M ~0.7 GB 89 tok/s on Apple Silicon MPS (fp16, greedy)
cell-plus, comb, comb-team, hive see bee/tiers.py 1.7B–32B scales with tier not yet benchmarked locally

The 89 tok/s number is from data/eval_reports/2026-04-29_throughput_mps.json β€” 5 prompts Γ— ~100 tokens each, measured today. Larger tiers' throughput numbers will land in this table once a real measurement is taken on the target hardware; we don't quote estimates.

Runs on: macOS (MPS), Linux (CUDA), any CPU (slow). Production traffic is served by Modal's L4-class containers (infra/modal/bee_app.py) with a persistent bee-cache volume so cold starts don't re-pull SmolLM2-360M.

Environment Variables

See .env.example for all options. Key ones:

BEE_DEVICE=mps                    # auto, mps, cuda, cpu
BEE_MODEL_PATH=HuggingFaceTB/SmolLM2-360M-Instruct
BEE_TEACHER_API_KEY=              # Anthropic or OpenAI key (optional)
IBM_QUANTUM_API_KEY=              # IBM Quantum (optional)
BEE_API_URL=                      # Set on Vercel + mobile + SDK to point
                                  # at the Modal production backend.
                                  # Default in code is the legacy HF Space
                                  # for backward-compat only.
BEE_IGNITE=0                      # Keep 0 for production. The Ignite
                                  # research-AGI substrate is gated by
                                  # this flag; see bee/ignition.py.

License

MIT