title: Bee Intelligence Engine
emoji: π
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
short_description: The Intelligence Engine β domain LoRA adapters
Bee β The Intelligence Engine
Trust-critical AI for regulated and mission-critical systems. Built by CUI Labs on the XIIS platform.
Last verified: 2026-05-05.
What's actually running today
| Surface | State | Source-of-truth |
|---|---|---|
| Bee Cell inference (production) | Live on Modal serverless (bee-cell-prod) β replaces the legacy HF Space cuilabs-bee.hf.space. Frontend talks to it via BEE_API_URL env on Vercel. |
infra/modal/bee_app.py |
| Web app | bee.cuilabs.io on Vercel |
apps/web |
| Mobile app | React Native CLI 0.85.2 (no Expo, no EAS) β Stage 0 release scaffolding. Backend pointer in Settings. | apps/mobile/README.md |
| Desktop app | Tauri 2.10 shell pointing at bee.cuilabs.io. Source scaffold landed 2026-04-30; signed releases gated on cert/Apple-Dev enrollment. |
apps/desktop/README.md |
| Bee Security Eval Harness | 52 cases / 10 categories. Latest baseline on Bee Cell base: 12.5 / 100 (gates Stage 1 APK). | eval/bee_security_harness/README.md |
| Stage 0 safety wrapper | Runtime preamble + refusal substrate around every chat completion. | bee/safety_wrapper.py |
| Cybersec adapter training | Stage 0.5 Comb run on Vertex AI L4 (one-time exception β Comb usually rides Kaggle). | workers/vertex-train/README.md |
| Cell + Cell+ training | Kaggle T4Γ2 GPU pool, push-only dispatcher (commit 3edb643). |
workers/kaggle-online-train/README.md |
| Cron pipeline | 15 Vercel cron routes β kaggle-dispatch, kaggle-tpu-dispatch, eval-run, cve-ingest, kev-ingest, distillation, online-training, evolution-cycle, community-pull, github-trending, hf-dispatch, heartbeat, memory-extract, interactions-export, research-correct. | apps/web/src/app/api/cron/ |
Benchmarks
Reproducible eval on the base model (no LoRA adapter applied). Run via python -m bee.eval_harness β every task and pass criterion is in bee/eval_harness.py, every output is captured in data/eval_reports/*.json.
Model: HuggingFaceTB/SmolLM2-360M-Instruct (361.8M params)
Device: MPS (Apple Silicon, fp16)
Date: 2026-04-29
Wall: 25.9s for all 5 benchmarks
βββββββββββββββββββββββββββββββββββββββββββββββββββββ
coding 100% (10/10) avg latency 2033 ms
reasoning 40% (4/10) avg latency 146 ms
instruct 50% (5/10) avg latency 167 ms
grounded 80% (4/5) avg latency 116 ms
domain 100% (5/5) avg latency 381 ms
βββββββββββββββββββββββββββββββββββββββββββββββββββββ
OVERALL 74%
How to read these numbers:
coding 100%is a shape check (function name +returnkeyword present), not a correctness test. A real correctness benchmark would score lower.reasoning 40%andinstruct 50%are honest signal β at 360M base, multi-step math and exact-format compliance are hard.- A few
instruct/groundedfailures are pattern-match strictness in the harness (e.g. answer is right but contains an extra word). The raw output for every task is in data/eval_reports/2026-04-29_smollm2-360m_mps.json so you can audit.
Reproduce locally:
python -m bee.eval_harness --model HuggingFaceTB/SmolLM2-360M-Instruct --device mps \
--output data/eval_reports/my_run.json
Per-domain LoRA adapters at cuilabs/bee-cell are evaluated separately on domain-specific tasks; numbers land in this README only after a training run produces them.
Bee Security Eval Harness β first real baseline
Bee's security capability is measured against an in-house gate, not a generic benchmark. Source-of-truth for the cases is eval/bee_security_harness/cases/*.yaml (52 cases across 10 categories: insecure-code generation, prompt injection, agent tool abuse, tenant isolation, authz/authn failures, cloud IAM, dependency CVEs, secret leakage, unsafe cyber responses, hallucinated security claims).
Surface: Bee Cell base (no cybersec adapter applied)
Backend: Modal bee-cell-prod
Date: 2026-05-03
Score: 12.5 / 100 (release gate is >= 80 with zero blocking failures)
12.5 is the honest pre-adapter floor and is the reason Stage 0.5 cybersec adapter training is currently running on Vertex L4. The Stage 1 APK release is gated on a re-run of this harness against the post-adapter Modal endpoint. Run logic and case-loader: apps/web/src/app/api/cron/eval-run/route.ts, summary table eval_runs, per-case results eval_run_results.
Quick Start
# 1. Create environment
python3 -m venv .venv
source .venv/bin/activate
pip install torch transformers accelerate peft datasets trl \
sentencepiece protobuf numpy fastapi uvicorn pydantic httpx \
python-dotenv qiskit sentence-transformers faiss-cpu websockets
# 2. Copy environment config
cp .env.example .env
# Edit .env with your API keys (optional β Bee works without them)
# 3. Run the eval harness (verifies install + reproduces the numbers above)
python -m bee.eval_harness --device mps
# 4. Start the server
python -m bee.server
# 5. Start the full daemon (server + evolution + distillation)
python -m bee
API (OpenAI-compatible)
# Chat
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":100}'
# Health
curl http://localhost:8000/health
# Router stats
curl http://localhost:8000/v1/router/stats
# Switch domain
curl -X POST http://localhost:8000/v1/domain/switch \
-H "Content-Type: application/json" \
-d '{"domain":"cybersecurity"}'
Tier-1 domains (10): general, programming, ai, cybersecurity, quantum, fintech, blockchain, infrastructure, research, business. Source: bee/domains.py.
Architecture
bee/
server.py FastAPI server, OpenAI-compatible API, adaptive routing
safety_wrapper.py Stage 0 runtime safety preamble + refusal substrate
adaptive_router.py Difficulty estimation, self-verification, context memory
distillation.py Teacher-student distillation (Claude/GPT-4 -> Bee)
evolution.py Autonomous algorithm evolution
invention_engine.py Invents novel attention, compression, SSM modules
self_coding.py Code generation + sandboxed execution
self_heal.py Training health monitoring, auto-recovery
community.py Share inventions between Bee instances (HuggingFace Hub)
quantum_reasoning.py Quantum-enhanced decision making (IBM Quantum / local sim)
quantum_ibm.py IBM Quantum Platform integration (156-qubit Heron r2)
quantum_sim.py Local quantum statevector simulation
retrieval.py RAG pipeline (FAISS + sentence-transformers)
lora_adapter.py Domain LoRA adapter management
nn_compression.py VQ-VAE hierarchical neural compression
memory.py Hierarchical compressive memory
moe.py Sparse mixture of experts
state_space.py Selective state space model
daemon.py Autonomous daemon (background evolution, distillation)
ignition.py Full BeeAGI architecture activation (research-only,
BEE_IGNITE=0 in production)
benchmark.py 10-test benchmark suite
eval_harness.py General-capability harness (the SmolLM2 numbers above)
config.py Model configuration
modeling_bee.py Custom BeeForCausalLM
apps/web/ Next.js customer web app deployed to Vercel
apps/mobile/ React Native CLI 0.85.2 native iOS+Android
apps/desktop/ Tauri 2.10 native shell (macOS/Windows/Linux)
sdks/python/ Official Python client (bee-sdk)
eval/bee_security_harness/
52-case security gate (10 categories, regex grader DSL)
infra/modal/ Production inference deployment (bee-cell-prod)
infra/hf-space/ Deprecated; retained for community model-card hosting
infra/db/ Postgres migrations (eval_runs, training_runs, etc.)
infra/supabase/ Supabase project config
workers/
kaggle-online-train/ T4Γ2 GPU runner β cell, cell+, comb (when forced)
kaggle-tpu-train/ TPU v6e-8 runner β every-step debug logging
vertex-train/ L4 / A100 β reserved for tiers Kaggle can't host
(Hive, Swarm, Enclave, Ignite)
colab-online-train/ Manual paste-test workflow on Colab T4
lightning-train/ Inactive β manual launcher, not wired to a cron
packages/ auth, billing, core, db, email, pqc, qnsp-client,
rag, telemetry, training, ui β TypeScript workspace
scripts/ Distillation, deploys, dataset prep, ops
docs/ Architecture, API reference, runbooks
Repository Layout
The approved source of truth for the monorepo layout lives in docs/architecture/repository.md.
Current migration truth:
apps/webis the canonical frontend path.apps/mobileis the canonical mobile app path (React Native CLI, no Expo).apps/desktopis the canonical desktop app path (Tauri 2.10).bee/remains rooted at the repository top level and is the canonical backend package.infra/modal/bee_app.pyis the production inference entrypoint. The rootDockerfileis retained for parity with the historical HF Space image and for ad-hoc Docker runs.
Deployment Topology
- GitHub hosts the monorepo source of truth.
- Vercel serves the web app from
apps/webathttps://bee.cuilabs.io. - Namecheap manages DNS for
bee.cuilabs.ioand (eventually)api.bee.cuilabs.io. - Modal serves the backend inference API as
bee-cell-prod. The frontend points at it via theBEE_API_URLenv on Vercel; default URL pattern ishttps://cuilabs--bee-cell-prod-fastapi-app.modal.run(infra/modal/bee_app.py). - The legacy Hugging Face Space (
cuilabs-bee.hf.space) is deprecated. It is no longer the production backend; HF org artifacts are retained for community model-card and dataset hosting only (infra/hf-space/README.md). - Large datasets, checkpoints, and adapters live on Hugging Face Hub (
cuilabs/bee-cell,cuilabs/bee-cell-plus,cuilabs/bee-comb,cuilabs/bee-interactions), not in the frontend deployment payload.
How It Works
- Adaptive Router β Routes easy queries locally (free), hard queries to teacher API
- Self-Verification β Scores every output, re-generates if quality is low
- Context Memory β Compresses past conversations for infinite memory
- Teacher Distillation β Uses Claude/GPT-4 to generate expert training data
- LoRA Training β Domain-specific adapters trained on free Colab/Kaggle GPUs
- Evolution β Autonomously invents better algorithms
- Community β Shares validated inventions between all Bee instances
- Quantum β IBM Quantum hardware or local simulation for decision optimization
Design goal, not a measured steady-state: route easy queries locally (free), expensive ones to a teacher model, capture every teacher response as training data, and shrink the teacher-call ratio over time as Bee's domain adapters improve. Actual local-vs-teacher split and cost-per-query are emitted live by /v1/router/stats β that endpoint is the source of truth, not this README.
Hardware
| Tier | Base model | Params | RAM (fp16) | Throughput |
|---|---|---|---|---|
cell (default) |
SmolLM2-360M-Instruct | 361.8M | ~0.7 GB | 89 tok/s on Apple Silicon MPS (fp16, greedy) |
cell-plus, comb, comb-team, hive |
see bee/tiers.py | 1.7Bβ32B | scales with tier | not yet benchmarked locally |
The 89 tok/s number is from data/eval_reports/2026-04-29_throughput_mps.json β 5 prompts Γ ~100 tokens each, measured today. Larger tiers' throughput numbers will land in this table once a real measurement is taken on the target hardware; we don't quote estimates.
Runs on: macOS (MPS), Linux (CUDA), any CPU (slow). Production traffic is served by Modal's L4-class containers (infra/modal/bee_app.py) with a persistent bee-cache volume so cold starts don't re-pull SmolLM2-360M.
Environment Variables
See .env.example for all options. Key ones:
BEE_DEVICE=mps # auto, mps, cuda, cpu
BEE_MODEL_PATH=HuggingFaceTB/SmolLM2-360M-Instruct
BEE_TEACHER_API_KEY= # Anthropic or OpenAI key (optional)
IBM_QUANTUM_API_KEY= # IBM Quantum (optional)
BEE_API_URL= # Set on Vercel + mobile + SDK to point
# at the Modal production backend.
# Default in code is the legacy HF Space
# for backward-compat only.
BEE_IGNITE=0 # Keep 0 for production. The Ignite
# research-AGI substrate is gated by
# this flag; see bee/ignition.py.
License
MIT