---
license: mit
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- riscv
- cross-attention
- flamingo
- grounded
- code-generation
- rv32i
library_name: transformers
pipeline_tag: text-generation
---

# Reflex-Coder7B-RISCV

**A frozen `Qwen2.5-Coder-7B-Instruct` wired to a RISC-V CPU through Flamingo-style cross-attention. It emits one 32-bit RV32I instruction per cycle, conditioned on live machine state; no text tokens are generated at inference.**

This repo contains the **adapter weights only** (~4.2 GB fp32). The frozen backbone is pulled from `Qwen/Qwen2.5-Coder-7B-Instruct` at runtime. Total inference memory: ~18 GB bf16 backbone + 4.2 GB fp32 adapters + activations.

## What it does

Given a natural-language prompt (`"say hi"`, `"multiply 7 and 8"`, `"compute 5 factorial"`), Reflex drives a Unicorn-backed RV32I emulator instruction by instruction. Each cycle:

1. Read the live CPU state (32 registers, PC, memory windows around PC and SP).
2. Encode it as 65 K/V tokens.
3. Run the frozen backbone forward over the prompt; cross-attention adapters fuse the state K/V into the hidden states at depths 4, 8, 12, 16, 20, and 24.
4. Last-token pool → MLP → 32 per-bit sigmoid heads → one 32-bit RV32I instruction word.
5. Write the word at PC in Unicorn, step one cycle, loop.

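Step 4 can be sketched as a simple threshold-and-pack function. This is a minimal illustration, assuming `probs[0]` maps to the least-significant bit of the instruction word; the actual bit ordering is defined by the checkpoint, not shown here:

```python
def bits_to_word(probs):
    """Threshold 32 per-bit sigmoid outputs into one 32-bit RV32I
    instruction word (assumed ordering: probs[0] -> LSB)."""
    word = 0
    for i, p in enumerate(probs):
        if p >= 0.5:
            word |= 1 << i
    return word

# Perfect-confidence probabilities for `addi x10, x0, 5` (0x00500513):
probs = [float((0x00500513 >> i) & 1) for i in range(32)]
assert bits_to_word(probs) == 0x00500513
```
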
## Base model

[`Qwen/Qwen2.5-Coder-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) — frozen, bf16, untouched.

## Training

- **Corpus**: 80,396 (prompt, program) pairs across 56 RV32I program families (arithmetic, loops, comparisons, memory ops, display writes). Every program was verified by running it end-to-end through Unicorn before training.
- **Flattened cycle pool**: ~1.06M `(state, next_instruction)` pairs, balanced-subsampled to 173k across families per epoch.
- **Objective**: per-bit binary cross-entropy over the 32 instruction bits, with the `rs2` bits (positions 20–24) weighted 5× to overcome the register/immediate polysemy ceiling.
- **Optimizer**: AdamW, cosine LR schedule `1e-4 → 1e-6` over 20k steps, batch size 64.
- **Hardware**: A100 80GB.

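The weighted objective above can be written out in plain Python. This is a sketch only; the actual training loop would use the framework's vectorized BCE rather than a scalar loop:

```python
import math

RS2_BITS = set(range(20, 25))  # rs2 field: instruction bits 20-24

def weighted_bit_bce(probs, target_word, rs2_weight=5.0, eps=1e-7):
    """Mean per-bit binary cross-entropy over 32 instruction bits,
    with the rs2 bits up-weighted by `rs2_weight`."""
    loss = 0.0
    for i, p in enumerate(probs):
        t = (target_word >> i) & 1
        w = rs2_weight if i in RS2_BITS else 1.0
        p = min(max(p, eps), 1.0 - eps)  # clamp for log stability
        loss += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return loss / len(probs)
```

With otherwise-perfect predictions, a fully wrong rs2 bit costs five times as much as a fully wrong bit anywhere else in the word.
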
## Results (18-task eval + 15-task sweep)

- **13 / 15 on a mixed zero-shot sweep**, including six tasks the model was never trained on: multiply-by-repeated-add, power, abs, min, popcount, and say-arbitrary-3-char-strings.
- **popcount(255) = 8 in 199 consecutive correct RISC-V instructions** — an emergent algorithm derived at inference from the frozen backbone's prior on what "popcount" means.
- Full eval script: `uv run eval --checkpoint reflex_coder7b.pt`.

## Usage

```python
import torch
from reflex.demo import load, run_grounded

model, tok, cfg = load("reflex_coder7b.pt", device="cuda")
cpu, emitted, halted, err = run_grounded(
    model, tok, "multiply 7 and 8", device="cuda", max_cycles=200,
)
print(f"halted={halted} mem[0x5000]={cpu.mem_word(0x5000)}")
# halted=True mem[0x5000]=56
```

Or, interactively:

```bash
uv run demo --checkpoint reflex_coder7b.pt
```

## Installation

```bash
git clone https://github.com/ilbertt/reflex
cd reflex
uv sync
# Download this checkpoint into the repo root:
huggingface-cli download ilbertt/reflex-coder7b-riscv reflex_coder7b.pt --local-dir .
```

The first time you run inference, Hugging Face will automatically fetch the frozen `Qwen2.5-Coder-7B-Instruct` backbone (~15 GB).

## Limitations

- **rs2 precision ceiling.** Per-cycle rs2 accuracy is ~0.99; in long loops (>50 ops), a single wrong bit can produce an instruction that crashes the program before it stores its result.
- **No domain-knowledge transfer.** Reflex only knows the program-shaped phrasings in its training corpus. Prompts like `"if x5 is fever, display SICK"` fail — the adapters were never taught to route the backbone's semantic knowledge of "fever" through.
- **Display strings degrade past 3 characters.** `say hi`, `say 42`, `say wow` all land cleanly; `say hello` returns `hell`.
- **Some common phrasings are unreliable.** `add 100 and 200 and store the result` can return `100` instead of `300`, and `subtract 10 from 25` sometimes returns `35` (semantic confusion on the word "from").
- **RV32I base ISA only** — no M (multiply/divide), no Zbb (count/bitmanip), no F (float). The model synthesizes all "higher" operations from base instructions.

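The rs2 fragility is easy to see at the field level. A minimal decode using the standard RV32I field layout (this is illustrative, not project code):

```python
def rs2(word):
    """rs2 register index: instruction bits 20-24 in RV32I R/S/B formats."""
    return (word >> 20) & 0x1F

good = 0x007302B3               # add x5, x6, x7
assert rs2(good) == 7
flipped = good ^ (1 << 23)      # one bit flipped inside the rs2 field
assert rs2(flipped) == 15       # now reads x15 instead of x7
```
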
## Files

- `reflex_coder7b.pt` — adapter weights, state encoder, head, and config dict (`backbone_id`, `hidden`, `inject_every`, `adapter_mlp_ratio`, `max_instr_tokens`, `chat_template`, `context_prefix`).