---
license: mit
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- riscv
- cross-attention
- flamingo
- grounded
- code-generation
- rv32i
library_name: transformers
pipeline_tag: text-generation
---

# Reflex-Coder7B-RISCV

**A frozen `Qwen2.5-Coder-7B-Instruct` wired to a RISC-V CPU through Flamingo-style cross-attention. It emits one 32-bit RV32I instruction per cycle, conditioned on live machine state; no text tokens are generated at inference.**

This repo contains the **adapter weights only** (~4.2 GB fp32). The frozen backbone is pulled from `Qwen/Qwen2.5-Coder-7B-Instruct` at runtime. Total inference memory: ~18 GB for the bf16 backbone, plus 4.2 GB for the fp32 adapters, plus activations.

## What it does

Given a natural-language prompt (`"say hi"`, `"multiply 7 and 8"`, `"compute 5 factorial"`), Reflex drives a Unicorn-backed RV32I emulator instruction by instruction. Each cycle:

1. Read the live CPU state (32 registers, PC, and memory windows around PC and SP).
2. Encode it as 65 K/V tokens.
3. Run the frozen backbone forward over the prompt; cross-attention adapters fuse the state K/V into the hidden states at depths 4, 8, 12, 16, 20, 24.
4. Last-token pool → MLP → 32 per-bit sigmoid heads → one 32-bit RV32I instruction word.
5. Write the word at PC in Unicorn, step one cycle, and loop.
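
Step 4 can be sketched concretely. This is an illustrative helper, not the repo's API, and it assumes the 32 heads are ordered LSB-first:

```python
# Illustrative only: threshold 32 per-bit sigmoid probabilities
# (assumed LSB-first) into a single RV32I instruction word.

def bits_to_word(probs):
    """Pack 32 bit-probabilities into an instruction word, thresholding at 0.5."""
    word = 0
    for i, p in enumerate(probs):
        if p >= 0.5:
            word |= 1 << i
    return word

# addi x5, x0, 7 encodes as 0x00700293
probs = [(0x00700293 >> i) & 1 for i in range(32)]  # fully confident heads
assert bits_to_word(probs) == 0x00700293
```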

## Base model

[`Qwen/Qwen2.5-Coder-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) — frozen, bf16, untouched.

## Training

- **Corpus**: 80,396 (prompt, program) pairs across 56 RV32I program families (arithmetic, loops, comparisons, memory ops, display writes). Every program was verified by running it end-to-end through Unicorn before training.
- **Flattened cycle pool**: ~1.06M `(state, next_instruction)` pairs, balance-subsampled to 173k across families per epoch.
- **Objective**: per-bit binary cross-entropy over the 32 instruction bits, with the `rs2` bits (positions 20–24) weighted 5× to overcome the register/immediate polysemy ceiling.
- **Optimizer**: AdamW with a cosine LR schedule (`1e-4 → 1e-6`) over 20k steps, batch size 64.
- **Hardware**: A100 80GB.
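
The weighted objective can be sketched in plain Python. The real training code presumably uses a tensor framework; the helper name `weighted_bit_bce` and the weighted-mean normalization are illustrative assumptions — only the per-bit BCE, the bit positions 20–24, and the 5× weight come from the bullet above:

```python
import math

RS2_BITS = range(20, 25)  # rs2 bit positions, from the Objective bullet above

def weighted_bit_bce(probs, targets, rs2_weight=5.0):
    """Weighted-mean binary cross-entropy over 32 instruction bits,
    with the rs2 bit positions up-weighted (illustrative sketch)."""
    total = norm = 0.0
    for i, (p, t) in enumerate(zip(probs, targets)):
        w = rs2_weight if i in RS2_BITS else 1.0
        total += -w * (t * math.log(p) + (1 - t) * math.log(1 - p))
        norm += w
    return total / norm

# A wrong rs2 bit costs 5x more than an equally wrong bit elsewhere.
target = [(0x007302B3 >> i) & 1 for i in range(32)]  # add x5, x6, x7
good = [0.99 if t else 0.01 for t in target]
bad_rs2, bad_rd = list(good), list(good)
bad_rs2[22] = 1.0 - bad_rs2[22]   # flip confidence on an rs2 bit
bad_rd[7] = 1.0 - bad_rd[7]       # flip confidence on an rd bit
assert weighted_bit_bce(bad_rs2, target) > weighted_bit_bce(bad_rd, target)
```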

## Results (18-task eval + 15-task sweep)

- **13/15 on a mixed zero-shot sweep**, including six tasks the model was never trained on: multiply-by-repeated-add, power, abs, min, popcount, and say-arbitrary-3-char-strings.
- **popcount(255) = 8 in 199 consecutive correct RISC-V instructions** — an emergent algorithm derived at inference from the frozen backbone's prior on what "popcount" means.
- Full eval script: `uv run eval --checkpoint reflex_coder7b.pt`.

## Usage

```python
import torch
from reflex.demo import load, run_grounded

model, tok, cfg = load("reflex_coder7b.pt", device="cuda")
cpu, emitted, halted, err = run_grounded(
    model, tok, "multiply 7 and 8", device="cuda", max_cycles=200,
)
print(f"halted={halted} mem[0x5000]={cpu.mem_word(0x5000)}")
# halted=True mem[0x5000]=56
```

Or, interactively:

```bash
uv run demo --checkpoint reflex_coder7b.pt
```

## Installation

```bash
git clone https://github.com/ilbertt/reflex
cd reflex
uv sync
# Download this checkpoint into the repo root:
huggingface-cli download ilbertt/reflex-coder7b-riscv reflex_coder7b.pt --local-dir .
```

The first time you run inference, Hugging Face will automatically fetch the frozen `Qwen2.5-Coder-7B-Instruct` backbone (~15 GB).

## Limitations

- **rs2 precision ceiling.** Per-cycle rs2 accuracy is ~0.99; long loops (>50 ops) can emit a single-bit-wrong instruction that crashes the program before it stores its result.
- **No domain-knowledge transfer.** Reflex only knows the program-shaped phrasings in its training corpus. Prompts like `"if x5 is fever, display SICK"` fail — the adapters were never taught to route the backbone's semantic knowledge of "fever" through.
- **Display strings degrade past 3 characters.** `say hi`, `say 42`, and `say wow` all land cleanly; `say hello` returns `hell`.
- **Some common phrasings are unreliable.** `add 100 and 200 and store the result` can return `100` instead of `300`, and `subtract 10 from 25` sometimes returns `35` (semantic confusion on the word "from").
- **RV32I base ISA only** — no M (multiply/divide), no Zbb (count/bit-manipulation), no F (float). The model synthesizes all "higher" operations from base instructions.
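
The rs2 ceiling is easy to see from the RV32I encoding: rs2 occupies instruction bits 20–24, so a single flipped bit selects a different source register. A small illustration (the helper name is ours, not the repo's):

```python
def rs2(word):
    """Extract the rs2 register index (instruction bits 20-24)."""
    return (word >> 20) & 0x1F

add_word = 0x007302B3                     # add x5, x6, x7
assert rs2(add_word) == 7
assert rs2(add_word ^ (1 << 23)) == 15    # one bit flip: now reads x15, not x7
```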

## Files

- `reflex_coder7b.pt` — adapter weights, state encoder, head, and config dict (`backbone_id`, `hidden`, `inject_every`, `adapter_mlp_ratio`, `max_instr_tokens`, `chat_template`, `context_prefix`).
|