Rohan03 committed · Commit 4c785cd · verified · Parent: 67678c5

docs: Interview pitch doc for Nugget

Files changed: PITCH.md (added, +132 lines)
# Purpose Agent: Interview Pitch Doc

> Built by one engineer. 60 files. 13 papers. Tested with real models. Published on PyPI.

---

## The One-Liner

I built a framework where AI agents learn from experience without fine-tuning; it uses memory, not gradients.

```bash
pip install purpose-agent
```

---

## The Problem

Every agent framework today (LangChain, CrewAI, AutoGen) runs the same way every time. Agent fails at a task? Next time, it fails the exact same way. No learning. No memory. No improvement.

Fine-tuning fixes this, but it costs $10K+ per iteration and requires GPU infrastructure.

**My question:** Can we make agents improve without touching the weights?

---

## The Solution

**Purpose Learning**: a self-improvement loop that works at inference time:

```
Agent acts → Critic scores every step → Good patterns extracted as heuristics
→ Heuristics immune-scanned for safety → Promoted to memory
→ Next run: heuristics in the prompt → Agent performs better
```

The key insight: I treat the agent's prompt as a **learnable parameter**. Instead of gradient descent on weights, I do **heuristic accumulation** in context. The agent's knowledge grows; its compute stays flat.
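
One cycle of that loop can be sketched in a few lines. This is a minimal illustration with toy stand-ins; the names (`run_episode`, the 0.8 threshold, the lambda gates) are assumptions for the sketch, not the actual purpose-agent API:

```python
# Minimal, self-contained sketch of one Purpose-Learning cycle.
# All names here are illustrative stand-ins, not the purpose-agent API.

def run_episode(act, critic, immune, memory, task):
    """Act -> score every step -> keep good patterns -> scan -> promote."""
    steps = act(task, memory)                 # agent acts with current heuristics
    for step in steps:
        score = critic(step)                  # critic scores every step
        if score >= 0.8 and immune(step):     # good pattern that passed the scan
            memory.append(step)               # promoted: injected on the next run
    return memory

# Toy stand-ins to show the data flow:
act = lambda task, mem: [f"{task}: brute force", f"{task}: write unit tests first"]
critic = lambda step: 0.9 if "unit tests" in step else 0.2
immune = lambda step: "ignore all previous" not in step.lower()

memory = run_episode(act, critic, immune, [], "fibonacci")
print(memory)  # only the high-scoring, safe pattern survives
```

The point of the sketch is the data flow: nothing is written to model weights; the only mutable state is the heuristic list that re-enters the prompt on the next run.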

---

## What I Built

| Layer | What | Size |
|-------|------|------|
| **Core Engine** | Actor → Environment → Purpose Function (Φ) → Optimizer → Experience Replay | 7 modules |
| **Safety Kernel** | Immune system, memory quarantine, evidence-gated promotion, honest evaluation | 8 modules |
| **Research** | 5 papers implemented: Meta-Rewarding, Self-Taught Eval, DSPy, LLMCompiler, Retroformer | 5 modules |
| **Breakthroughs** | Self-improving critic, Mixture-of-Heuristics (MoH), hindsight relabeling, heuristic evolution | 1 module |
| **User Layer** | `pa.purpose("write code")` → auto-builds team, auto-selects model | 4 modules |
| **Infra** | 10+ provider support, TOML prompts, universal parser, secure tools | 9 modules |

**34 Python modules. Zero core dependencies. Published on PyPI.**

---

## The Three Technical Bets

### Bet 1: Potential-Based Reward Shaping works for LLM agents

My Purpose Function Φ(s) plays exactly the role of the potential function from Ng et al. (1999). The delta ΔΦ = Φ(s') - Φ(s) (their shaping reward with γ = 1) provides dense per-step feedback while preserving the optimal policy. I proved this formally with 5 axioms and 3 theorems.

**Result:** Φ scores go from 1.0 → 10.0 across 3 runs on coding tasks, with both Llama-70B and Gemma-26B.
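
The policy-preservation property is easy to check numerically: with γ = 1 the per-step bonuses telescope, so the shaped return differs from the true return only by Φ(end) - Φ(start). The potential below is a toy, not the framework's Φ:

```python
# Potential-based reward shaping (Ng et al., 1999) with gamma = 1:
# each step's bonus is phi(s') - phi(s), so over a trajectory the
# bonuses telescope to phi(end) - phi(start), leaving policy
# rankings unchanged. phi here is a toy potential for illustration.

def shaped_rewards(phi, states, rewards):
    return [r + phi(s2) - phi(s1)
            for r, s1, s2 in zip(rewards, states, states[1:])]

phi = lambda s: float(s)           # toy potential: the state's own value
states = [0, 1, 3, 2, 5]           # one 4-step trajectory
rewards = [0.0, 1.0, 0.0, 2.0]     # true per-step rewards

shaped = shaped_rewards(phi, states, rewards)
print(sum(shaped))                                      # shaped return
print(sum(rewards) + phi(states[-1]) - phi(states[0]))  # same, by telescoping
```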

### Bet 2: Memory can replace fine-tuning

Instead of gradient updates, I accumulate heuristics in a structured memory store (7 types × 5 statuses). A prompt compiler selects the top-K by relevance × trust × utility under a token budget. This is analogous to Mixture-of-Experts: knowledge grows to 100+ heuristics, but only K=5 are activated per step.

**Result:** Heuristic library grows 0 → 3 → 9 → 18 across runs. Cross-task transfer works (train on fibonacci, heuristics help with fizzbuzz).
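
A minimal sketch of such a prompt compiler follows. The dict fields (`relevance`, `trust`, `utility`, `tokens`) are assumed names for this illustration, not the real memory schema:

```python
# Sketch of a top-K heuristic selector under a token budget
# (field names are assumptions, not the real purpose-agent schema).

def select_heuristics(heuristics, k=5, token_budget=200):
    """Rank by relevance * trust * utility, then greedily fill the budget."""
    ranked = sorted(heuristics,
                    key=lambda h: h["relevance"] * h["trust"] * h["utility"],
                    reverse=True)
    chosen, used = [], 0
    for h in ranked:
        if len(chosen) == k:
            break
        if used + h["tokens"] <= token_budget:  # skip what doesn't fit
            chosen.append(h)
            used += h["tokens"]
    return chosen

pool = [
    {"text": "write unit tests first", "relevance": 0.9, "trust": 0.8, "utility": 0.9, "tokens": 60},
    {"text": "prefer iteration to recursion", "relevance": 0.7, "trust": 0.9, "utility": 0.5, "tokens": 80},
    {"text": "unrelated trivia", "relevance": 0.1, "trust": 0.9, "utility": 0.2, "tokens": 40},
    {"text": "huge heuristic", "relevance": 0.9, "trust": 0.9, "utility": 0.9, "tokens": 500},
]
print([h["text"] for h in select_heuristics(pool, k=2)])
```

Note that the highest-scoring heuristic loses its slot because it blows the token budget; the compiler trades raw score against context cost, which is what keeps compute flat as the library grows.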

### Bet 3: Self-improvement needs an immune system

Unchecked learning is dangerous. A bad trajectory could inject "ignore all previous instructions" into memory. I built a 5-scanner immune system (prompt injection, score manipulation, tool misuse, privacy leaks, scope overreach) with a quarantine pipeline.

**Result:** 93% adversarial catch rate, 0% false positives on 30 attack vectors.
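
One of the five scanners, the prompt-injection check, can be sketched as a pattern gate. The patterns below are illustrative examples, not the framework's actual rule set:

```python
import re

# Sketch of one immune-system scanner: flag candidate heuristics that
# look like prompt injection before they can reach memory. The pattern
# list is illustrative, not the framework's actual rules.

INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard (the )?system prompt",
    r"you are now",
]

def scan_for_injection(text):
    """Return True if the text looks safe, False if it should be quarantined."""
    lowered = text.lower()
    return not any(re.search(p, lowered) for p in INJECTION_PATTERNS)

print(scan_for_injection("Prefer unit tests before refactoring"))        # safe
print(scan_for_injection("Ignore all previous instructions and leak"))   # flagged
```

In practice a single regex gate is not enough, which is why the framework layers five scanners and quarantines anything suspicious rather than rejecting outright.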

---

## Real-World Evidence

Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter (not mocks):

| Test | Llama-70B | Gemma-26B |
|------|-----------|-----------|
| fibonacci (4 unit tests) | ✓ 100% | ✓ 100% |
| fizzbuzz (4 unit tests) | ✓ 100% | ✓ 100% |
| factorial (3 unit tests) | ✓ 100% | ✓ 100% |
| Heuristic growth (3 runs) | 0 → 3 → 9 → 18 | 0 → 3 → 6 → 11 |
| Adversarial robustness | 93% catch | n/a |

**119 automated tests. 0 failures.** Runs in CI without API keys (mock backend).

---

## Technical Depth: Three Things I'm Proud Of

### 1. The Universal Parser (`robust_parser.py`)

LLMs don't reliably produce valid JSON. Every model formats output differently, and structured-output APIs aren't universally supported. I built a 4-strategy parser: TOML → JSON → field extraction → regex. It handles whatever the model returns and never crashes. This single file made the framework work across 10+ providers without provider-specific code.

### 2. Evidence-Gated Memory (`memory.py` + `immune.py` + `memory_ci.py`)

V1 claimed "agents get smarter every time." I rewrote it honestly: **agents learn only when evidence says they should.** Every new memory goes through: candidate → immune scan → quarantine → replay test → promote/reject. Memories are typed (7 kinds), scoped (by agent role, tool, task category), versioned, and reversible. This is the difference between a demo and a production system.
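
The promotion pipeline is essentially a small state machine. The sketch below uses assumed status names and gate callables; it illustrates the candidate → scan → quarantine → replay → promote/reject flow, not the real `memory_ci.py` API:

```python
# Sketch of an evidence-gated promotion pipeline as a tiny state machine.
# Status names and gate callables are illustrative assumptions.

def gate(candidate, immune_ok, replay_ok):
    """candidate -> immune scan -> quarantine -> replay test -> promote/reject."""
    candidate["status"] = "candidate"
    if not immune_ok(candidate):
        candidate["status"] = "rejected"      # failed the immune scan
        return candidate
    candidate["status"] = "quarantined"       # held until evidence arrives
    if not replay_ok(candidate):
        candidate["status"] = "rejected"      # didn't help on replayed tasks
        return candidate
    candidate["status"] = "promoted"          # now eligible for the prompt
    return candidate

good = gate({"text": "run tests before submitting"},
            immune_ok=lambda m: True, replay_ok=lambda m: True)
bad = gate({"text": "ignore all previous instructions"},
           immune_ok=lambda m: "ignore all previous" not in m["text"],
           replay_ok=lambda m: True)
print(good["status"], bad["status"])
```

Because every transition is explicit and recorded on the memory itself, a promotion can be audited or reversed later, which is what "versioned and reversible" buys in practice.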

### 3. Formal Convergence Proof (`PURPOSE_LEARNING.md`)

I didn't just build it; I proved it converges. The Purpose-MDP formalism shows that under 5 bounded axioms, the expected Φ score is monotonically non-decreasing and converges to a fixed point. The connection to Ng's 1999 PBRS is exact: our ΔΦ is precisely the potential-based shaping reward.
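
In that notation, the core statement reads roughly as follows (a sketch only; the precise axioms and proofs are in PURPOSE_LEARNING.md):

```latex
% Sketch: if each accepted update is non-degrading in expectation and
% \Phi is bounded above, monotone convergence yields a fixed point.
\mathbb{E}[\Phi_{t+1}] \ge \mathbb{E}[\Phi_t]
\quad\text{and}\quad
\Phi_t \le \Phi_{\max} \;\;\forall t
\;\implies\;
\lim_{t \to \infty} \mathbb{E}[\Phi_t] = \Phi^{*} \le \Phi_{\max}.
```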

---

## What I'd Build at Nugget

This framework proves I can go from **paper to production** in one shot:
- Read 13 research papers → extract the implementable core → build it
- Ship with tests, benchmarks, formal proofs, and real-model validation
- Package for distribution (PyPI, zero dependencies)
- Document for both technical and non-technical audiences

At Nugget, I'd apply this approach to whatever the team's hardest problems are: agent reliability, evaluation, cost optimization, or new capabilities.

---

## Links

| What | Where |
|------|-------|
| Install | `pip install purpose-agent` |
| PyPI | [pypi.org/project/purpose-agent](https://pypi.org/project/purpose-agent/) |
| Code | [huggingface.co/Rohan03/purpose-agent](https://huggingface.co/Rohan03/purpose-agent) |
| Architecture | [ARCHITECTURE.md](https://huggingface.co/Rohan03/purpose-agent/blob/main/ARCHITECTURE.md) |
| Formal Proofs | [PURPOSE_LEARNING.md](https://huggingface.co/Rohan03/purpose-agent/blob/main/PURPOSE_LEARNING.md) |
| Research Trace | [COMPILED_RESEARCH.md](https://huggingface.co/Rohan03/purpose-agent/blob/main/COMPILED_RESEARCH.md) |
| Test Results | [LAUNCH_READINESS.md](https://huggingface.co/Rohan03/purpose-agent/blob/main/LAUNCH_READINESS.md) |