Rohan03 committed
Commit ce80011 · verified · 1 Parent(s): 1640a62

feat: Rewrite README — lead with easy API, 3 usage levels, complete docs

Files changed (1):
  README.md +125 −197

README.md CHANGED
@@ -11,269 +11,197 @@ tags:
  - llm-as-judge
  - state-value-evaluation
  - memory-augmented
- - react
- - orchestration
- - modular
  - slm
  - small-language-models
- - multi-agent
  - human-in-the-loop
  - streaming
  - tools
  - evaluation
  - ollama
  - local-models
  pipeline_tag: text-generation
  ---

- # Purpose Agent v0.2.0
-
- **The world's first SLM-native self-improving agentic framework.**
-
- Works with both **Small Language Models** (0.6B–3B params, local, $0 cost) and **Large Language Models** (cloud APIs) with equal efficiency. Agents learn from experience via a Purpose Function Φ(s) — no fine-tuning needed.

- ## What Makes This Different
-
- | Feature | Purpose Agent | LangChain | LangGraph | CrewAI | AutoGen | smolagents |
- |---|:---:|:---:|:---:|:---:|:---:|:---:|
- | **Self-Improvement** | ✅ Φ(s) + experience replay + heuristic distillation | ❌ | ❌ | ❌ | ❌ | ❌ |
- | **SLM-Native** | ✅ Grammar-constrained JSON, prompt compression, Tool RAG | ❌ | ❌ | ❌ | ❌ | ⚠️ |
- | **Anti-Reward-Hacking** | ✅ 7 strict rules + cache consistency + anomaly detection | ❌ | ❌ | ❌ | ❌ | ❌ |
- | **3-Tier Memory** | ✅ Strategic/Procedural/Tool with Q-value retrieval | ❌ | ⚠️ | ⚠️ | ❌ | ❌ |
- | **Multi-Agent with Shared Learning** | ✅ Agents learn from each other | ❌ | ⚠️ | ✅ | ✅ | ⚠️ |
- | **Human Φ Override** | ✅ Humans teach the critic → permanent learning | ❌ | ⚠️ | ❌ | ❌ | ❌ |
- | **Streaming** | ✅ Event + token streaming | ✅ | ✅ | ⚠️ | ⚠️ | ✅ |
- | **Tool Framework** | ✅ Schema, validation, retry, Tool RAG | ✅ | ✅ | ✅ | ✅ | ✅ |
- | **Cost Tracking** | ✅ Per-call token + USD tracking | ⚠️ | ⚠️ | ❌ | ❌ | ❌ |
- | **Benchmark Harness** | ✅ Improvement curve tracking | ❌ | ❌ | ❌ | ❌ | ❌ |
- | **Lightweight** | ✅ ~150KB, stdlib only | ❌ | ❌ | ⚠️ | ⚠️ | ✅ |
- | **Literature-Grounded** | ✅ 8 papers implemented | ❌ | ❌ | ❌ | ❌ | ❌ |
-
- ## Architecture

- ```
- purpose_agent/
- ├── types.py             # Core data types
- ├── llm_backend.py       # Cloud LLM backends (HF, OpenAI, Mock)
- ├── slm_backends.py      # 🆕 SLM backends (Ollama, llama-cpp, prompt compression)
- ├── actor.py             # ReAct agent with 3-tier memory
- ├── purpose_function.py  # Non-hackable Φ(s) critic
- ├── experience_replay.py # Two-phase retrieval (similarity + Q-value)
- ├── optimizer.py         # Trajectory → heuristic distillation
- ├── orchestrator.py      # Main loop
- ├── streaming.py         # 🆕 Async engine + event streaming
- ├── tools.py             # 🆕 Tool framework + built-in tools + Tool RAG
- ├── observability.py     # 🆕 Cost tracking, callbacks, metrics
- ├── multi_agent.py       # 🆕 Agent teams with shared learning
- ├── hitl.py              # 🆕 Human-in-the-loop + checkpointing
- └── evaluation.py        # 🆕 Benchmark runner + improvement curves
- ```
-
- ## Quick Start — Local SLM (Zero Cost)
-
- ```bash
- # 1. Install Ollama
- curl -fsSL https://ollama.ai/install.sh | sh
-
- # 2. Pull a small model (1.7B params, runs on any laptop)
- ollama pull qwen3:1.7b
-
- # 3. Run your agent
- python my_agent.py
- ```

  ```python
- from purpose_agent import (
-     Orchestrator, OllamaBackend, State, Environment, Action,
-     CalculatorTool, ToolRegistry,
- )
-
- # SLM backend runs locally, zero cost
- llm = OllamaBackend(model="qwen3:1.7b")  # 1.7B params
-
- # Or use a cloud LLM
- # from purpose_agent import HFInferenceBackend
- # llm = HFInferenceBackend(model_id="Qwen/Qwen3-32B", provider="cerebras")
-
- class MyEnv(Environment):
-     def execute(self, action, state):
-         return State(data={"result": "done"})
-
- orch = Orchestrator(llm=llm, environment=MyEnv())
- result = orch.run_task(purpose="Solve the problem", max_steps=10)
- print(result.summary())
  ```

- ## SLM Model Registry

- Pre-configured models optimized for agent tasks:

- ```python
- from purpose_agent import create_slm_backend
-
- backend = create_slm_backend("phi-4-mini")    # 3.8B — best tool-use accuracy
- backend = create_slm_backend("qwen3-1.7b")    # 1.7B — best balance
- backend = create_slm_backend("qwen3-0.6b")    # 0.6B — ultra-light
- backend = create_slm_backend("llama-3.2-1b")  # 1B, 128K context
- backend = create_slm_backend("smollm2-1.7b")  # 1.7B, HF native
  ```

- ## Multi-Agent with Shared Learning
-
- Agents learn from each other — when one agent solves a problem, all benefit:

  ```python
- from purpose_agent import AgentSpec, AgentTeam, OllamaBackend
-
- researcher = AgentSpec(
-     name="researcher", role="Find information",
-     model=OllamaBackend(model="qwen3:1.7b"),  # Cheap SLM
-     expertise_keywords=["search", "find", "research"],
- )
- coder = AgentSpec(
-     name="coder", role="Write and debug code",
-     model=OllamaBackend(model="phi4-mini"),  # Better SLM for code
-     expertise_keywords=["code", "program", "debug"],
  )
-
- team = AgentTeam(
-     agents=[researcher, coder],
-     default_model=OllamaBackend(model="qwen3:1.7b"),
-     environment=my_env,
  )
-
- # Auto-delegates to the best agent
- result = team.run_task(purpose="Search for Python sorting algorithms")
- print(team.get_learning_report())  # See shared knowledge
  ```

- ## Human-in-the-Loop
-
- Humans can override Φ scores → the agent permanently learns preferences:

  ```python
- from purpose_agent import HITLOrchestrator, CLIInputHandler
-
- hitl = HITLOrchestrator(
-     orchestrator=orch,
-     input_handler=CLIInputHandler(),
-     approve_actions=True,  # Approve each action
-     review_scores=True,    # Override Φ scores
-     checkpoint_dir="./checkpoints",
- )
- result = hitl.run_task(purpose="Important task")
-
- # Inject knowledge directly
- hitl.inject_heuristic(
-     pattern="When facing {problem_type}",
-     strategy="Always try the simplest approach first",
- )
  ```

- ## Streaming

- Real-time event streaming for UIs:

- ```python
- import asyncio
- from purpose_agent import AsyncOrchestrator
-
- async def main():
-     async_orch = AsyncOrchestrator(orch)
-     async for event in async_orch.run_task_stream(purpose="..."):
-         if event.event_type == "action":
-             print(f"🤖 {event.data['name']}: {event.data['thought'][:100]}")
-         elif event.event_type == "score":
-             print(f"📊 Φ: {event.data['phi_before']:.1f} → {event.data['phi_after']:.1f}")
-
- asyncio.run(main())
  ```

- ## Tool Framework

- ```python
- from purpose_agent import FunctionTool, ToolRegistry, CalculatorTool, PythonExecTool
-
- # Create tool from any function
- @FunctionTool.from_function
- def search(query: str) -> str:
-     """Search the web for information."""
-     return requests.get(f"https://api.search.com?q={query}").text
-
- # Tool RAG for SLMs (only load relevant tools into prompt)
- registry = ToolRegistry()
- registry.register(CalculatorTool())
- registry.register(PythonExecTool())
- registry.register(search)
-
- relevant = registry.get_relevant_tools("compute 2+2", top_k=2)
- # → [CalculatorTool, PythonExecTool] (search excluded — saves tokens)
- ```

- ## Cost Tracking

- ```python
- from purpose_agent import CostTracker
-
- tracker = CostTracker(model_name="qwen3:1.7b", cost_per_1m_input=0.005)
- tracker.record(prompt_tokens=500, completion_tokens=200)
- print(tracker.summary())
- # → {'model': 'qwen3:1.7b', 'total_tokens': 700, 'estimated_cost_usd': 0.000005}
  ```

- ## Benchmark & Prove Self-Improvement

  ```python
- from purpose_agent import BenchmarkRunner, BenchmarkTask
-
- runner = BenchmarkRunner(orchestrator=orch)
- tasks = [
-     BenchmarkTask(id="t1", purpose="Find treasure", initial_state=...),
-     BenchmarkTask(id="t2", purpose="Solve puzzle", initial_state=...),
- ]
-
- result = runner.run(tasks, iterations=10, name="MazeTest")
- print(result.summary())
- # Iteration   Success Rate   Avg Φ   Avg Steps   Avg Reward
- # ----------------------------------------------------------
- # 1           40.0%          4.20    12.0        3.20
- # 5           70.0%          6.80    8.0         6.50
- # 10          90.0%          8.50    6.0         8.90
- # Improvement: 40.0% → 90.0% (+50.0%)
-
- result.save("results/benchmark.json")
  ```

- ## Literature Foundation

- | Paper | What it contributes |
- |-------|---------------------|
- | [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory (strategic/procedural/tool) |
- | [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function V(s) |
- | [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
- | [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
- | [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
- | [CER](https://arxiv.org/abs/2506.06698) | Contextual experience distillation |
- | [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
- | [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native agent patterns |

  ## Installation

  ```bash
- # Core (no dependencies beyond stdlib)
  git clone https://huggingface.co/Rohan03/purpose-agent
  cd purpose-agent

- # For local SLMs
  pip install ollama

- # For cloud LLMs
- pip install huggingface_hub  # or: pip install openai
-
  # Run demo (no API keys needed)
  python demo.py
  ```

  ## License

  MIT
  - llm-as-judge
  - state-value-evaluation
  - memory-augmented
+ - multi-agent
  - slm
  - small-language-models
  - human-in-the-loop
  - streaming
  - tools
  - evaluation
  - ollama
  - local-models
+ - no-code
+ - easy-to-use
  pipeline_tag: text-generation
  ---

+ # Purpose Agent

+ **Build self-improving AI agent teams with just a purpose.**

+ No PhD required. No infrastructure costs. Runs on your laptop.

  ```python
+ import purpose_agent as pa
+
+ # One line. That's all you need.
+ team = pa.purpose("Help me research and summarize scientific papers")
+
+ # Give it tasks. It gets smarter every time.
+ result = team.run("Find recent breakthroughs in quantum computing")
+ print(result)
+
+ # Teach it your preferences
+ team.teach("Always cite your sources")
+ team.teach("Keep summaries under 200 words")
+
+ # Check what it's learned
+ print(team.status())
  ```

+ ## Three Levels of Usage

+ **Pick your level. You can always go deeper later.**

+ ### Level 1 — Beginner (no technical knowledge needed)

+ ```python
+ import purpose_agent as pa
+
+ # Describe what you want. The framework builds the right team.
+ team = pa.purpose("Write Python code and test it")
+ result = team.run("Create a function that calculates fibonacci numbers")
+ print(result)
+
+ # It auto-detects the best team:
+ #   "Write code"   → architect + coder + tester
+ #   "Research X"   → researcher + analyst
+ #   "Write blog"   → writer + editor
+ #   "Analyze data" → analyst + reporter
+ #   "Help me"      → general assistant
  ```
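The auto-detection shown above can be sketched as simple keyword routing. This is an illustrative guess at the mechanism, not the framework's actual implementation; `TEAM_RULES` and `detect_team` are hypothetical names, not part of the purpose_agent API.

```python
# Hypothetical sketch of purpose → team auto-detection via keyword routing.
# TEAM_RULES and detect_team are illustrative names, not purpose_agent APIs.
TEAM_RULES = [
    (("code", "program"), ["architect", "coder", "tester"]),
    (("research", "find"), ["researcher", "analyst"]),
    (("blog", "write"), ["writer", "editor"]),
    (("analyze", "data"), ["analyst", "reporter"]),
]

def detect_team(purpose: str) -> list[str]:
    """Return the first team whose keywords appear in the purpose string."""
    p = purpose.lower()
    for keywords, team in TEAM_RULES:
        if any(k in p for k in keywords):
            return team
    return ["general assistant"]  # fallback when nothing matches
```

First match wins, so more specific rules should come first; a real implementation would likely use richer matching than substring lookup.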

+ ### Level 2 — Intermediate (customize your team)

  ```python
+ import purpose_agent as pa
+
+ # Build a custom team
+ team = pa.Team.build(
+     purpose="Customer support assistant",
+     agents=["greeter", "resolver", "escalator"],
+     model="qwen3:1.7b",  # Free local model
  )
+ result = team.run("Customer says: I can't log in to my account")
+
+ # Add knowledge from your docs
+ team = pa.purpose(
+     "Answer questions about our product",
+     knowledge="./docs/",  # Load all files from a folder
+     model="qwen3:1.7b",
  )
+ result = team.ask("What is our refund policy?")
  ```

+ ### Level 3 — Advanced (full control)

  ```python
+ import purpose_agent as pa
+
+ # Graph workflows (like LangGraph)
+ graph = pa.Graph()
+ graph.add_node("research", pa.Agent("researcher", model="qwen3:1.7b"))
+ graph.add_node("write", pa.Agent("writer", model="phi4-mini"))
+ graph.add_edge(pa.START, "research")
+ graph.add_conditional_edge("write", review_fn, {"pass": pa.END, "fail": "research"})
+ result = graph.run(initial_state)
+
+ # Parallel execution (like CrewAI)
+ results = pa.parallel(["task 1", "task 2", "task 3"], agents=[a1, a2, a3])
+
+ # Agent conversations (like AutoGen)
+ chat = pa.Conversation([researcher, coder, reviewer])
+ result = chat.run("Design a web scraper", rounds=5)
+
+ # Knowledge-aware agents (like LlamaIndex)
+ kb = pa.KnowledgeStore.from_directory("./docs")
+ agent = pa.Agent("assistant", tools=[kb.as_tool()])
+
+ # Human-in-the-loop (like LangGraph)
+ hitl = pa.HITLOrchestrator(orch, input_handler=pa.CLIInputHandler(),
+                            approve_actions=True, review_scores=True)
  ```
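The conditional-edge pattern above can be sketched without the framework, using plain dicts of node callables and a routing function. Everything here (`run_graph`, the node lambdas, the sentinel names) is a hypothetical stand-in for the `pa.Graph` API, shown only to make the control flow concrete.

```python
# Hypothetical sketch of graph execution with a conditional edge.
# Stand-in for pa.Graph / pa.START / pa.END; not the real API.
START, END = "__start__", "__end__"

def run_graph(nodes, edges, conditional, state):
    """Walk the graph from START, routing on conditional edges, until END."""
    current = edges[START]
    while current != END:
        state = nodes[current](state)
        if current in conditional:
            route_fn, routes = conditional[current]
            current = routes[route_fn(state)]
        else:
            current = edges[current]
    return state

nodes = {
    "research": lambda s: {**s, "notes": s.get("notes", 0) + 1},
    "write": lambda s: {**s, "draft": s.get("draft", "") + "section "},
}
edges = {START: "research", "research": "write"}
# "write" loops back to "research" until the draft has two sections.
conditional = {
    "write": (lambda s: "pass" if s["draft"].count("section") >= 2 else "fail",
              {"pass": END, "fail": "research"}),
}

final = run_graph(nodes, edges, conditional, {})
```

The review function plays the same role as `review_fn` above: it inspects the state and picks which edge label to follow.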

+ ## What Makes This Different

+ **The only framework where agents actually learn from experience.**

+ Every other framework (LangChain, CrewAI, AutoGen) runs the same way every time. Purpose Agent gets smarter with each task via the **Φ self-improvement loop**:
+
+ ```
+ Task 1:  Agent struggles, takes 12 steps → Φ evaluates → learns heuristics
+ Task 5:  Agent uses learned patterns, takes 8 steps → learns more
+ Task 10: Agent is efficient, takes 5 steps → keeps refining
  ```
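That loop can be made concrete with a toy critic: score each run by step count, and distill a heuristic whenever the score is low so the next run is shorter. The sketch below is illustrative only; `phi` and `run_tasks` are invented names, not the framework's actual Φ implementation.

```python
# Toy Φ loop: a critic scores each run; low scores trigger heuristic
# distillation, which shortens later runs. Illustrative sketch only.
def phi(steps: int, max_steps: int = 20) -> float:
    """Toy state-value: fewer steps → higher score on a 0–10 scale."""
    return max(0.0, 10.0 * (1 - steps / max_steps))

def run_tasks(n_tasks: int, base_steps: int = 12) -> list[float]:
    heuristics: list[str] = []
    scores = []
    for _ in range(n_tasks):
        steps = max(1, base_steps - len(heuristics))  # learned shortcuts help
        score = phi(steps)
        scores.append(score)
        if score < 8.0:  # struggled → distill a heuristic for next time
            heuristics.append("reuse the plan that worked")
    return scores

scores = run_tasks(10)
# Scores rise as heuristics accumulate, mirroring the Task 1 → Task 10 curve.
```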

+ Plus it absorbs the best of every competing framework:

+ | You want... | Others say use... | Purpose Agent gives you... |
+ |---|---|---|
+ | **Control** (graphs, conditions, loops) | LangGraph | `pa.Graph()` — same power, with self-improvement |
+ | **Speed** (parallel execution) | CrewAI | `pa.parallel()` — real threads, not fake async |
+ | **Agents talking** | AutoGen | `pa.Conversation()` — with Φ-scored turns |
+ | **Plug-and-play** | OpenAI Agents SDK | `pa.purpose()` — even simpler, one function |
+ | **Knowledge** (RAG) | LlamaIndex | `pa.KnowledgeStore` — RAG as a tool |
+ | **Self-improvement** | Nobody | **Only Purpose Agent** |
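The "real threads" row can be illustrated with the stdlib `concurrent.futures` module. This is a guess at the general shape of `pa.parallel`, with `StubAgent` standing in for a real agent class; it is not the framework's implementation.

```python
# Hypothetical sketch of thread-based parallel dispatch, roughly the shape
# pa.parallel() suggests. StubAgent stands in for a real agent object.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class StubAgent:
    name: str

    def run(self, task: str) -> str:
        return f"{self.name}: {task} done"

def parallel(tasks, agents):
    """Run each (agent, task) pair in its own OS thread; keep task order."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(a.run, t) for a, t in zip(agents, tasks)]
        return [f.result() for f in futures]

results = parallel(["task 1", "task 2"], [StubAgent("a1"), StubAgent("a2")])
```

Collecting `f.result()` in submission order keeps results aligned with the input tasks regardless of which thread finishes first.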
 
 
 
 
 
 
 
 
 
 
+ ## Runs on Your Laptop (Free, Private)

+ ```bash
+ # Install Ollama (one-time)
+ curl -fsSL https://ollama.ai/install.sh | sh
+ ollama pull qwen3:1.7b  # 1.7B params, runs on CPU
+
+ # That's it. No API keys. No cloud. No cost.
  ```

+ ```python
+ team = pa.purpose("Research assistant", model="qwen3:1.7b")
+ ```

+ Also works with cloud models:
  ```python
+ team = pa.purpose("Research assistant", model="gpt-4o")          # OpenAI
+ team = pa.purpose("Research assistant", model="Qwen/Qwen3-32B")  # HuggingFace
  ```

+ ## Interactive CLI

+ ```bash
+ python -m purpose_agent
+ ```
+
+ Walks you through setup step by step. No coding required.

  ## Installation

  ```bash
  git clone https://huggingface.co/Rohan03/purpose-agent
  cd purpose-agent

+ # For local models (recommended)
  pip install ollama

  # Run demo (no API keys needed)
  python demo.py
  ```

+ ## Literature Foundation
+
+ Built on 8 published papers — every design decision has empirical backing.
+ See [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md) for the full research trace.
+
+ | Paper | What it contributes |
+ |-------|---------------------|
+ | [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
+ | [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
+ | [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
+ | [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
+ | [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
+ | [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
+ | [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
+ | [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
+
  ## License

  MIT