Exploration
Ideas, technology research, and ad-hoc investigation notes. This is a scratchpad -- content here is not the system of record.
Diataxis type: Exploration (learning-oriented, not yet distilled)
What Goes Here
- Technology evaluations and comparisons
- Prototype findings
- External API exploration
- Performance investigations
- Ideation and backlog notes
What Does NOT Go Here
- Durable learnings (go to docs/learnings/)
- Design decisions (go to docs/design-docs/)
- Implementation specs (go to specs/)
- Operational how-to guides (go to docs/guides/)
Exploration Index
| Topic | Type | Date | Summary |
|---|---|---|---|
| grpo-collapse-analysis.md | Investigation | 2026-04 | Post-mortem on Qwen3-1.7B GRPO collapse into degenerate null-argument tool calls |
| grpo-plateau-plan.md | Investigation | 2026-04 | Interventions to push past 30-40% accuracy plateau in GRPO training |
| grpo-training-session-log.md | Investigation | 2026-04 | Running log of SFT warmup + GRPO training sessions on Colab L4 |
| rl-vs-icl-research.md | Comparison | 2026-04 | When GRPO training adds value over pure prompting for small SQL agents |
| train-grpo-walkthrough.md | Prototype | 2026-04 | Step-by-step companion guide for train_grpo.ipynb |
Types
- Tech Eval: Evaluating a library, framework, or service
- Prototype: Findings from exploratory prototyping
- Investigation: Deep dive into a specific problem
- Comparison: Side-by-side analysis of options
Graduating Content
When exploration produces durable insights:
- Extract patterns to docs/learnings/<category>.md
- Create reference files in docs/references/ for agent context
- Create how-to guides in docs/guides/ for operational procedures