Upload folder using huggingface_hub
Browse files- .github/copilot-instructions.md +29 -41
- .github/prompts/speckit.analyze.prompt.md +184 -0
- .github/prompts/speckit.checklist.prompt.md +294 -0
- .github/prompts/speckit.clarify.prompt.md +177 -0
- .github/prompts/speckit.constitution.prompt.md +78 -0
- .github/prompts/speckit.implement.prompt.md +134 -0
- .github/prompts/speckit.plan.prompt.md +81 -0
- .github/prompts/speckit.specify.prompt.md +229 -0
- .github/prompts/speckit.tasks.prompt.md +128 -0
- .specify/memory/constitution.md +139 -0
- .specify/scripts/bash/check-prerequisites.sh +166 -0
- .specify/scripts/bash/common.sh +156 -0
- .specify/scripts/bash/create-new-feature.sh +206 -0
- .specify/scripts/bash/setup-plan.sh +61 -0
- .specify/scripts/bash/update-agent-context.sh +772 -0
- .specify/templates/agent-file-template.md +28 -0
- .specify/templates/checklist-template.md +40 -0
- .specify/templates/plan-template.md +104 -0
- .specify/templates/spec-template.md +115 -0
- .specify/templates/tasks-template.md +251 -0
- CONTRIBUTING.md +51 -0
- TESTING.md +37 -0
- docs/CONFLICT_DETECTION.md +307 -0
- docs/SUCCESS_METRICS.md +324 -0
- docs/local-testing/.gitignore +2 -0
- specs/001-personified-ai-agent/checklists/requirements.md +70 -0
- specs/001-personified-ai-agent/gap-analysis.md +623 -0
- specs/001-personified-ai-agent/plan.md +349 -0
- specs/001-personified-ai-agent/spec.md +135 -0
- specs/001-personified-ai-agent/tasks.md +425 -0
- src/agent.py +6 -0
- src/data.py +2 -2
- src/test.py +208 -0
.github/copilot-instructions.md
CHANGED
|
@@ -4,44 +4,32 @@
|
|
| 4 |
|
| 5 |
This is a personal AI agent application that creates an agentic version of real people using RAG (Retrieval Augmented Generation) over markdown documentation. The app is deployed as a Gradio chatbot interface on Hugging Face Spaces.
|
| 6 |
|
| 7 |
-
##
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
-
|
| 13 |
-
-
|
| 14 |
-
- **
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
-
|
| 19 |
-
- **
|
| 20 |
-
- **
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
-
|
| 26 |
-
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
### Document Loading Convention
|
| 37 |
-
|
| 38 |
-
- Local: `docs/` directory with glob patterns like `["*.md"]` - personal content not yet published or development purposes
|
| 39 |
-
- Remote: GITHUB_REPOS should be set to a comma separated list of public GitHub repos (e.g., `Neosofia/corporate`) - must be public for unauthenticated access
|
| 40 |
-
- Link rewriting: Baseless links get GitHub URL prefix for proper references based on the repo
|
| 41 |
-
|
| 42 |
-
## Common Gotchas/Reminders
|
| 43 |
-
|
| 44 |
-
1. **Model naming**: For some reason OpenAI and Groq can't agree to be concise so the default model name is `"openai/openai/gpt-oss-120b"`
|
| 45 |
-
1. **uv vs pip**: Project uses `uv` for lock files; don't use `pip` directly
|
| 46 |
-
1. **ChromaDB persistence**: EphemeralClient means vectorstore rebuilt on restart
|
| 47 |
-
1. **VPN Warning**: Groq API blocks VPN connections. If tests/app fail with 403 errors, disconnect from VPN
|
|
|
|
| 4 |
|
| 5 |
This is a personal AI agent application that creates an agentic version of real people using RAG (Retrieval Augmented Generation) over markdown documentation. The app is deployed as a Gradio chatbot interface on Hugging Face Spaces.
|
| 6 |
|
| 7 |
+
## Architecture & Development Principles
|
| 8 |
+
|
| 9 |
+
For comprehensive architectural principles, code organization, and development patterns, see the authoritative source: **[`.specify/memory/constitution.md`](../.specify/memory/constitution.md)**
|
| 10 |
+
|
| 11 |
+
The constitution covers:
|
| 12 |
+
- **I. Async-First Architecture** - async operations, MCP servers, session management
|
| 13 |
+
- **II. RAG-First Data Pipeline** - document loading, chunking, vectorstore strategy
|
| 14 |
+
- **III. Type-Safe Configuration** - Pydantic models, immutable config pattern
|
| 15 |
+
- **IV. Session Isolation & MCP Management** - per-session agents, cleanup patterns
|
| 16 |
+
- **V. Test-First Development** - testing requirements, notebook synchronization
|
| 17 |
+
- **VI. Strict Import Organization** - PEP 8, 98-character line limit, formatting
|
| 18 |
+
- **VII. GitHub Tool Restrictions** - rate limits, search filters
|
| 19 |
+
- **VIII. Observability & Logging** - structured JSON logs, optional Loki integration
|
| 20 |
+
- **IX. Persona Consistency** - first-person perspective, employer transparency
|
| 21 |
+
- **X. Unicode Normalization** - output cleanliness
|
| 22 |
+
|
| 23 |
+
## Quick Development Checklist
|
| 24 |
+
|
| 25 |
+
- Run tests after refactoring: `uv run pytest src/test.py -v`
|
| 26 |
+
- Always update notebooks when changing function signatures
|
| 27 |
+
- Use `uv` for all code execution (never `pip` directly)
|
| 28 |
+
- See `TESTING.md` for detailed test setup
|
| 29 |
+
|
| 30 |
+
## Common Gotchas & Reminders
|
| 31 |
+
|
| 32 |
+
1. **Model naming quirk**: OpenAI and Groq use verbose naming, so default is `"openai/openai/gpt-oss-120b"`
|
| 33 |
+
2. **VPN blocks Groq API**: If tests/app fail with 403 errors, disconnect from VPN
|
| 34 |
+
3. **ChromaDB is ephemeral**: Using EphemeralClient means vectorstore rebuilds on each restart (stateless by design)
|
| 35 |
+
4. **Package manager**: Use `uv` exclusively for this project; don't use `pip` directly
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.github/prompts/speckit.analyze.prompt.md
ADDED
|
@@ -0,0 +1,184 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
description: Perform a non-destructive cross-artifact consistency and quality analysis across spec.md, plan.md, and tasks.md after task generation.
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
## User Input
|
| 6 |
+
|
| 7 |
+
```text
|
| 8 |
+
$ARGUMENTS
|
| 9 |
+
```
|
| 10 |
+
|
| 11 |
+
You **MUST** consider the user input before proceeding (if not empty).
|
| 12 |
+
|
| 13 |
+
## Goal
|
| 14 |
+
|
| 15 |
+
Identify inconsistencies, duplications, ambiguities, and underspecified items across the three core artifacts (`spec.md`, `plan.md`, `tasks.md`) before implementation. This command MUST run only after `/speckit.tasks` has successfully produced a complete `tasks.md`.
|
| 16 |
+
|
| 17 |
+
## Operating Constraints
|
| 18 |
+
|
| 19 |
+
**STRICTLY READ-ONLY**: Do **not** modify any files. Output a structured analysis report. Offer an optional remediation plan (user must explicitly approve before any follow-up editing commands would be invoked manually).
|
| 20 |
+
|
| 21 |
+
**Constitution Authority**: The project constitution (`.specify/memory/constitution.md`) is **non-negotiable** within this analysis scope. Constitution conflicts are automatically CRITICAL and require adjustment of the spec, plan, or tasks—not dilution, reinterpretation, or silent ignoring of the principle. If a principle itself needs to change, that must occur in a separate, explicit constitution update outside `/speckit.analyze`.
|
| 22 |
+
|
| 23 |
+
## Execution Steps
|
| 24 |
+
|
| 25 |
+
### 1. Initialize Analysis Context
|
| 26 |
+
|
| 27 |
+
Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` once from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS. Derive absolute paths:
|
| 28 |
+
|
| 29 |
+
- SPEC = FEATURE_DIR/spec.md
|
| 30 |
+
- PLAN = FEATURE_DIR/plan.md
|
| 31 |
+
- TASKS = FEATURE_DIR/tasks.md
|
| 32 |
+
|
| 33 |
+
Abort with an error message if any required file is missing (instruct the user to run missing prerequisite command).
|
| 34 |
+
For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
|
| 35 |
+
|
| 36 |
+
### 2. Load Artifacts (Progressive Disclosure)
|
| 37 |
+
|
| 38 |
+
Load only the minimal necessary context from each artifact:
|
| 39 |
+
|
| 40 |
+
**From spec.md:**
|
| 41 |
+
|
| 42 |
+
- Overview/Context
|
| 43 |
+
- Functional Requirements
|
| 44 |
+
- Non-Functional Requirements
|
| 45 |
+
- User Stories
|
| 46 |
+
- Edge Cases (if present)
|
| 47 |
+
|
| 48 |
+
**From plan.md:**
|
| 49 |
+
|
| 50 |
+
- Architecture/stack choices
|
| 51 |
+
- Data Model references
|
| 52 |
+
- Phases
|
| 53 |
+
- Technical constraints
|
| 54 |
+
|
| 55 |
+
**From tasks.md:**
|
| 56 |
+
|
| 57 |
+
- Task IDs
|
| 58 |
+
- Descriptions
|
| 59 |
+
- Phase grouping
|
| 60 |
+
- Parallel markers [P]
|
| 61 |
+
- Referenced file paths
|
| 62 |
+
|
| 63 |
+
**From constitution:**
|
| 64 |
+
|
| 65 |
+
- Load `.specify/memory/constitution.md` for principle validation
|
| 66 |
+
|
| 67 |
+
### 3. Build Semantic Models
|
| 68 |
+
|
| 69 |
+
Create internal representations (do not include raw artifacts in output):
|
| 70 |
+
|
| 71 |
+
- **Requirements inventory**: Each functional + non-functional requirement with a stable key (derive slug based on imperative phrase; e.g., "User can upload file" → `user-can-upload-file`)
|
| 72 |
+
- **User story/action inventory**: Discrete user actions with acceptance criteria
|
| 73 |
+
- **Task coverage mapping**: Map each task to one or more requirements or stories (inference by keyword / explicit reference patterns like IDs or key phrases)
|
| 74 |
+
- **Constitution rule set**: Extract principle names and MUST/SHOULD normative statements
|
| 75 |
+
|
| 76 |
+
### 4. Detection Passes (Token-Efficient Analysis)
|
| 77 |
+
|
| 78 |
+
Focus on high-signal findings. Limit to 50 findings total; aggregate remainder in overflow summary.
|
| 79 |
+
|
| 80 |
+
#### A. Duplication Detection
|
| 81 |
+
|
| 82 |
+
- Identify near-duplicate requirements
|
| 83 |
+
- Mark lower-quality phrasing for consolidation
|
| 84 |
+
|
| 85 |
+
#### B. Ambiguity Detection
|
| 86 |
+
|
| 87 |
+
- Flag vague adjectives (fast, scalable, secure, intuitive, robust) lacking measurable criteria
|
| 88 |
+
- Flag unresolved placeholders (TODO, TKTK, ???, `<placeholder>`, etc.)
|
| 89 |
+
|
| 90 |
+
#### C. Underspecification
|
| 91 |
+
|
| 92 |
+
- Requirements with verbs but missing object or measurable outcome
|
| 93 |
+
- User stories missing acceptance criteria alignment
|
| 94 |
+
- Tasks referencing files or components not defined in spec/plan
|
| 95 |
+
|
| 96 |
+
#### D. Constitution Alignment
|
| 97 |
+
|
| 98 |
+
- Any requirement or plan element conflicting with a MUST principle
|
| 99 |
+
- Missing mandated sections or quality gates from constitution
|
| 100 |
+
|
| 101 |
+
#### E. Coverage Gaps
|
| 102 |
+
|
| 103 |
+
- Requirements with zero associated tasks
|
| 104 |
+
- Tasks with no mapped requirement/story
|
| 105 |
+
- Non-functional requirements not reflected in tasks (e.g., performance, security)
|
| 106 |
+
|
| 107 |
+
#### F. Inconsistency
|
| 108 |
+
|
| 109 |
+
- Terminology drift (same concept named differently across files)
|
| 110 |
+
- Data entities referenced in plan but absent in spec (or vice versa)
|
| 111 |
+
- Task ordering contradictions (e.g., integration tasks before foundational setup tasks without dependency note)
|
| 112 |
+
- Conflicting requirements (e.g., one requires Next.js while other specifies Vue)
|
| 113 |
+
|
| 114 |
+
### 5. Severity Assignment
|
| 115 |
+
|
| 116 |
+
Use this heuristic to prioritize findings:
|
| 117 |
+
|
| 118 |
+
- **CRITICAL**: Violates constitution MUST, missing core spec artifact, or requirement with zero coverage that blocks baseline functionality
|
| 119 |
+
- **HIGH**: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion
|
| 120 |
+
- **MEDIUM**: Terminology drift, missing non-functional task coverage, underspecified edge case
|
| 121 |
+
- **LOW**: Style/wording improvements, minor redundancy not affecting execution order
|
| 122 |
+
|
| 123 |
+
### 6. Produce Compact Analysis Report
|
| 124 |
+
|
| 125 |
+
Output a Markdown report (no file writes) with the following structure:
|
| 126 |
+
|
| 127 |
+
## Specification Analysis Report
|
| 128 |
+
|
| 129 |
+
| ID | Category | Severity | Location(s) | Summary | Recommendation |
|
| 130 |
+
|----|----------|----------|-------------|---------|----------------|
|
| 131 |
+
| A1 | Duplication | HIGH | spec.md:L120-134 | Two similar requirements ... | Merge phrasing; keep clearer version |
|
| 132 |
+
|
| 133 |
+
(Add one row per finding; generate stable IDs prefixed by category initial.)
|
| 134 |
+
|
| 135 |
+
**Coverage Summary Table:**
|
| 136 |
+
|
| 137 |
+
| Requirement Key | Has Task? | Task IDs | Notes |
|
| 138 |
+
|-----------------|-----------|----------|-------|
|
| 139 |
+
|
| 140 |
+
**Constitution Alignment Issues:** (if any)
|
| 141 |
+
|
| 142 |
+
**Unmapped Tasks:** (if any)
|
| 143 |
+
|
| 144 |
+
**Metrics:**
|
| 145 |
+
|
| 146 |
+
- Total Requirements
|
| 147 |
+
- Total Tasks
|
| 148 |
+
- Coverage % (requirements with >=1 task)
|
| 149 |
+
- Ambiguity Count
|
| 150 |
+
- Duplication Count
|
| 151 |
+
- Critical Issues Count
|
| 152 |
+
|
| 153 |
+
### 7. Provide Next Actions
|
| 154 |
+
|
| 155 |
+
At end of report, output a concise Next Actions block:
|
| 156 |
+
|
| 157 |
+
- If CRITICAL issues exist: Recommend resolving before `/speckit.implement`
|
| 158 |
+
- If only LOW/MEDIUM: User may proceed, but provide improvement suggestions
|
| 159 |
+
- Provide explicit command suggestions: e.g., "Run /speckit.specify with refinement", "Run /speckit.plan to adjust architecture", "Manually edit tasks.md to add coverage for 'performance-metrics'"
|
| 160 |
+
|
| 161 |
+
### 8. Offer Remediation
|
| 162 |
+
|
| 163 |
+
Ask the user: "Would you like me to suggest concrete remediation edits for the top N issues?" (Do NOT apply them automatically.)
|
| 164 |
+
|
| 165 |
+
## Operating Principles
|
| 166 |
+
|
| 167 |
+
### Context Efficiency
|
| 168 |
+
|
| 169 |
+
- **Minimal high-signal tokens**: Focus on actionable findings, not exhaustive documentation
|
| 170 |
+
- **Progressive disclosure**: Load artifacts incrementally; don't dump all content into analysis
|
| 171 |
+
- **Token-efficient output**: Limit findings table to 50 rows; summarize overflow
|
| 172 |
+
- **Deterministic results**: Rerunning without changes should produce consistent IDs and counts
|
| 173 |
+
|
| 174 |
+
### Analysis Guidelines
|
| 175 |
+
|
| 176 |
+
- **NEVER modify files** (this is read-only analysis)
|
| 177 |
+
- **NEVER hallucinate missing sections** (if absent, report them accurately)
|
| 178 |
+
- **Prioritize constitution violations** (these are always CRITICAL)
|
| 179 |
+
- **Use examples over exhaustive rules** (cite specific instances, not generic patterns)
|
| 180 |
+
- **Report zero issues gracefully** (emit success report with coverage statistics)
|
| 181 |
+
|
| 182 |
+
## Context
|
| 183 |
+
|
| 184 |
+
$ARGUMENTS
|
.github/prompts/speckit.checklist.prompt.md
ADDED
|
@@ -0,0 +1,294 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
description: Generate a custom checklist for the current feature based on user requirements.
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
## Checklist Purpose: "Unit Tests for English"
|
| 6 |
+
|
| 7 |
+
**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain.
|
| 8 |
+
|
| 9 |
+
**NOT for verification/testing**:
|
| 10 |
+
|
| 11 |
+
- ❌ NOT "Verify the button clicks correctly"
|
| 12 |
+
- ❌ NOT "Test error handling works"
|
| 13 |
+
- ❌ NOT "Confirm the API returns 200"
|
| 14 |
+
- ❌ NOT checking if code/implementation matches the spec
|
| 15 |
+
|
| 16 |
+
**FOR requirements quality validation**:
|
| 17 |
+
|
| 18 |
+
- ✅ "Are visual hierarchy requirements defined for all card types?" (completeness)
|
| 19 |
+
- ✅ "Is 'prominent display' quantified with specific sizing/positioning?" (clarity)
|
| 20 |
+
- ✅ "Are hover state requirements consistent across all interactive elements?" (consistency)
|
| 21 |
+
- ✅ "Are accessibility requirements defined for keyboard navigation?" (coverage)
|
| 22 |
+
- ✅ "Does the spec define what happens when logo image fails to load?" (edge cases)
|
| 23 |
+
|
| 24 |
+
**Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works.
|
| 25 |
+
|
| 26 |
+
## User Input
|
| 27 |
+
|
| 28 |
+
```text
|
| 29 |
+
$ARGUMENTS
|
| 30 |
+
```
|
| 31 |
+
|
| 32 |
+
You **MUST** consider the user input before proceeding (if not empty).
|
| 33 |
+
|
| 34 |
+
## Execution Steps
|
| 35 |
+
|
| 36 |
+
1. **Setup**: Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS list.
|
| 37 |
+
- All file paths must be absolute.
|
| 38 |
+
- For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
|
| 39 |
+
|
| 40 |
+
2. **Clarify intent (dynamic)**: Derive up to THREE initial contextual clarifying questions (no pre-baked catalog). They MUST:
|
| 41 |
+
- Be generated from the user's phrasing + extracted signals from spec/plan/tasks
|
| 42 |
+
- Only ask about information that materially changes checklist content
|
| 43 |
+
- Be skipped individually if already unambiguous in `$ARGUMENTS`
|
| 44 |
+
- Prefer precision over breadth
|
| 45 |
+
|
| 46 |
+
Generation algorithm:
|
| 47 |
+
1. Extract signals: feature domain keywords (e.g., auth, latency, UX, API), risk indicators ("critical", "must", "compliance"), stakeholder hints ("QA", "review", "security team"), and explicit deliverables ("a11y", "rollback", "contracts").
|
| 48 |
+
2. Cluster signals into candidate focus areas (max 4) ranked by relevance.
|
| 49 |
+
3. Identify probable audience & timing (author, reviewer, QA, release) if not explicit.
|
| 50 |
+
4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria.
|
| 51 |
+
5. Formulate questions chosen from these archetypes:
|
| 52 |
+
- Scope refinement (e.g., "Should this include integration touchpoints with X and Y or stay limited to local module correctness?")
|
| 53 |
+
- Risk prioritization (e.g., "Which of these potential risk areas should receive mandatory gating checks?")
|
| 54 |
+
- Depth calibration (e.g., "Is this a lightweight pre-commit sanity list or a formal release gate?")
|
| 55 |
+
- Audience framing (e.g., "Will this be used by the author only or peers during PR review?")
|
| 56 |
+
- Boundary exclusion (e.g., "Should we explicitly exclude performance tuning items this round?")
|
| 57 |
+
- Scenario class gap (e.g., "No recovery flows detected—are rollback / partial failure paths in scope?")
|
| 58 |
+
|
| 59 |
+
Question formatting rules:
|
| 60 |
+
- If presenting options, generate a compact table with columns: Option | Candidate | Why It Matters
|
| 61 |
+
- Limit to A–E options maximum; omit table if a free-form answer is clearer
|
| 62 |
+
- Never ask the user to restate what they already said
|
| 63 |
+
- Avoid speculative categories (no hallucination). If uncertain, ask explicitly: "Confirm whether X belongs in scope."
|
| 64 |
+
|
| 65 |
+
Defaults when interaction impossible:
|
| 66 |
+
- Depth: Standard
|
| 67 |
+
- Audience: Reviewer (PR) if code-related; Author otherwise
|
| 68 |
+
- Focus: Top 2 relevance clusters
|
| 69 |
+
|
| 70 |
+
Output the questions (label Q1/Q2/Q3). After answers: if ≥2 scenario classes (Alternate / Exception / Recovery / Non-Functional domain) remain unclear, you MAY ask up to TWO more targeted follow‑ups (Q4/Q5) with a one-line justification each (e.g., "Unresolved recovery path risk"). Do not exceed five total questions. Skip escalation if user explicitly declines more.
|
| 71 |
+
|
| 72 |
+
3. **Understand user request**: Combine `$ARGUMENTS` + clarifying answers:
|
| 73 |
+
- Derive checklist theme (e.g., security, review, deploy, ux)
|
| 74 |
+
- Consolidate explicit must-have items mentioned by user
|
| 75 |
+
- Map focus selections to category scaffolding
|
| 76 |
+
- Infer any missing context from spec/plan/tasks (do NOT hallucinate)
|
| 77 |
+
|
| 78 |
+
4. **Load feature context**: Read from FEATURE_DIR:
|
| 79 |
+
- spec.md: Feature requirements and scope
|
| 80 |
+
- plan.md (if exists): Technical details, dependencies
|
| 81 |
+
- tasks.md (if exists): Implementation tasks
|
| 82 |
+
|
| 83 |
+
**Context Loading Strategy**:
|
| 84 |
+
- Load only necessary portions relevant to active focus areas (avoid full-file dumping)
|
| 85 |
+
- Prefer summarizing long sections into concise scenario/requirement bullets
|
| 86 |
+
- Use progressive disclosure: add follow-on retrieval only if gaps detected
|
| 87 |
+
- If source docs are large, generate interim summary items instead of embedding raw text
|
| 88 |
+
|
| 89 |
+
5. **Generate checklist** - Create "Unit Tests for Requirements":
|
| 90 |
+
- Create `FEATURE_DIR/checklists/` directory if it doesn't exist
|
| 91 |
+
- Generate unique checklist filename:
|
| 92 |
+
- Use short, descriptive name based on domain (e.g., `ux.md`, `api.md`, `security.md`)
|
| 93 |
+
- Format: `[domain].md`
|
| 94 |
+
- If file exists, append to existing file
|
| 95 |
+
- Number items sequentially starting from CHK001
|
| 96 |
+
- Each `/speckit.checklist` run creates a NEW file (never overwrites existing checklists)
|
| 97 |
+
|
| 98 |
+
**CORE PRINCIPLE - Test the Requirements, Not the Implementation**:
|
| 99 |
+
Every checklist item MUST evaluate the REQUIREMENTS THEMSELVES for:
|
| 100 |
+
- **Completeness**: Are all necessary requirements present?
|
| 101 |
+
- **Clarity**: Are requirements unambiguous and specific?
|
| 102 |
+
- **Consistency**: Do requirements align with each other?
|
| 103 |
+
- **Measurability**: Can requirements be objectively verified?
|
| 104 |
+
- **Coverage**: Are all scenarios/edge cases addressed?
|
| 105 |
+
|
| 106 |
+
**Category Structure** - Group items by requirement quality dimensions:
|
| 107 |
+
- **Requirement Completeness** (Are all necessary requirements documented?)
|
| 108 |
+
- **Requirement Clarity** (Are requirements specific and unambiguous?)
|
| 109 |
+
- **Requirement Consistency** (Do requirements align without conflicts?)
|
| 110 |
+
- **Acceptance Criteria Quality** (Are success criteria measurable?)
|
| 111 |
+
- **Scenario Coverage** (Are all flows/cases addressed?)
|
| 112 |
+
- **Edge Case Coverage** (Are boundary conditions defined?)
|
| 113 |
+
- **Non-Functional Requirements** (Performance, Security, Accessibility, etc. - are they specified?)
|
| 114 |
+
- **Dependencies & Assumptions** (Are they documented and validated?)
|
| 115 |
+
- **Ambiguities & Conflicts** (What needs clarification?)
|
| 116 |
+
|
| 117 |
+
**HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English"**:
|
| 118 |
+
|
| 119 |
+
❌ **WRONG** (Testing implementation):
|
| 120 |
+
- "Verify landing page displays 3 episode cards"
|
| 121 |
+
- "Test hover states work on desktop"
|
| 122 |
+
- "Confirm logo click navigates home"
|
| 123 |
+
|
| 124 |
+
✅ **CORRECT** (Testing requirements quality):
|
| 125 |
+
- "Are the exact number and layout of featured episodes specified?" [Completeness]
|
| 126 |
+
- "Is 'prominent display' quantified with specific sizing/positioning?" [Clarity]
|
| 127 |
+
- "Are hover state requirements consistent across all interactive elements?" [Consistency]
|
| 128 |
+
- "Are keyboard navigation requirements defined for all interactive UI?" [Coverage]
|
| 129 |
+
- "Is the fallback behavior specified when logo image fails to load?" [Edge Cases]
|
| 130 |
+
- "Are loading states defined for asynchronous episode data?" [Completeness]
|
| 131 |
+
- "Does the spec define visual hierarchy for competing UI elements?" [Clarity]
|
| 132 |
+
|
| 133 |
+
**ITEM STRUCTURE**:
|
| 134 |
+
Each item should follow this pattern:
|
| 135 |
+
- Question format asking about requirement quality
|
| 136 |
+
- Focus on what's WRITTEN (or not written) in the spec/plan
|
| 137 |
+
- Include quality dimension in brackets [Completeness/Clarity/Consistency/etc.]
|
| 138 |
+
- Reference spec section `[Spec §X.Y]` when checking existing requirements
|
| 139 |
+
- Use `[Gap]` marker when checking for missing requirements
|
| 140 |
+
|
| 141 |
+
**EXAMPLES BY QUALITY DIMENSION**:
|
| 142 |
+
|
| 143 |
+
Completeness:
|
| 144 |
+
- "Are error handling requirements defined for all API failure modes? [Gap]"
|
| 145 |
+
- "Are accessibility requirements specified for all interactive elements? [Completeness]"
|
| 146 |
+
- "Are mobile breakpoint requirements defined for responsive layouts? [Gap]"
|
| 147 |
+
|
| 148 |
+
Clarity:
|
| 149 |
+
- "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec §NFR-2]"
|
| 150 |
+
- "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec §FR-5]"
|
| 151 |
+
- "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec §FR-4]"
|
| 152 |
+
|
| 153 |
+
Consistency:
|
| 154 |
+
- "Do navigation requirements align across all pages? [Consistency, Spec §FR-10]"
|
| 155 |
+
- "Are card component requirements consistent between landing and detail pages? [Consistency]"
|
| 156 |
+
|
| 157 |
+
Coverage:
|
| 158 |
+
- "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]"
|
| 159 |
+
- "Are concurrent user interaction scenarios addressed? [Coverage, Gap]"
|
| 160 |
+
- "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]"
|
| 161 |
+
|
| 162 |
+
Measurability:
|
| 163 |
+
- "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]"
|
| 164 |
+
- "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]"
|
| 165 |
+
|
| 166 |
+
**Scenario Classification & Coverage** (Requirements Quality Focus):
|
| 167 |
+
- Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios
|
| 168 |
+
- For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?"
|
| 169 |
+
- If scenario class missing: "Are [scenario type] requirements intentionally excluded or missing? [Gap]"
|
| 170 |
+
- Include resilience/rollback when state mutation occurs: "Are rollback requirements defined for migration failures? [Gap]"
|
| 171 |
+
|
| 172 |
+
**Traceability Requirements**:
|
| 173 |
+
- MINIMUM: ≥80% of items MUST include at least one traceability reference
|
| 174 |
+
- Each item should reference: spec section `[Spec §X.Y]`, or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]`
|
| 175 |
+
- If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]"
|
| 176 |
+
|
| 177 |
+
**Surface & Resolve Issues** (Requirements Quality Problems):
|
| 178 |
+
Ask questions about the requirements themselves:
|
| 179 |
+
- Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec §NFR-1]"
|
| 180 |
+
- Conflicts: "Do navigation requirements conflict between §FR-10 and §FR-10a? [Conflict]"
|
| 181 |
+
- Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]"
|
| 182 |
+
- Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]"
|
| 183 |
+
- Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]"
|
| 184 |
+
|
| 185 |
+
**Content Consolidation**:
|
| 186 |
+
- Soft cap: If raw candidate items > 40, prioritize by risk/impact
|
| 187 |
+
- Merge near-duplicates checking the same requirement aspect
|
| 188 |
+
- If >5 low-impact edge cases, create one item: "Are edge cases X, Y, Z addressed in requirements? [Coverage]"
|
| 189 |
+
|
| 190 |
+
**🚫 ABSOLUTELY PROHIBITED** - These make it an implementation test, not a requirements test:
|
| 191 |
+
- ❌ Any item starting with "Verify", "Test", "Confirm", "Check" + implementation behavior
|
| 192 |
+
- ❌ References to code execution, user actions, system behavior
|
| 193 |
+
- ❌ "Displays correctly", "works properly", "functions as expected"
|
| 194 |
+
- ❌ "Click", "navigate", "render", "load", "execute"
|
| 195 |
+
- ❌ Test cases, test plans, QA procedures
|
| 196 |
+
- ❌ Implementation details (frameworks, APIs, algorithms)
|
| 197 |
+
|
| 198 |
+
**✅ REQUIRED PATTERNS** - These test requirements quality:
|
| 199 |
+
- ✅ "Are [requirement type] defined/specified/documented for [scenario]?"
|
| 200 |
+
- ✅ "Is [vague term] quantified/clarified with specific criteria?"
|
| 201 |
+
- ✅ "Are requirements consistent between [section A] and [section B]?"
|
| 202 |
+
- ✅ "Can [requirement] be objectively measured/verified?"
|
| 203 |
+
- ✅ "Are [edge cases/scenarios] addressed in requirements?"
|
| 204 |
+
- ✅ "Does the spec define [missing aspect]?"
|
| 205 |
+
|
| 206 |
+
6. **Structure Reference**: Generate the checklist following the canonical template in `.specify/templates/checklist-template.md` for title, meta section, category headings, and ID formatting. If template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### <requirement item>` lines with globally incrementing IDs starting at CHK001.
|
| 207 |
+
|
| 208 |
+
7. **Report**: Output full path to created checklist, item count, and remind user that each run creates a new file. Summarize:
|
| 209 |
+
- Focus areas selected
|
| 210 |
+
- Depth level
|
| 211 |
+
- Actor/timing
|
| 212 |
+
- Any explicit user-specified must-have items incorporated
|
| 213 |
+
|
| 214 |
+
**Important**: Each `/speckit.checklist` command invocation creates a checklist file using short, descriptive names unless file already exists. This allows:
|
| 215 |
+
|
| 216 |
+
- Multiple checklists of different types (e.g., `ux.md`, `test.md`, `security.md`)
|
| 217 |
+
- Simple, memorable filenames that indicate checklist purpose
|
| 218 |
+
- Easy identification and navigation in the `checklists/` folder
|
| 219 |
+
|
| 220 |
+
To avoid clutter, use descriptive types and clean up obsolete checklists when done.
|
| 221 |
+
|
| 222 |
+
## Example Checklist Types & Sample Items
|
| 223 |
+
|
| 224 |
+
**UX Requirements Quality:** `ux.md`
|
| 225 |
+
|
| 226 |
+
Sample items (testing the requirements, NOT the implementation):
|
| 227 |
+
|
| 228 |
+
- "Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec §FR-1]"
|
| 229 |
+
- "Is the number and positioning of UI elements explicitly specified? [Completeness, Spec §FR-1]"
|
| 230 |
+
- "Are interaction state requirements (hover, focus, active) consistently defined? [Consistency]"
|
| 231 |
+
- "Are accessibility requirements specified for all interactive elements? [Coverage, Gap]"
|
| 232 |
+
- "Is fallback behavior defined when images fail to load? [Edge Case, Gap]"
|
| 233 |
+
- "Can 'prominent display' be objectively measured? [Measurability, Spec §FR-4]"
|
| 234 |
+
|
| 235 |
+
**API Requirements Quality:** `api.md`
|
| 236 |
+
|
| 237 |
+
Sample items:
|
| 238 |
+
|
| 239 |
+
- "Are error response formats specified for all failure scenarios? [Completeness]"
|
| 240 |
+
- "Are rate limiting requirements quantified with specific thresholds? [Clarity]"
|
| 241 |
+
- "Are authentication requirements consistent across all endpoints? [Consistency]"
|
| 242 |
+
- "Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]"
|
| 243 |
+
- "Is versioning strategy documented in requirements? [Gap]"
|
| 244 |
+
|
| 245 |
+
**Performance Requirements Quality:** `performance.md`
|
| 246 |
+
|
| 247 |
+
Sample items:
|
| 248 |
+
|
| 249 |
+
- "Are performance requirements quantified with specific metrics? [Clarity]"
|
| 250 |
+
- "Are performance targets defined for all critical user journeys? [Coverage]"
|
| 251 |
+
- "Are performance requirements under different load conditions specified? [Completeness]"
|
| 252 |
+
- "Can performance requirements be objectively measured? [Measurability]"
|
| 253 |
+
- "Are degradation requirements defined for high-load scenarios? [Edge Case, Gap]"
|
| 254 |
+
|
| 255 |
+
**Security Requirements Quality:** `security.md`
|
| 256 |
+
|
| 257 |
+
Sample items:
|
| 258 |
+
|
| 259 |
+
- "Are authentication requirements specified for all protected resources? [Coverage]"
|
| 260 |
+
- "Are data protection requirements defined for sensitive information? [Completeness]"
|
| 261 |
+
- "Is the threat model documented and requirements aligned to it? [Traceability]"
|
| 262 |
+
- "Are security requirements consistent with compliance obligations? [Consistency]"
|
| 263 |
+
- "Are security failure/breach response requirements defined? [Gap, Exception Flow]"
|
| 264 |
+
|
| 265 |
+
## Anti-Examples: What NOT To Do
|
| 266 |
+
|
| 267 |
+
**❌ WRONG - These test implementation, not requirements:**
|
| 268 |
+
|
| 269 |
+
```markdown
|
| 270 |
+
- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec §FR-001]
|
| 271 |
+
- [ ] CHK002 - Test hover states work correctly on desktop [Spec §FR-003]
|
| 272 |
+
- [ ] CHK003 - Confirm logo click navigates to home page [Spec §FR-010]
|
| 273 |
+
- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec §FR-005]
|
| 274 |
+
```
|
| 275 |
+
|
| 276 |
+
**✅ CORRECT - These test requirements quality:**
|
| 277 |
+
|
| 278 |
+
```markdown
|
| 279 |
+
- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec §FR-001]
|
| 280 |
+
- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec §FR-003]
|
| 281 |
+
- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec §FR-010]
|
| 282 |
+
- [ ] CHK004 - Is the selection criteria for related episodes documented? [Gap, Spec §FR-005]
|
| 283 |
+
- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap]
|
| 284 |
+
- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec §FR-001]
|
| 285 |
+
```
|
| 286 |
+
|
| 287 |
+
**Key Differences:**
|
| 288 |
+
|
| 289 |
+
- Wrong: Tests if the system works correctly
|
| 290 |
+
- Correct: Tests if the requirements are written correctly
|
| 291 |
+
- Wrong: Verification of behavior
|
| 292 |
+
- Correct: Validation of requirement quality
|
| 293 |
+
- Wrong: "Does it do X?"
|
| 294 |
+
- Correct: "Is X clearly specified?"
|
.github/prompts/speckit.clarify.prompt.md
ADDED
|
@@ -0,0 +1,177 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
description: Identify underspecified areas in the current feature spec by asking up to 5 highly targeted clarification questions and encoding answers back into the spec.
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
## User Input
|
| 6 |
+
|
| 7 |
+
```text
|
| 8 |
+
$ARGUMENTS
|
| 9 |
+
```
|
| 10 |
+
|
| 11 |
+
You **MUST** consider the user input before proceeding (if not empty).
|
| 12 |
+
|
| 13 |
+
## Outline
|
| 14 |
+
|
| 15 |
+
Goal: Detect and reduce ambiguity or missing decision points in the active feature specification and record the clarifications directly in the spec file.
|
| 16 |
+
|
| 17 |
+
Note: This clarification workflow is expected to run (and be completed) BEFORE invoking `/speckit.plan`. If the user explicitly states they are skipping clarification (e.g., exploratory spike), you may proceed, but must warn that downstream rework risk increases.
|
| 18 |
+
|
| 19 |
+
Execution steps:
|
| 20 |
+
|
| 21 |
+
1. Run `.specify/scripts/bash/check-prerequisites.sh --json --paths-only` from repo root **once** (combined `--json --paths-only` mode / `-Json -PathsOnly`). Parse minimal JSON payload fields:
|
| 22 |
+
- `FEATURE_DIR`
|
| 23 |
+
- `FEATURE_SPEC`
|
| 24 |
+
- (Optionally capture `IMPL_PLAN`, `TASKS` for future chained flows.)
|
| 25 |
+
- If JSON parsing fails, abort and instruct user to re-run `/speckit.specify` or verify feature branch environment.
|
| 26 |
+
- For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
|
| 27 |
+
|
| 28 |
+
2. Load the current spec file. Perform a structured ambiguity & coverage scan using this taxonomy. For each category, mark status: Clear / Partial / Missing. Produce an internal coverage map used for prioritization (do not output raw map unless no questions will be asked).
|
| 29 |
+
|
| 30 |
+
Functional Scope & Behavior:
|
| 31 |
+
- Core user goals & success criteria
|
| 32 |
+
- Explicit out-of-scope declarations
|
| 33 |
+
- User roles / personas differentiation
|
| 34 |
+
|
| 35 |
+
Domain & Data Model:
|
| 36 |
+
- Entities, attributes, relationships
|
| 37 |
+
- Identity & uniqueness rules
|
| 38 |
+
- Lifecycle/state transitions
|
| 39 |
+
- Data volume / scale assumptions
|
| 40 |
+
|
| 41 |
+
Interaction & UX Flow:
|
| 42 |
+
- Critical user journeys / sequences
|
| 43 |
+
- Error/empty/loading states
|
| 44 |
+
- Accessibility or localization notes
|
| 45 |
+
|
| 46 |
+
Non-Functional Quality Attributes:
|
| 47 |
+
- Performance (latency, throughput targets)
|
| 48 |
+
- Scalability (horizontal/vertical, limits)
|
| 49 |
+
- Reliability & availability (uptime, recovery expectations)
|
| 50 |
+
- Observability (logging, metrics, tracing signals)
|
| 51 |
+
- Security & privacy (authN/Z, data protection, threat assumptions)
|
| 52 |
+
- Compliance / regulatory constraints (if any)
|
| 53 |
+
|
| 54 |
+
Integration & External Dependencies:
|
| 55 |
+
- External services/APIs and failure modes
|
| 56 |
+
- Data import/export formats
|
| 57 |
+
- Protocol/versioning assumptions
|
| 58 |
+
|
| 59 |
+
Edge Cases & Failure Handling:
|
| 60 |
+
- Negative scenarios
|
| 61 |
+
- Rate limiting / throttling
|
| 62 |
+
- Conflict resolution (e.g., concurrent edits)
|
| 63 |
+
|
| 64 |
+
Constraints & Tradeoffs:
|
| 65 |
+
- Technical constraints (language, storage, hosting)
|
| 66 |
+
- Explicit tradeoffs or rejected alternatives
|
| 67 |
+
|
| 68 |
+
Terminology & Consistency:
|
| 69 |
+
- Canonical glossary terms
|
| 70 |
+
- Avoided synonyms / deprecated terms
|
| 71 |
+
|
| 72 |
+
Completion Signals:
|
| 73 |
+
- Acceptance criteria testability
|
| 74 |
+
- Measurable Definition of Done style indicators
|
| 75 |
+
|
| 76 |
+
Misc / Placeholders:
|
| 77 |
+
- TODO markers / unresolved decisions
|
| 78 |
+
- Ambiguous adjectives ("robust", "intuitive") lacking quantification
|
| 79 |
+
|
| 80 |
+
For each category with Partial or Missing status, add a candidate question opportunity unless:
|
| 81 |
+
- Clarification would not materially change implementation or validation strategy
|
| 82 |
+
- Information is better deferred to planning phase (note internally)
|
| 83 |
+
|
| 84 |
+
3. Generate (internally) a prioritized queue of candidate clarification questions (maximum 5). Do NOT output them all at once. Apply these constraints:
|
| 85 |
+
- Maximum of 10 total questions across the whole session.
|
| 86 |
+
- Each question must be answerable with EITHER:
|
| 87 |
+
- A short multiple‑choice selection (2–5 distinct, mutually exclusive options), OR
|
| 88 |
+
- A one-word / short‑phrase answer (explicitly constrain: "Answer in <=5 words").
|
| 89 |
+
- Only include questions whose answers materially impact architecture, data modeling, task decomposition, test design, UX behavior, operational readiness, or compliance validation.
|
| 90 |
+
- Ensure category coverage balance: attempt to cover the highest impact unresolved categories first; avoid asking two low-impact questions when a single high-impact area (e.g., security posture) is unresolved.
|
| 91 |
+
- Exclude questions already answered, trivial stylistic preferences, or plan-level execution details (unless blocking correctness).
|
| 92 |
+
- Favor clarifications that reduce downstream rework risk or prevent misaligned acceptance tests.
|
| 93 |
+
- If more than 5 categories remain unresolved, select the top 5 by (Impact * Uncertainty) heuristic.
|
| 94 |
+
|
| 95 |
+
4. Sequential questioning loop (interactive):
|
| 96 |
+
- Present EXACTLY ONE question at a time.
|
| 97 |
+
- For multiple‑choice questions:
|
| 98 |
+
- **Analyze all options** and determine the **most suitable option** based on:
|
| 99 |
+
- Best practices for the project type
|
| 100 |
+
- Common patterns in similar implementations
|
| 101 |
+
- Risk reduction (security, performance, maintainability)
|
| 102 |
+
- Alignment with any explicit project goals or constraints visible in the spec
|
| 103 |
+
- Present your **recommended option prominently** at the top with clear reasoning (1-2 sentences explaining why this is the best choice).
|
| 104 |
+
- Format as: `**Recommended:** Option [X] - <reasoning>`
|
| 105 |
+
- Then render all options as a Markdown table:
|
| 106 |
+
|
| 107 |
+
| Option | Description |
|
| 108 |
+
|--------|-------------|
|
| 109 |
+
| A | <Option A description> |
|
| 110 |
+
| B | <Option B description> |
|
| 111 |
+
| C | <Option C description> (add D/E as needed up to 5) |
|
| 112 |
+
| Short | Provide a different short answer (<=5 words) (Include only if free-form alternative is appropriate) |
|
| 113 |
+
|
| 114 |
+
- After the table, add: `You can reply with the option letter (e.g., "A"), accept the recommendation by saying "yes" or "recommended", or provide your own short answer.`
|
| 115 |
+
- For short‑answer style (no meaningful discrete options):
|
| 116 |
+
- Provide your **suggested answer** based on best practices and context.
|
| 117 |
+
- Format as: `**Suggested:** <your proposed answer> - <brief reasoning>`
|
| 118 |
+
- Then output: `Format: Short answer (<=5 words). You can accept the suggestion by saying "yes" or "suggested", or provide your own answer.`
|
| 119 |
+
- After the user answers:
|
| 120 |
+
- If the user replies with "yes", "recommended", or "suggested", use your previously stated recommendation/suggestion as the answer.
|
| 121 |
+
- Otherwise, validate the answer maps to one option or fits the <=5 word constraint.
|
| 122 |
+
- If ambiguous, ask for a quick disambiguation (count still belongs to same question; do not advance).
|
| 123 |
+
- Once satisfactory, record it in working memory (do not yet write to disk) and move to the next queued question.
|
| 124 |
+
- Stop asking further questions when:
|
| 125 |
+
- All critical ambiguities resolved early (remaining queued items become unnecessary), OR
|
| 126 |
+
- User signals completion ("done", "good", "no more"), OR
|
| 127 |
+
- You reach 5 asked questions.
|
| 128 |
+
- Never reveal future queued questions in advance.
|
| 129 |
+
- If no valid questions exist at start, immediately report no critical ambiguities.
|
| 130 |
+
|
| 131 |
+
5. Integration after EACH accepted answer (incremental update approach):
|
| 132 |
+
- Maintain in-memory representation of the spec (loaded once at start) plus the raw file contents.
|
| 133 |
+
- For the first integrated answer in this session:
|
| 134 |
+
- Ensure a `## Clarifications` section exists (create it just after the highest-level contextual/overview section per the spec template if missing).
|
| 135 |
+
- Under it, create (if not present) a `### Session YYYY-MM-DD` subheading for today.
|
| 136 |
+
- Append a bullet line immediately after acceptance: `- Q: <question> → A: <final answer>`.
|
| 137 |
+
- Then immediately apply the clarification to the most appropriate section(s):
|
| 138 |
+
- Functional ambiguity → Update or add a bullet in Functional Requirements.
|
| 139 |
+
- User interaction / actor distinction → Update User Stories or Actors subsection (if present) with clarified role, constraint, or scenario.
|
| 140 |
+
- Data shape / entities → Update Data Model (add fields, types, relationships) preserving ordering; note added constraints succinctly.
|
| 141 |
+
- Non-functional constraint → Add/modify measurable criteria in Non-Functional / Quality Attributes section (convert vague adjective to metric or explicit target).
|
| 142 |
+
- Edge case / negative flow → Add a new bullet under Edge Cases / Error Handling (or create such subsection if template provides placeholder for it).
|
| 143 |
+
- Terminology conflict → Normalize term across spec; retain original only if necessary by adding `(formerly referred to as "X")` once.
|
| 144 |
+
- If the clarification invalidates an earlier ambiguous statement, replace that statement instead of duplicating; leave no obsolete contradictory text.
|
| 145 |
+
- Save the spec file AFTER each integration to minimize risk of context loss (atomic overwrite).
|
| 146 |
+
- Preserve formatting: do not reorder unrelated sections; keep heading hierarchy intact.
|
| 147 |
+
- Keep each inserted clarification minimal and testable (avoid narrative drift).
|
| 148 |
+
|
| 149 |
+
6. Validation (performed after EACH write plus final pass):
|
| 150 |
+
- Clarifications session contains exactly one bullet per accepted answer (no duplicates).
|
| 151 |
+
- Total asked (accepted) questions ≤ 5.
|
| 152 |
+
- Updated sections contain no lingering vague placeholders the new answer was meant to resolve.
|
| 153 |
+
- No contradictory earlier statement remains (scan for now-invalid alternative choices removed).
|
| 154 |
+
- Markdown structure valid; only allowed new headings: `## Clarifications`, `### Session YYYY-MM-DD`.
|
| 155 |
+
- Terminology consistency: same canonical term used across all updated sections.
|
| 156 |
+
|
| 157 |
+
7. Write the updated spec back to `FEATURE_SPEC`.
|
| 158 |
+
|
| 159 |
+
8. Report completion (after questioning loop ends or early termination):
|
| 160 |
+
- Number of questions asked & answered.
|
| 161 |
+
- Path to updated spec.
|
| 162 |
+
- Sections touched (list names).
|
| 163 |
+
- Coverage summary table listing each taxonomy category with Status: Resolved (was Partial/Missing and addressed), Deferred (exceeds question quota or better suited for planning), Clear (already sufficient), Outstanding (still Partial/Missing but low impact).
|
| 164 |
+
- If any Outstanding or Deferred remain, recommend whether to proceed to `/speckit.plan` or run `/speckit.clarify` again later post-plan.
|
| 165 |
+
- Suggested next command.
|
| 166 |
+
|
| 167 |
+
Behavior rules:
|
| 168 |
+
|
| 169 |
+
- If no meaningful ambiguities found (or all potential questions would be low-impact), respond: "No critical ambiguities detected worth formal clarification." and suggest proceeding.
|
| 170 |
+
- If spec file missing, instruct user to run `/speckit.specify` first (do not create a new spec here).
|
| 171 |
+
- Never exceed 5 total asked questions (clarification retries for a single question do not count as new questions).
|
| 172 |
+
- Avoid speculative tech stack questions unless the absence blocks functional clarity.
|
| 173 |
+
- Respect user early termination signals ("stop", "done", "proceed").
|
| 174 |
+
- If no questions asked due to full coverage, output a compact coverage summary (all categories Clear) then suggest advancing.
|
| 175 |
+
- If quota reached with unresolved high-impact categories remaining, explicitly flag them under Deferred with rationale.
|
| 176 |
+
|
| 177 |
+
Context for prioritization: $ARGUMENTS
|
.github/prompts/speckit.constitution.prompt.md
ADDED
|
@@ -0,0 +1,78 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
description: Create or update the project constitution from interactive or provided principle inputs, ensuring all dependent templates stay in sync
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
## User Input
|
| 6 |
+
|
| 7 |
+
```text
|
| 8 |
+
$ARGUMENTS
|
| 9 |
+
```
|
| 10 |
+
|
| 11 |
+
You **MUST** consider the user input before proceeding (if not empty).
|
| 12 |
+
|
| 13 |
+
## Outline
|
| 14 |
+
|
| 15 |
+
You are updating the project constitution at `.specify/memory/constitution.md`. This file is a TEMPLATE containing placeholder tokens in square brackets (e.g. `[PROJECT_NAME]`, `[PRINCIPLE_1_NAME]`). Your job is to (a) collect/derive concrete values, (b) fill the template precisely, and (c) propagate any amendments across dependent artifacts.
|
| 16 |
+
|
| 17 |
+
Follow this execution flow:
|
| 18 |
+
|
| 19 |
+
1. Load the existing constitution template at `.specify/memory/constitution.md`.
|
| 20 |
+
- Identify every placeholder token of the form `[ALL_CAPS_IDENTIFIER]`.
|
| 21 |
+
**IMPORTANT**: The user might require less or more principles than the ones used in the template. If a number is specified, respect that - follow the general template. You will update the doc accordingly.
|
| 22 |
+
|
| 23 |
+
2. Collect/derive values for placeholders:
|
| 24 |
+
- If user input (conversation) supplies a value, use it.
|
| 25 |
+
- Otherwise infer from existing repo context (README, docs, prior constitution versions if embedded).
|
| 26 |
+
- For governance dates: `RATIFICATION_DATE` is the original adoption date (if unknown ask or mark TODO), `LAST_AMENDED_DATE` is today if changes are made, otherwise keep previous.
|
| 27 |
+
- `CONSTITUTION_VERSION` must increment according to semantic versioning rules:
|
| 28 |
+
- MAJOR: Backward incompatible governance/principle removals or redefinitions.
|
| 29 |
+
- MINOR: New principle/section added or materially expanded guidance.
|
| 30 |
+
- PATCH: Clarifications, wording, typo fixes, non-semantic refinements.
|
| 31 |
+
- If version bump type ambiguous, propose reasoning before finalizing.
|
| 32 |
+
|
| 33 |
+
3. Draft the updated constitution content:
|
| 34 |
+
- Replace every placeholder with concrete text (no bracketed tokens left except intentionally retained template slots that the project has chosen not to define yet—explicitly justify any left).
|
| 35 |
+
- Preserve heading hierarchy and comments can be removed once replaced unless they still add clarifying guidance.
|
| 36 |
+
- Ensure each Principle section: succinct name line, paragraph (or bullet list) capturing non‑negotiable rules, explicit rationale if not obvious.
|
| 37 |
+
- Ensure Governance section lists amendment procedure, versioning policy, and compliance review expectations.
|
| 38 |
+
|
| 39 |
+
4. Consistency propagation checklist (convert prior checklist into active validations):
|
| 40 |
+
- Read `.specify/templates/plan-template.md` and ensure any "Constitution Check" or rules align with updated principles.
|
| 41 |
+
- Read `.specify/templates/spec-template.md` for scope/requirements alignment—update if constitution adds/removes mandatory sections or constraints.
|
| 42 |
+
- Read `.specify/templates/tasks-template.md` and ensure task categorization reflects new or removed principle-driven task types (e.g., observability, versioning, testing discipline).
|
| 43 |
+
- Read each command file in `.specify/templates/commands/*.md` (including this one) to verify no outdated references (agent-specific names like CLAUDE only) remain when generic guidance is required.
|
| 44 |
+
- Read any runtime guidance docs (e.g., `README.md`, `docs/quickstart.md`, or agent-specific guidance files if present). Update references to principles changed.
|
| 45 |
+
|
| 46 |
+
5. Produce a Sync Impact Report (prepend as an HTML comment at top of the constitution file after update):
|
| 47 |
+
- Version change: old → new
|
| 48 |
+
- List of modified principles (old title → new title if renamed)
|
| 49 |
+
- Added sections
|
| 50 |
+
- Removed sections
|
| 51 |
+
- Templates requiring updates (✅ updated / ⚠ pending) with file paths
|
| 52 |
+
- Follow-up TODOs if any placeholders intentionally deferred.
|
| 53 |
+
|
| 54 |
+
6. Validation before final output:
|
| 55 |
+
- No remaining unexplained bracket tokens.
|
| 56 |
+
- Version line matches report.
|
| 57 |
+
- Dates ISO format YYYY-MM-DD.
|
| 58 |
+
- Principles are declarative, testable, and free of vague language ("should" → replace with MUST/SHOULD rationale where appropriate).
|
| 59 |
+
|
| 60 |
+
7. Write the completed constitution back to `.specify/memory/constitution.md` (overwrite).
|
| 61 |
+
|
| 62 |
+
8. Output a final summary to the user with:
|
| 63 |
+
- New version and bump rationale.
|
| 64 |
+
- Any files flagged for manual follow-up.
|
| 65 |
+
- Suggested commit message (e.g., `docs: amend constitution to vX.Y.Z (principle additions + governance update)`).
|
| 66 |
+
|
| 67 |
+
Formatting & Style Requirements:
|
| 68 |
+
|
| 69 |
+
- Use Markdown headings exactly as in the template (do not demote/promote levels).
|
| 70 |
+
- Wrap long rationale lines to keep readability (<100 chars ideally) but do not hard enforce with awkward breaks.
|
| 71 |
+
- Keep a single blank line between sections.
|
| 72 |
+
- Avoid trailing whitespace.
|
| 73 |
+
|
| 74 |
+
If the user supplies partial updates (e.g., only one principle revision), still perform validation and version decision steps.
|
| 75 |
+
|
| 76 |
+
If critical info missing (e.g., ratification date truly unknown), insert `TODO(<FIELD_NAME>): explanation` and include in the Sync Impact Report under deferred items.
|
| 77 |
+
|
| 78 |
+
Do not create a new template; always operate on the existing `.specify/memory/constitution.md` file.
|
.github/prompts/speckit.implement.prompt.md
ADDED
|
@@ -0,0 +1,134 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
description: Execute the implementation plan by processing and executing all tasks defined in tasks.md
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
## User Input
|
| 6 |
+
|
| 7 |
+
```text
|
| 8 |
+
$ARGUMENTS
|
| 9 |
+
```
|
| 10 |
+
|
| 11 |
+
You **MUST** consider the user input before proceeding (if not empty).
|
| 12 |
+
|
| 13 |
+
## Outline
|
| 14 |
+
|
| 15 |
+
1. Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute. For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
|
| 16 |
+
|
| 17 |
+
2. **Check checklists status** (if FEATURE_DIR/checklists/ exists):
|
| 18 |
+
- Scan all checklist files in the checklists/ directory
|
| 19 |
+
- For each checklist, count:
|
| 20 |
+
- Total items: All lines matching `- [ ]` or `- [X]` or `- [x]`
|
| 21 |
+
- Completed items: Lines matching `- [X]` or `- [x]`
|
| 22 |
+
- Incomplete items: Lines matching `- [ ]`
|
| 23 |
+
- Create a status table:
|
| 24 |
+
|
| 25 |
+
```text
|
| 26 |
+
| Checklist | Total | Completed | Incomplete | Status |
|
| 27 |
+
|-----------|-------|-----------|------------|--------|
|
| 28 |
+
| ux.md | 12 | 12 | 0 | ✓ PASS |
|
| 29 |
+
| test.md | 8 | 5 | 3 | ✗ FAIL |
|
| 30 |
+
| security.md | 6 | 6 | 0 | ✓ PASS |
|
| 31 |
+
```
|
| 32 |
+
|
| 33 |
+
- Calculate overall status:
|
| 34 |
+
- **PASS**: All checklists have 0 incomplete items
|
| 35 |
+
- **FAIL**: One or more checklists have incomplete items
|
| 36 |
+
|
| 37 |
+
- **If any checklist is incomplete**:
|
| 38 |
+
- Display the table with incomplete item counts
|
| 39 |
+
- **STOP** and ask: "Some checklists are incomplete. Do you want to proceed with implementation anyway? (yes/no)"
|
| 40 |
+
- Wait for user response before continuing
|
| 41 |
+
- If user says "no" or "wait" or "stop", halt execution
|
| 42 |
+
- If user says "yes" or "proceed" or "continue", proceed to step 3
|
| 43 |
+
|
| 44 |
+
- **If all checklists are complete**:
|
| 45 |
+
- Display the table showing all checklists passed
|
| 46 |
+
- Automatically proceed to step 3
|
| 47 |
+
|
| 48 |
+
3. Load and analyze the implementation context:
|
| 49 |
+
- **REQUIRED**: Read tasks.md for the complete task list and execution plan
|
| 50 |
+
- **REQUIRED**: Read plan.md for tech stack, architecture, and file structure
|
| 51 |
+
- **IF EXISTS**: Read data-model.md for entities and relationships
|
| 52 |
+
- **IF EXISTS**: Read contracts/ for API specifications and test requirements
|
| 53 |
+
- **IF EXISTS**: Read research.md for technical decisions and constraints
|
| 54 |
+
- **IF EXISTS**: Read quickstart.md for integration scenarios
|
| 55 |
+
|
| 56 |
+
4. **Project Setup Verification**:
|
| 57 |
+
- **REQUIRED**: Create/verify ignore files based on actual project setup:
|
| 58 |
+
|
| 59 |
+
**Detection & Creation Logic**:
|
| 60 |
+
- Check if the following command succeeds to determine if the repository is a git repo (create/verify .gitignore if so):
|
| 61 |
+
|
| 62 |
+
```sh
|
| 63 |
+
git rev-parse --git-dir 2>/dev/null
|
| 64 |
+
```
|
| 65 |
+
|
| 66 |
+
- Check if Dockerfile* exists or Docker in plan.md → create/verify .dockerignore
|
| 67 |
+
- Check if .eslintrc*or eslint.config.* exists → create/verify .eslintignore
|
| 68 |
+
- Check if .prettierrc* exists → create/verify .prettierignore
|
| 69 |
+
- Check if .npmrc or package.json exists → create/verify .npmignore (if publishing)
|
| 70 |
+
- Check if terraform files (*.tf) exist → create/verify .terraformignore
|
| 71 |
+
- Check if .helmignore needed (helm charts present) → create/verify .helmignore
|
| 72 |
+
|
| 73 |
+
**If ignore file already exists**: Verify it contains essential patterns, append missing critical patterns only
|
| 74 |
+
**If ignore file missing**: Create with full pattern set for detected technology
|
| 75 |
+
|
| 76 |
+
**Common Patterns by Technology** (from plan.md tech stack):
|
| 77 |
+
- **Node.js/JavaScript/TypeScript**: `node_modules/`, `dist/`, `build/`, `*.log`, `.env*`
|
| 78 |
+
- **Python**: `__pycache__/`, `*.pyc`, `.venv/`, `venv/`, `dist/`, `*.egg-info/`
|
| 79 |
+
- **Java**: `target/`, `*.class`, `*.jar`, `.gradle/`, `build/`
|
| 80 |
+
- **C#/.NET**: `bin/`, `obj/`, `*.user`, `*.suo`, `packages/`
|
| 81 |
+
- **Go**: `*.exe`, `*.test`, `vendor/`, `*.out`
|
| 82 |
+
- **Ruby**: `.bundle/`, `log/`, `tmp/`, `*.gem`, `vendor/bundle/`
|
| 83 |
+
- **PHP**: `vendor/`, `*.log`, `*.cache`, `*.env`
|
| 84 |
+
- **Rust**: `target/`, `debug/`, `release/`, `*.rs.bk`, `*.rlib`, `*.prof*`, `.idea/`, `*.log`, `.env*`
|
| 85 |
+
- **Kotlin**: `build/`, `out/`, `.gradle/`, `.idea/`, `*.class`, `*.jar`, `*.iml`, `*.log`, `.env*`
|
| 86 |
+
- **C++**: `build/`, `bin/`, `obj/`, `out/`, `*.o`, `*.so`, `*.a`, `*.exe`, `*.dll`, `.idea/`, `*.log`, `.env*`
|
| 87 |
+
- **C**: `build/`, `bin/`, `obj/`, `out/`, `*.o`, `*.a`, `*.so`, `*.exe`, `Makefile`, `config.log`, `.idea/`, `*.log`, `.env*`
|
| 88 |
+
- **Swift**: `.build/`, `DerivedData/`, `*.swiftpm/`, `Packages/`
|
| 89 |
+
- **R**: `.Rproj.user/`, `.Rhistory`, `.RData`, `.Ruserdata`, `*.Rproj`, `packrat/`, `renv/`
|
| 90 |
+
- **Universal**: `.DS_Store`, `Thumbs.db`, `*.tmp`, `*.swp`, `.vscode/`, `.idea/`
|
| 91 |
+
|
| 92 |
+
**Tool-Specific Patterns**:
|
| 93 |
+
- **Docker**: `node_modules/`, `.git/`, `Dockerfile*`, `.dockerignore`, `*.log*`, `.env*`, `coverage/`
|
| 94 |
+
- **ESLint**: `node_modules/`, `dist/`, `build/`, `coverage/`, `*.min.js`
|
| 95 |
+
- **Prettier**: `node_modules/`, `dist/`, `build/`, `coverage/`, `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`
|
| 96 |
+
- **Terraform**: `.terraform/`, `*.tfstate*`, `*.tfvars`, `.terraform.lock.hcl`
|
| 97 |
+
- **Kubernetes/k8s**: `*.secret.yaml`, `secrets/`, `.kube/`, `kubeconfig*`, `*.key`, `*.crt`
|
| 98 |
+
|
| 99 |
+
5. Parse tasks.md structure and extract:
|
| 100 |
+
- **Task phases**: Setup, Tests, Core, Integration, Polish
|
| 101 |
+
- **Task dependencies**: Sequential vs parallel execution rules
|
| 102 |
+
- **Task details**: ID, description, file paths, parallel markers [P]
|
| 103 |
+
- **Execution flow**: Order and dependency requirements
|
| 104 |
+
|
| 105 |
+
6. Execute implementation following the task plan:
|
| 106 |
+
- **Phase-by-phase execution**: Complete each phase before moving to the next
|
| 107 |
+
- **Respect dependencies**: Run sequential tasks in order, parallel tasks [P] can run together
|
| 108 |
+
- **Follow TDD approach**: Execute test tasks before their corresponding implementation tasks
|
| 109 |
+
- **File-based coordination**: Tasks affecting the same files must run sequentially
|
| 110 |
+
- **Validation checkpoints**: Verify each phase completion before proceeding
|
| 111 |
+
|
| 112 |
+
7. Implementation execution rules:
|
| 113 |
+
- **Setup first**: Initialize project structure, dependencies, configuration
|
| 114 |
+
- **Tests before code**: If you need to write tests for contracts, entities, and integration scenarios
|
| 115 |
+
- **Core development**: Implement models, services, CLI commands, endpoints
|
| 116 |
+
- **Integration work**: Database connections, middleware, logging, external services
|
| 117 |
+
- **Polish and validation**: Unit tests, performance optimization, documentation
|
| 118 |
+
|
| 119 |
+
8. Progress tracking and error handling:
|
| 120 |
+
- Report progress after each completed task
|
| 121 |
+
- Halt execution if any non-parallel task fails
|
| 122 |
+
- For parallel tasks [P], continue with successful tasks, report failed ones
|
| 123 |
+
- Provide clear error messages with context for debugging
|
| 124 |
+
- Suggest next steps if implementation cannot proceed
|
| 125 |
+
- **IMPORTANT** For completed tasks, make sure to mark the task off as [X] in the tasks file.
|
| 126 |
+
|
| 127 |
+
9. Completion validation:
|
| 128 |
+
- Verify all required tasks are completed
|
| 129 |
+
- Check that implemented features match the original specification
|
| 130 |
+
- Validate that tests pass and coverage meets requirements
|
| 131 |
+
- Confirm the implementation follows the technical plan
|
| 132 |
+
- Report final status with summary of completed work
|
| 133 |
+
|
| 134 |
+
Note: This command assumes a complete task breakdown exists in tasks.md. If tasks are incomplete or missing, suggest running `/speckit.tasks` first to regenerate the task list.
|
.github/prompts/speckit.plan.prompt.md
ADDED
|
@@ -0,0 +1,81 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
description: Execute the implementation planning workflow using the plan template to generate design artifacts.
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
## User Input
|
| 6 |
+
|
| 7 |
+
```text
|
| 8 |
+
$ARGUMENTS
|
| 9 |
+
```
|
| 10 |
+
|
| 11 |
+
You **MUST** consider the user input before proceeding (if not empty).
|
| 12 |
+
|
| 13 |
+
## Outline
|
| 14 |
+
|
| 15 |
+
1. **Setup**: Run `.specify/scripts/bash/setup-plan.sh --json` from repo root and parse JSON for FEATURE_SPEC, IMPL_PLAN, SPECS_DIR, BRANCH. For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
|
| 16 |
+
|
| 17 |
+
2. **Load context**: Read FEATURE_SPEC and `.specify/memory/constitution.md`. Load IMPL_PLAN template (already copied).
|
| 18 |
+
|
| 19 |
+
3. **Execute plan workflow**: Follow the structure in IMPL_PLAN template to:
|
| 20 |
+
- Fill Technical Context (mark unknowns as "NEEDS CLARIFICATION")
|
| 21 |
+
- Fill Constitution Check section from constitution
|
| 22 |
+
- Evaluate gates (ERROR if violations unjustified)
|
| 23 |
+
- Phase 0: Generate research.md (resolve all NEEDS CLARIFICATION)
|
| 24 |
+
- Phase 1: Generate data-model.md, contracts/, quickstart.md
|
| 25 |
+
- Phase 1: Update agent context by running the agent script
|
| 26 |
+
- Re-evaluate Constitution Check post-design
|
| 27 |
+
|
| 28 |
+
4. **Stop and report**: Command ends after Phase 2 planning. Report branch, IMPL_PLAN path, and generated artifacts.
|
| 29 |
+
|
| 30 |
+
## Phases
|
| 31 |
+
|
| 32 |
+
### Phase 0: Outline & Research
|
| 33 |
+
|
| 34 |
+
1. **Extract unknowns from Technical Context** above:
|
| 35 |
+
- For each NEEDS CLARIFICATION → research task
|
| 36 |
+
- For each dependency → best practices task
|
| 37 |
+
- For each integration → patterns task
|
| 38 |
+
|
| 39 |
+
2. **Generate and dispatch research agents**:
|
| 40 |
+
|
| 41 |
+
```text
|
| 42 |
+
For each unknown in Technical Context:
|
| 43 |
+
Task: "Research {unknown} for {feature context}"
|
| 44 |
+
For each technology choice:
|
| 45 |
+
Task: "Find best practices for {tech} in {domain}"
|
| 46 |
+
```
|
| 47 |
+
|
| 48 |
+
3. **Consolidate findings** in `research.md` using format:
|
| 49 |
+
- Decision: [what was chosen]
|
| 50 |
+
- Rationale: [why chosen]
|
| 51 |
+
- Alternatives considered: [what else evaluated]
|
| 52 |
+
|
| 53 |
+
**Output**: research.md with all NEEDS CLARIFICATION resolved
|
| 54 |
+
|
| 55 |
+
### Phase 1: Design & Contracts
|
| 56 |
+
|
| 57 |
+
**Prerequisites:** `research.md` complete
|
| 58 |
+
|
| 59 |
+
1. **Extract entities from feature spec** → `data-model.md`:
|
| 60 |
+
- Entity name, fields, relationships
|
| 61 |
+
- Validation rules from requirements
|
| 62 |
+
- State transitions if applicable
|
| 63 |
+
|
| 64 |
+
2. **Generate API contracts** from functional requirements:
|
| 65 |
+
- For each user action → endpoint
|
| 66 |
+
- Use standard REST/GraphQL patterns
|
| 67 |
+
- Output OpenAPI/GraphQL schema to `/contracts/`
|
| 68 |
+
|
| 69 |
+
3. **Agent context update**:
|
| 70 |
+
- Run `.specify/scripts/bash/update-agent-context.sh copilot`
|
| 71 |
+
- These scripts detect which AI agent is in use
|
| 72 |
+
- Update the appropriate agent-specific context file
|
| 73 |
+
- Add only new technology from current plan
|
| 74 |
+
- Preserve manual additions between markers
|
| 75 |
+
|
| 76 |
+
**Output**: data-model.md, /contracts/*, quickstart.md, agent-specific file
|
| 77 |
+
|
| 78 |
+
## Key rules
|
| 79 |
+
|
| 80 |
+
- Use absolute paths
|
| 81 |
+
- ERROR on gate failures or unresolved clarifications
|
.github/prompts/speckit.specify.prompt.md
ADDED
|
@@ -0,0 +1,229 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
description: Create or update the feature specification from a natural language feature description.
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
## User Input
|
| 6 |
+
|
| 7 |
+
```text
|
| 8 |
+
$ARGUMENTS
|
| 9 |
+
```
|
| 10 |
+
|
| 11 |
+
You **MUST** consider the user input before proceeding (if not empty).
|
| 12 |
+
|
| 13 |
+
## Outline
|
| 14 |
+
|
| 15 |
+
The text the user typed after `/speckit.specify` in the triggering message **is** the feature description. Assume you always have it available in this conversation even if `$ARGUMENTS` appears literally below. Do not ask the user to repeat it unless they provided an empty command.
|
| 16 |
+
|
| 17 |
+
Given that feature description, do this:
|
| 18 |
+
|
| 19 |
+
1. **Generate a concise short name** (2-4 words) for the branch:
|
| 20 |
+
- Analyze the feature description and extract the most meaningful keywords
|
| 21 |
+
- Create a 2-4 word short name that captures the essence of the feature
|
| 22 |
+
- Use action-noun format when possible (e.g., "add-user-auth", "fix-payment-bug")
|
| 23 |
+
- Preserve technical terms and acronyms (OAuth2, API, JWT, etc.)
|
| 24 |
+
- Keep it concise but descriptive enough to understand the feature at a glance
|
| 25 |
+
- Examples:
|
| 26 |
+
- "I want to add user authentication" → "user-auth"
|
| 27 |
+
- "Implement OAuth2 integration for the API" → "oauth2-api-integration"
|
| 28 |
+
- "Create a dashboard for analytics" → "analytics-dashboard"
|
| 29 |
+
- "Fix payment processing timeout bug" → "fix-payment-timeout"
|
| 30 |
+
|
| 31 |
+
2. Run the script `.specify/scripts/bash/create-new-feature.sh --json "$ARGUMENTS"` from repo root **with the short-name argument** and parse its JSON output for BRANCH_NAME and SPEC_FILE. All file paths must be absolute.
|
| 32 |
+
|
| 33 |
+
**IMPORTANT**:
|
| 34 |
+
|
| 35 |
+
- Append the short-name argument to the `.specify/scripts/bash/create-new-feature.sh --json "$ARGUMENTS"` command with the 2-4 word short name you created in step 1. Keep the feature description as the final argument.
|
| 36 |
+
- Bash example: `--short-name "your-generated-short-name" "Feature description here"`
|
| 37 |
+
- PowerShell example: `-ShortName "your-generated-short-name" "Feature description here"`
|
| 38 |
+
- For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot")
|
| 39 |
+
- You must only ever run this script once
|
| 40 |
+
- The JSON is provided in the terminal as output - always refer to it to get the actual content you're looking for
|
| 41 |
+
|
| 42 |
+
3. Load `.specify/templates/spec-template.md` to understand required sections.
|
| 43 |
+
|
| 44 |
+
4. Follow this execution flow:
|
| 45 |
+
|
| 46 |
+
1. Parse user description from Input
|
| 47 |
+
If empty: ERROR "No feature description provided"
|
| 48 |
+
2. Extract key concepts from description
|
| 49 |
+
Identify: actors, actions, data, constraints
|
| 50 |
+
3. For unclear aspects:
|
| 51 |
+
- Make informed guesses based on context and industry standards
|
| 52 |
+
- Only mark with [NEEDS CLARIFICATION: specific question] if:
|
| 53 |
+
- The choice significantly impacts feature scope or user experience
|
| 54 |
+
- Multiple reasonable interpretations exist with different implications
|
| 55 |
+
- No reasonable default exists
|
| 56 |
+
- **LIMIT: Maximum 3 [NEEDS CLARIFICATION] markers total**
|
| 57 |
+
- Prioritize clarifications by impact: scope > security/privacy > user experience > technical details
|
| 58 |
+
4. Fill User Scenarios & Testing section
|
| 59 |
+
If no clear user flow: ERROR "Cannot determine user scenarios"
|
| 60 |
+
5. Generate Functional Requirements
|
| 61 |
+
Each requirement must be testable
|
| 62 |
+
Use reasonable defaults for unspecified details (document assumptions in Assumptions section)
|
| 63 |
+
6. Define Success Criteria
|
| 64 |
+
Create measurable, technology-agnostic outcomes
|
| 65 |
+
Include both quantitative metrics (time, performance, volume) and qualitative measures (user satisfaction, task completion)
|
| 66 |
+
Each criterion must be verifiable without implementation details
|
| 67 |
+
7. Identify Key Entities (if data involved)
|
| 68 |
+
8. Return: SUCCESS (spec ready for planning)
|
| 69 |
+
|
| 70 |
+
5. Write the specification to SPEC_FILE using the template structure, replacing placeholders with concrete details derived from the feature description (arguments) while preserving section order and headings.
|
| 71 |
+
|
| 72 |
+
6. **Specification Quality Validation**: After writing the initial spec, validate it against quality criteria:
|
| 73 |
+
|
| 74 |
+
a. **Create Spec Quality Checklist**: Generate a checklist file at `FEATURE_DIR/checklists/requirements.md` using the checklist template structure with these validation items:
|
| 75 |
+
|
| 76 |
+
```markdown
|
| 77 |
+
# Specification Quality Checklist: [FEATURE NAME]
|
| 78 |
+
|
| 79 |
+
**Purpose**: Validate specification completeness and quality before proceeding to planning
|
| 80 |
+
**Created**: [DATE]
|
| 81 |
+
**Feature**: [Link to spec.md]
|
| 82 |
+
|
| 83 |
+
## Content Quality
|
| 84 |
+
|
| 85 |
+
- [ ] No implementation details (languages, frameworks, APIs)
|
| 86 |
+
- [ ] Focused on user value and business needs
|
| 87 |
+
- [ ] Written for non-technical stakeholders
|
| 88 |
+
- [ ] All mandatory sections completed
|
| 89 |
+
|
| 90 |
+
## Requirement Completeness
|
| 91 |
+
|
| 92 |
+
- [ ] No [NEEDS CLARIFICATION] markers remain
|
| 93 |
+
- [ ] Requirements are testable and unambiguous
|
| 94 |
+
- [ ] Success criteria are measurable
|
| 95 |
+
- [ ] Success criteria are technology-agnostic (no implementation details)
|
| 96 |
+
- [ ] All acceptance scenarios are defined
|
| 97 |
+
- [ ] Edge cases are identified
|
| 98 |
+
- [ ] Scope is clearly bounded
|
| 99 |
+
- [ ] Dependencies and assumptions identified
|
| 100 |
+
|
| 101 |
+
## Feature Readiness
|
| 102 |
+
|
| 103 |
+
- [ ] All functional requirements have clear acceptance criteria
|
| 104 |
+
- [ ] User scenarios cover primary flows
|
| 105 |
+
- [ ] Feature meets measurable outcomes defined in Success Criteria
|
| 106 |
+
- [ ] No implementation details leak into specification
|
| 107 |
+
|
| 108 |
+
## Notes
|
| 109 |
+
|
| 110 |
+
- Items marked incomplete require spec updates before `/speckit.clarify` or `/speckit.plan`
|
| 111 |
+
```
|
| 112 |
+
|
| 113 |
+
b. **Run Validation Check**: Review the spec against each checklist item:
|
| 114 |
+
- For each item, determine if it passes or fails
|
| 115 |
+
- Document specific issues found (quote relevant spec sections)
|
| 116 |
+
|
| 117 |
+
c. **Handle Validation Results**:
|
| 118 |
+
|
| 119 |
+
- **If all items pass**: Mark checklist complete and proceed to step 6
|
| 120 |
+
|
| 121 |
+
- **If items fail (excluding [NEEDS CLARIFICATION])**:
|
| 122 |
+
1. List the failing items and specific issues
|
| 123 |
+
2. Update the spec to address each issue
|
| 124 |
+
3. Re-run validation until all items pass (max 3 iterations)
|
| 125 |
+
4. If still failing after 3 iterations, document remaining issues in checklist notes and warn user
|
| 126 |
+
|
| 127 |
+
- **If [NEEDS CLARIFICATION] markers remain**:
|
| 128 |
+
1. Extract all [NEEDS CLARIFICATION: ...] markers from the spec
|
| 129 |
+
2. **LIMIT CHECK**: If more than 3 markers exist, keep only the 3 most critical (by scope/security/UX impact) and make informed guesses for the rest
|
| 130 |
+
3. For each clarification needed (max 3), present options to user in this format:
|
| 131 |
+
|
| 132 |
+
```markdown
|
| 133 |
+
## Question [N]: [Topic]
|
| 134 |
+
|
| 135 |
+
**Context**: [Quote relevant spec section]
|
| 136 |
+
|
| 137 |
+
**What we need to know**: [Specific question from NEEDS CLARIFICATION marker]
|
| 138 |
+
|
| 139 |
+
**Suggested Answers**:
|
| 140 |
+
|
| 141 |
+
| Option | Answer | Implications |
|
| 142 |
+
|--------|--------|--------------|
|
| 143 |
+
| A | [First suggested answer] | [What this means for the feature] |
|
| 144 |
+
| B | [Second suggested answer] | [What this means for the feature] |
|
| 145 |
+
| C | [Third suggested answer] | [What this means for the feature] |
|
| 146 |
+
| Custom | Provide your own answer | [Explain how to provide custom input] |
|
| 147 |
+
|
| 148 |
+
**Your choice**: _[Wait for user response]_
|
| 149 |
+
```
|
| 150 |
+
|
| 151 |
+
4. **CRITICAL - Table Formatting**: Ensure markdown tables are properly formatted:
|
| 152 |
+
- Use consistent spacing with pipes aligned
|
| 153 |
+
- Each cell should have spaces around content: `| Content |` not `|Content|`
|
| 154 |
+
- Header separator must have at least 3 dashes: `|--------|`
|
| 155 |
+
- Test that the table renders correctly in markdown preview
|
| 156 |
+
5. Number questions sequentially (Q1, Q2, Q3 - max 3 total)
|
| 157 |
+
6. Present all questions together before waiting for responses
|
| 158 |
+
7. Wait for user to respond with their choices for all questions (e.g., "Q1: A, Q2: Custom - [details], Q3: B")
|
| 159 |
+
8. Update the spec by replacing each [NEEDS CLARIFICATION] marker with the user's selected or provided answer
|
| 160 |
+
9. Re-run validation after all clarifications are resolved
|
| 161 |
+
|
| 162 |
+
d. **Update Checklist**: After each validation iteration, update the checklist file with current pass/fail status
|
| 163 |
+
|
| 164 |
+
7. Report completion with branch name, spec file path, checklist results, and readiness for the next phase (`/speckit.clarify` or `/speckit.plan`).
|
| 165 |
+
|
| 166 |
+
**NOTE:** The script creates and checks out the new branch and initializes the spec file before writing.
|
| 167 |
+
|
| 168 |
+
## General Guidelines
|
| 169 |
+
|
| 170 |
+
## Quick Guidelines
|
| 171 |
+
|
| 172 |
+
- Focus on **WHAT** users need and **WHY**.
|
| 173 |
+
- Avoid HOW to implement (no tech stack, APIs, code structure).
|
| 174 |
+
- Written for business stakeholders, not developers.
|
| 175 |
+
- DO NOT create any checklists that are embedded in the spec. That will be a separate command.
|
| 176 |
+
|
| 177 |
+
### Section Requirements
|
| 178 |
+
|
| 179 |
+
- **Mandatory sections**: Must be completed for every feature
|
| 180 |
+
- **Optional sections**: Include only when relevant to the feature
|
| 181 |
+
- When a section doesn't apply, remove it entirely (don't leave as "N/A")
|
| 182 |
+
|
| 183 |
+
### For AI Generation
|
| 184 |
+
|
| 185 |
+
When creating this spec from a user prompt:
|
| 186 |
+
|
| 187 |
+
1. **Make informed guesses**: Use context, industry standards, and common patterns to fill gaps
|
| 188 |
+
2. **Document assumptions**: Record reasonable defaults in the Assumptions section
|
| 189 |
+
3. **Limit clarifications**: Maximum 3 [NEEDS CLARIFICATION] markers - use only for critical decisions that:
|
| 190 |
+
- Significantly impact feature scope or user experience
|
| 191 |
+
- Have multiple reasonable interpretations with different implications
|
| 192 |
+
- Lack any reasonable default
|
| 193 |
+
4. **Prioritize clarifications**: scope > security/privacy > user experience > technical details
|
| 194 |
+
5. **Think like a tester**: Every vague requirement should fail the "testable and unambiguous" checklist item
|
| 195 |
+
6. **Common areas needing clarification** (only if no reasonable default exists):
|
| 196 |
+
- Feature scope and boundaries (include/exclude specific use cases)
|
| 197 |
+
- User types and permissions (if multiple conflicting interpretations possible)
|
| 198 |
+
- Security/compliance requirements (when legally/financially significant)
|
| 199 |
+
|
| 200 |
+
**Examples of reasonable defaults** (don't ask about these):
|
| 201 |
+
|
| 202 |
+
- Data retention: Industry-standard practices for the domain
|
| 203 |
+
- Performance targets: Standard web/mobile app expectations unless specified
|
| 204 |
+
- Error handling: User-friendly messages with appropriate fallbacks
|
| 205 |
+
- Authentication method: Standard session-based or OAuth2 for web apps
|
| 206 |
+
- Integration patterns: RESTful APIs unless specified otherwise
|
| 207 |
+
|
| 208 |
+
### Success Criteria Guidelines
|
| 209 |
+
|
| 210 |
+
Success criteria must be:
|
| 211 |
+
|
| 212 |
+
1. **Measurable**: Include specific metrics (time, percentage, count, rate)
|
| 213 |
+
2. **Technology-agnostic**: No mention of frameworks, languages, databases, or tools
|
| 214 |
+
3. **User-focused**: Describe outcomes from user/business perspective, not system internals
|
| 215 |
+
4. **Verifiable**: Can be tested/validated without knowing implementation details
|
| 216 |
+
|
| 217 |
+
**Good examples**:
|
| 218 |
+
|
| 219 |
+
- "Users can complete checkout in under 3 minutes"
|
| 220 |
+
- "System supports 10,000 concurrent users"
|
| 221 |
+
- "95% of searches return results in under 1 second"
|
| 222 |
+
- "Task completion rate improves by 40%"
|
| 223 |
+
|
| 224 |
+
**Bad examples** (implementation-focused):
|
| 225 |
+
|
| 226 |
+
- "API response time is under 200ms" (too technical, use "Users see results instantly")
|
| 227 |
+
- "Database can handle 1000 TPS" (implementation detail, use user-facing metric)
|
| 228 |
+
- "React components render efficiently" (framework-specific)
|
| 229 |
+
- "Redis cache hit rate above 80%" (technology-specific)
|
.github/prompts/speckit.tasks.prompt.md
ADDED
|
@@ -0,0 +1,128 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
description: Generate an actionable, dependency-ordered tasks.md for the feature based on available design artifacts.
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
## User Input
|
| 6 |
+
|
| 7 |
+
```text
|
| 8 |
+
$ARGUMENTS
|
| 9 |
+
```
|
| 10 |
+
|
| 11 |
+
You **MUST** consider the user input before proceeding (if not empty).
|
| 12 |
+
|
| 13 |
+
## Outline
|
| 14 |
+
|
| 15 |
+
1. **Setup**: Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse FEATURE_DIR and AVAILABLE_DOCS list. All paths must be absolute. For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot").
|
| 16 |
+
|
| 17 |
+
2. **Load design documents**: Read from FEATURE_DIR:
|
| 18 |
+
- **Required**: plan.md (tech stack, libraries, structure), spec.md (user stories with priorities)
|
| 19 |
+
- **Optional**: data-model.md (entities), contracts/ (API endpoints), research.md (decisions), quickstart.md (test scenarios)
|
| 20 |
+
- Note: Not all projects have all documents. Generate tasks based on what's available.
|
| 21 |
+
|
| 22 |
+
3. **Execute task generation workflow**:
|
| 23 |
+
- Load plan.md and extract tech stack, libraries, project structure
|
| 24 |
+
- Load spec.md and extract user stories with their priorities (P1, P2, P3, etc.)
|
| 25 |
+
- If data-model.md exists: Extract entities and map to user stories
|
| 26 |
+
- If contracts/ exists: Map endpoints to user stories
|
| 27 |
+
- If research.md exists: Extract decisions for setup tasks
|
| 28 |
+
- Generate tasks organized by user story (see Task Generation Rules below)
|
| 29 |
+
- Generate dependency graph showing user story completion order
|
| 30 |
+
- Create parallel execution examples per user story
|
| 31 |
+
- Validate task completeness (each user story has all needed tasks, independently testable)
|
| 32 |
+
|
| 33 |
+
4. **Generate tasks.md**: Use `.specify.specify/templates/tasks-template.md` as structure, fill with:
|
| 34 |
+
- Correct feature name from plan.md
|
| 35 |
+
- Phase 1: Setup tasks (project initialization)
|
| 36 |
+
- Phase 2: Foundational tasks (blocking prerequisites for all user stories)
|
| 37 |
+
- Phase 3+: One phase per user story (in priority order from spec.md)
|
| 38 |
+
- Each phase includes: story goal, independent test criteria, tests (if requested), implementation tasks
|
| 39 |
+
- Final Phase: Polish & cross-cutting concerns
|
| 40 |
+
- All tasks must follow the strict checklist format (see Task Generation Rules below)
|
| 41 |
+
- Clear file paths for each task
|
| 42 |
+
- Dependencies section showing story completion order
|
| 43 |
+
- Parallel execution examples per story
|
| 44 |
+
- Implementation strategy section (MVP first, incremental delivery)
|
| 45 |
+
|
| 46 |
+
5. **Report**: Output path to generated tasks.md and summary:
|
| 47 |
+
- Total task count
|
| 48 |
+
- Task count per user story
|
| 49 |
+
- Parallel opportunities identified
|
| 50 |
+
- Independent test criteria for each story
|
| 51 |
+
- Suggested MVP scope (typically just User Story 1)
|
| 52 |
+
- Format validation: Confirm ALL tasks follow the checklist format (checkbox, ID, labels, file paths)
|
| 53 |
+
|
| 54 |
+
Context for task generation: $ARGUMENTS
|
| 55 |
+
|
| 56 |
+
The tasks.md should be immediately executable - each task must be specific enough that an LLM can complete it without additional context.
|
| 57 |
+
|
| 58 |
+
## Task Generation Rules
|
| 59 |
+
|
| 60 |
+
**CRITICAL**: Tasks MUST be organized by user story to enable independent implementation and testing.
|
| 61 |
+
|
| 62 |
+
**Tests are OPTIONAL**: Only generate test tasks if explicitly requested in the feature specification or if user requests TDD approach.
|
| 63 |
+
|
| 64 |
+
### Checklist Format (REQUIRED)
|
| 65 |
+
|
| 66 |
+
Every task MUST strictly follow this format:
|
| 67 |
+
|
| 68 |
+
```text
|
| 69 |
+
- [ ] [TaskID] [P?] [Story?] Description with file path
|
| 70 |
+
```
|
| 71 |
+
|
| 72 |
+
**Format Components**:
|
| 73 |
+
|
| 74 |
+
1. **Checkbox**: ALWAYS start with `- [ ]` (markdown checkbox)
|
| 75 |
+
2. **Task ID**: Sequential number (T001, T002, T003...) in execution order
|
| 76 |
+
3. **[P] marker**: Include ONLY if task is parallelizable (different files, no dependencies on incomplete tasks)
|
| 77 |
+
4. **[Story] label**: REQUIRED for user story phase tasks only
|
| 78 |
+
- Format: [US1], [US2], [US3], etc. (maps to user stories from spec.md)
|
| 79 |
+
- Setup phase: NO story label
|
| 80 |
+
- Foundational phase: NO story label
|
| 81 |
+
- User Story phases: MUST have story label
|
| 82 |
+
- Polish phase: NO story label
|
| 83 |
+
5. **Description**: Clear action with exact file path
|
| 84 |
+
|
| 85 |
+
**Examples**:
|
| 86 |
+
|
| 87 |
+
- ✅ CORRECT: `- [ ] T001 Create project structure per implementation plan`
|
| 88 |
+
- ✅ CORRECT: `- [ ] T005 [P] Implement authentication middleware in src/middleware/auth.py`
|
| 89 |
+
- ✅ CORRECT: `- [ ] T012 [P] [US1] Create User model in src/models/user.py`
|
| 90 |
+
- ✅ CORRECT: `- [ ] T014 [US1] Implement UserService in src/services/user_service.py`
|
| 91 |
+
- ❌ WRONG: `- [ ] Create User model` (missing ID and Story label)
|
| 92 |
+
- ❌ WRONG: `T001 [US1] Create model` (missing checkbox)
|
| 93 |
+
- ❌ WRONG: `- [ ] [US1] Create User model` (missing Task ID)
|
| 94 |
+
- ❌ WRONG: `- [ ] T001 [US1] Create model` (missing file path)
|
| 95 |
+
|
| 96 |
+
### Task Organization
|
| 97 |
+
|
| 98 |
+
1. **From User Stories (spec.md)** - PRIMARY ORGANIZATION:
|
| 99 |
+
- Each user story (P1, P2, P3...) gets its own phase
|
| 100 |
+
- Map all related components to their story:
|
| 101 |
+
- Models needed for that story
|
| 102 |
+
- Services needed for that story
|
| 103 |
+
- Endpoints/UI needed for that story
|
| 104 |
+
- If tests requested: Tests specific to that story
|
| 105 |
+
- Mark story dependencies (most stories should be independent)
|
| 106 |
+
|
| 107 |
+
2. **From Contracts**:
|
| 108 |
+
- Map each contract/endpoint → to the user story it serves
|
| 109 |
+
- If tests requested: Each contract → contract test task [P] before implementation in that story's phase
|
| 110 |
+
|
| 111 |
+
3. **From Data Model**:
|
| 112 |
+
- Map each entity to the user story(ies) that need it
|
| 113 |
+
- If entity serves multiple stories: Put in earliest story or Setup phase
|
| 114 |
+
- Relationships → service layer tasks in appropriate story phase
|
| 115 |
+
|
| 116 |
+
4. **From Setup/Infrastructure**:
|
| 117 |
+
- Shared infrastructure → Setup phase (Phase 1)
|
| 118 |
+
- Foundational/blocking tasks → Foundational phase (Phase 2)
|
| 119 |
+
- Story-specific setup → within that story's phase
|
| 120 |
+
|
| 121 |
+
### Phase Structure
|
| 122 |
+
|
| 123 |
+
- **Phase 1**: Setup (project initialization)
|
| 124 |
+
- **Phase 2**: Foundational (blocking prerequisites - MUST complete before user stories)
|
| 125 |
+
- **Phase 3+**: User Stories in priority order (P1, P2, P3...)
|
| 126 |
+
- Within each story: Tests (if requested) → Models → Services → Endpoints → Integration
|
| 127 |
+
- Each phase should be a complete, independently testable increment
|
| 128 |
+
- **Final Phase**: Polish & Cross-Cutting Concerns
|
.specify/memory/constitution.md
ADDED
|
@@ -0,0 +1,139 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# AI-Me Constitution
|
| 2 |
+
|
| 3 |
+
A personified AI agent application that creates agentic versions of real people using RAG (Retrieval Augmented Generation) over markdown documentation, deployed as a Gradio chatbot on Hugging Face Spaces.
|
| 4 |
+
|
| 5 |
+
## Core Principles
|
| 6 |
+
|
| 7 |
+
### I. Async-First Architecture
|
| 8 |
+
|
| 9 |
+
All agent operations and external I/O must be async-native. No blocking operations in the hot path. This ensures responsive UI and scalable deployments on cloud platforms.
|
| 10 |
+
- Agent execution flows are asynchronous end-to-end
|
| 11 |
+
- External services (MCP servers, vectorstore, APIs) initialized asynchronously
|
| 12 |
+
- Session-scoped agent instances prevent cross-session state contamination
|
| 13 |
+
|
| 14 |
+
### II. RAG-First Data Pipeline
|
| 15 |
+
|
| 16 |
+
All intelligence derives from document retrieval, not training data or hardcoded knowledge. This ensures accuracy, verifiability, and the ability to update agent knowledge by updating documents.
|
| 17 |
+
- Documents sourced from local and remote sources (GitHub repos)
|
| 18 |
+
- Intelligent chunking preserves document structure for better retrieval
|
| 19 |
+
- Retrieved documents provide context and source attribution for all responses
|
| 20 |
+
|
| 21 |
+
### III. Type-Safe Configuration with Pydantic
|
| 22 |
+
|
| 23 |
+
All configuration validated via Pydantic with strict typing. No string-based config, no runtime surprises, no silent failures.
|
| 24 |
+
- Centralized configuration management with defaults
|
| 25 |
+
- Secrets handled securely with restricted access
|
| 26 |
+
- Immutable config pattern prevents accidental mutations of shared state
|
| 27 |
+
|
| 28 |
+
### IV. Session Isolation & Resource Management
|
| 29 |
+
|
| 30 |
+
Each user session gets its own agent instance with isolated resources. Explicit resource cleanup prevents leaks and shutdown errors.
|
| 31 |
+
- Per-session agent instances keyed by unique session identifier
|
| 32 |
+
- Session-specific resources (memory, temp files) isolated and cleaned up
|
| 33 |
+
- Explicit cleanup lifecycle prevents resource contention
|
| 34 |
+
|
| 35 |
+
### V. Test Driven Development (NON-NEGOTIABLE)
|
| 36 |
+
|
| 37 |
+
All features validated by tests before integration. Code without tests is code without specifications.
|
| 38 |
+
- Tests validate all agent behavior changes and refactorings
|
| 39 |
+
- Test data isolated from production configuration
|
| 40 |
+
- Tests should isolate all external dependencies. NOTE: Inference can not be isolated until we can run larger models like gpt-oss-120b on commodity hardware.
|
| 41 |
+
|
| 42 |
+
### VI. Clear Code Organization
|
| 43 |
+
|
| 44 |
+
Code is organized and readable. Imports follow a consistent structure. Lines are concise without sacrificing clarity.
|
| 45 |
+
- Imports organized top-of-file: standard library → third-party → local
|
| 46 |
+
- Each import group separated by blank line
|
| 47 |
+
- Code formatted for readability and maintainability
|
| 48 |
+
|
| 49 |
+
### VII. Observability First
|
| 50 |
+
|
| 51 |
+
All operations observable through structured logging. Logs provide context for debugging and auditing.
|
| 52 |
+
- Operations logged with session context and structured data
|
| 53 |
+
- Retrieval and tool execution visible in logs for debugging
|
| 54 |
+
- Optional integration with centralized logging for production insights
|
| 55 |
+
|
| 56 |
+
### VIII. Persona Consistency
|
| 57 |
+
|
| 58 |
+
The agent represents a real person with clear identity. All responses maintain first-person perspective and relationship transparency.
|
| 59 |
+
- Agent refers to self by name and maintains consistent identity
|
| 60 |
+
- Professional relationships clearly indicated
|
| 61 |
+
- Tone is personable, friendly, and authentic
|
| 62 |
+
|
| 63 |
+
### IX. Unicode Normalization & Output Cleanliness
|
| 64 |
+
|
| 65 |
+
All agent responses normalized for clean, consistent output across platforms.
|
| 66 |
+
- Special characters normalized to ASCII equivalents
|
| 67 |
+
- Output cleaned before returning to user
|
| 68 |
+
- Output links should work
|
| 69 |
+
|
| 70 |
+
## Technology Stack Constraints
|
| 71 |
+
|
| 72 |
+
- **Python**: 3.12+ only (via `requires-python = "~=3.12.0"`)
|
| 73 |
+
- **Package Manager**: `uv` exclusively (not pip)
|
| 74 |
+
- **LLM Provider**: Groq `openai/openai/gpt-oss-120b` (primary), OpenAI API (tracing only)
|
| 75 |
+
- **VectorDB**: ChromaDB ephemeral (in-memory, no persistence)
|
| 76 |
+
- **Embeddings**: HuggingFace sentence-transformers
|
| 77 |
+
- **Framework**: OpenAI Agents SDK with async support
|
| 78 |
+
- **UI**: Gradio with Hugging Face Spaces deployment
|
| 79 |
+
- **MCP Servers**: GitHub, Time, Memory (optional per session)
|
| 80 |
+
|
| 81 |
+
## Development Workflow
|
| 82 |
+
|
| 83 |
+
1. **Environment Setup**:
|
| 84 |
+
- Create `.env` with required keys: `OPENAI_API_KEY`, `GROQ_API_KEY`, `GITHUB_PERSONAL_ACCESS_TOKEN`, `BOT_FULL_NAME`, `APP_NAME`, `GITHUB_REPOS`
|
| 85 |
+
- Run `uv sync` to install dependencies
|
| 86 |
+
- Setup pre-commit hook to auto-clear notebook outputs
|
| 87 |
+
|
| 88 |
+
2. **Local Development**:
|
| 89 |
+
- Use `docs/` directory for markdown (won't deploy unless pushed to GitHub repo)
|
| 90 |
+
- Test locally: `uv run src/app.py` (Gradio on port 7860)
|
| 91 |
+
- Run tests: `uv run pytest src/test.py -v`
|
| 92 |
+
- Edit notebooks then validate changes don't break tests
|
| 93 |
+
|
| 94 |
+
3. **Docker/Notebook Development**:
|
| 95 |
+
- Build: `docker compose build notebooks`
|
| 96 |
+
- Run: `docker compose up notebooks`
|
| 97 |
+
- Attach via Dev Containers extension for IDE integration
|
| 98 |
+
|
| 99 |
+
4. **Deployment**:
|
| 100 |
+
- Push to `main` triggers GitHub Actions CI/CD
|
| 101 |
+
- CI runs tests with `GROQ_API_KEY` and `OPENAI_API_KEY`
|
| 102 |
+
- CD deploys to Hugging Face Spaces via Gradio CLI with all required env vars
|
| 103 |
+
|
| 104 |
+
## Code Organization
|
| 105 |
+
|
| 106 |
+
- `src/config.py` - Pydantic BaseSettings, all configuration
|
| 107 |
+
- `src/data.py` - DataManager class, complete document pipeline
|
| 108 |
+
- `src/agent.py` - AIMeAgent class, MCP setup, agent creation
|
| 109 |
+
- `src/app.py` - Gradio interface, session management
|
| 110 |
+
- `src/test.py` - Integration tests with pytest-asyncio
|
| 111 |
+
- `src/notebooks/experiments.ipynb` - Development sandbox (test all APIs here first)
|
| 112 |
+
- `docs/` - Local markdown for RAG development
|
| 113 |
+
- `test_data/` - Test fixtures and sample data
|
| 114 |
+
- `.github/copilot-instructions.md` - Detailed AI assistant guidance
|
| 115 |
+
- `.specify/` - Spec-Driven Development templates and memory
|
| 116 |
+
|
| 117 |
+
## Non-Negotiables
|
| 118 |
+
|
| 119 |
+
1. **No hardcoded knowledge** - Everything comes from RAG
|
| 120 |
+
2. **No shared mutable state** - Session-scoped instances only
|
| 121 |
+
3. **No blocking operations** - Async throughout
|
| 122 |
+
4. **No untested refactorings** - Run tests first
|
| 123 |
+
5. **No outdated notebooks** - Sync with code changes
|
| 124 |
+
6. **No unstructured logs** - JSON for machines, readable for humans
|
| 125 |
+
7. **No credential leaks** - .gitignore and .dockerignore files to help prevent secret slips. Never build secrets into a dockerfile!
|
| 126 |
+
8. **No notebook outputs in GIT** - you must clean up the code
|
| 127 |
+
|
| 128 |
+
## Governance
|
| 129 |
+
|
| 130 |
+
This constitution supersedes all other practices and is the single source of truth for architectural decisions. All PRs and feature requests must verify compliance with these principles. Code review must check:
|
| 131 |
+
- Async-first patterns are used
|
| 132 |
+
- Type safety via Pydantic validation
|
| 133 |
+
- Session isolation maintained
|
| 134 |
+
- Tests pass and notebooks updated
|
| 135 |
+
- Imports organized per PEP 8
|
| 136 |
+
- Observability (logging) present
|
| 137 |
+
- Output cleanliness (Unicode normalization)
|
| 138 |
+
|
| 139 |
+
**Version**: 1.0.0 | **Ratified**: 2025-10-23 | **Last Amended**: 2025-10-23
|
.specify/scripts/bash/check-prerequisites.sh
ADDED
|
@@ -0,0 +1,166 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env bash
|
| 2 |
+
|
| 3 |
+
# Consolidated prerequisite checking script
|
| 4 |
+
#
|
| 5 |
+
# This script provides unified prerequisite checking for Spec-Driven Development workflow.
|
| 6 |
+
# It replaces the functionality previously spread across multiple scripts.
|
| 7 |
+
#
|
| 8 |
+
# Usage: ./check-prerequisites.sh [OPTIONS]
|
| 9 |
+
#
|
| 10 |
+
# OPTIONS:
|
| 11 |
+
# --json Output in JSON format
|
| 12 |
+
# --require-tasks Require tasks.md to exist (for implementation phase)
|
| 13 |
+
# --include-tasks Include tasks.md in AVAILABLE_DOCS list
|
| 14 |
+
# --paths-only Only output path variables (no validation)
|
| 15 |
+
# --help, -h Show help message
|
| 16 |
+
#
|
| 17 |
+
# OUTPUTS:
|
| 18 |
+
# JSON mode: {"FEATURE_DIR":"...", "AVAILABLE_DOCS":["..."]}
|
| 19 |
+
# Text mode: FEATURE_DIR:... \n AVAILABLE_DOCS: \n ✓/✗ file.md
|
| 20 |
+
# Paths only: REPO_ROOT: ... \n BRANCH: ... \n FEATURE_DIR: ... etc.
|
| 21 |
+
|
| 22 |
+
set -e
|
| 23 |
+
|
| 24 |
+
# Parse command line arguments
|
| 25 |
+
JSON_MODE=false
|
| 26 |
+
REQUIRE_TASKS=false
|
| 27 |
+
INCLUDE_TASKS=false
|
| 28 |
+
PATHS_ONLY=false
|
| 29 |
+
|
| 30 |
+
for arg in "$@"; do
|
| 31 |
+
case "$arg" in
|
| 32 |
+
--json)
|
| 33 |
+
JSON_MODE=true
|
| 34 |
+
;;
|
| 35 |
+
--require-tasks)
|
| 36 |
+
REQUIRE_TASKS=true
|
| 37 |
+
;;
|
| 38 |
+
--include-tasks)
|
| 39 |
+
INCLUDE_TASKS=true
|
| 40 |
+
;;
|
| 41 |
+
--paths-only)
|
| 42 |
+
PATHS_ONLY=true
|
| 43 |
+
;;
|
| 44 |
+
--help|-h)
|
| 45 |
+
cat << 'EOF'
|
| 46 |
+
Usage: check-prerequisites.sh [OPTIONS]
|
| 47 |
+
|
| 48 |
+
Consolidated prerequisite checking for Spec-Driven Development workflow.
|
| 49 |
+
|
| 50 |
+
OPTIONS:
|
| 51 |
+
--json Output in JSON format
|
| 52 |
+
--require-tasks Require tasks.md to exist (for implementation phase)
|
| 53 |
+
--include-tasks Include tasks.md in AVAILABLE_DOCS list
|
| 54 |
+
--paths-only Only output path variables (no prerequisite validation)
|
| 55 |
+
--help, -h Show this help message
|
| 56 |
+
|
| 57 |
+
EXAMPLES:
|
| 58 |
+
# Check task prerequisites (plan.md required)
|
| 59 |
+
./check-prerequisites.sh --json
|
| 60 |
+
|
| 61 |
+
# Check implementation prerequisites (plan.md + tasks.md required)
|
| 62 |
+
./check-prerequisites.sh --json --require-tasks --include-tasks
|
| 63 |
+
|
| 64 |
+
# Get feature paths only (no validation)
|
| 65 |
+
./check-prerequisites.sh --paths-only
|
| 66 |
+
|
| 67 |
+
EOF
|
| 68 |
+
exit 0
|
| 69 |
+
;;
|
| 70 |
+
*)
|
| 71 |
+
echo "ERROR: Unknown option '$arg'. Use --help for usage information." >&2
|
| 72 |
+
exit 1
|
| 73 |
+
;;
|
| 74 |
+
esac
|
| 75 |
+
done
|
| 76 |
+
|
| 77 |
+
# Source common functions
|
| 78 |
+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
| 79 |
+
source "$SCRIPT_DIR/common.sh"
|
| 80 |
+
|
| 81 |
+
# Get feature paths and validate branch
|
| 82 |
+
eval $(get_feature_paths)
|
| 83 |
+
check_feature_branch "$CURRENT_BRANCH" "$HAS_GIT" || exit 1
|
| 84 |
+
|
| 85 |
+
# If paths-only mode, output paths and exit (support JSON + paths-only combined)
|
| 86 |
+
if $PATHS_ONLY; then
|
| 87 |
+
if $JSON_MODE; then
|
| 88 |
+
# Minimal JSON paths payload (no validation performed)
|
| 89 |
+
printf '{"REPO_ROOT":"%s","BRANCH":"%s","FEATURE_DIR":"%s","FEATURE_SPEC":"%s","IMPL_PLAN":"%s","TASKS":"%s"}\n' \
|
| 90 |
+
"$REPO_ROOT" "$CURRENT_BRANCH" "$FEATURE_DIR" "$FEATURE_SPEC" "$IMPL_PLAN" "$TASKS"
|
| 91 |
+
else
|
| 92 |
+
echo "REPO_ROOT: $REPO_ROOT"
|
| 93 |
+
echo "BRANCH: $CURRENT_BRANCH"
|
| 94 |
+
echo "FEATURE_DIR: $FEATURE_DIR"
|
| 95 |
+
echo "FEATURE_SPEC: $FEATURE_SPEC"
|
| 96 |
+
echo "IMPL_PLAN: $IMPL_PLAN"
|
| 97 |
+
echo "TASKS: $TASKS"
|
| 98 |
+
fi
|
| 99 |
+
exit 0
|
| 100 |
+
fi
|
| 101 |
+
|
| 102 |
+
# Validate required directories and files
|
| 103 |
+
if [[ ! -d "$FEATURE_DIR" ]]; then
|
| 104 |
+
echo "ERROR: Feature directory not found: $FEATURE_DIR" >&2
|
| 105 |
+
echo "Run /speckit.specify first to create the feature structure." >&2
|
| 106 |
+
exit 1
|
| 107 |
+
fi
|
| 108 |
+
|
| 109 |
+
if [[ ! -f "$IMPL_PLAN" ]]; then
|
| 110 |
+
echo "ERROR: plan.md not found in $FEATURE_DIR" >&2
|
| 111 |
+
echo "Run /speckit.plan first to create the implementation plan." >&2
|
| 112 |
+
exit 1
|
| 113 |
+
fi
|
| 114 |
+
|
| 115 |
+
# Check for tasks.md if required
|
| 116 |
+
if $REQUIRE_TASKS && [[ ! -f "$TASKS" ]]; then
|
| 117 |
+
echo "ERROR: tasks.md not found in $FEATURE_DIR" >&2
|
| 118 |
+
echo "Run /speckit.tasks first to create the task list." >&2
|
| 119 |
+
exit 1
|
| 120 |
+
fi
|
| 121 |
+
|
| 122 |
+
# Build list of available documents
|
| 123 |
+
docs=()
|
| 124 |
+
|
| 125 |
+
# Always check these optional docs
|
| 126 |
+
[[ -f "$RESEARCH" ]] && docs+=("research.md")
|
| 127 |
+
[[ -f "$DATA_MODEL" ]] && docs+=("data-model.md")
|
| 128 |
+
|
| 129 |
+
# Check contracts directory (only if it exists and has files)
|
| 130 |
+
if [[ -d "$CONTRACTS_DIR" ]] && [[ -n "$(ls -A "$CONTRACTS_DIR" 2>/dev/null)" ]]; then
|
| 131 |
+
docs+=("contracts/")
|
| 132 |
+
fi
|
| 133 |
+
|
| 134 |
+
[[ -f "$QUICKSTART" ]] && docs+=("quickstart.md")
|
| 135 |
+
|
| 136 |
+
# Include tasks.md if requested and it exists
|
| 137 |
+
if $INCLUDE_TASKS && [[ -f "$TASKS" ]]; then
|
| 138 |
+
docs+=("tasks.md")
|
| 139 |
+
fi
|
| 140 |
+
|
| 141 |
+
# Output results
|
| 142 |
+
if $JSON_MODE; then
|
| 143 |
+
# Build JSON array of documents
|
| 144 |
+
if [[ ${#docs[@]} -eq 0 ]]; then
|
| 145 |
+
json_docs="[]"
|
| 146 |
+
else
|
| 147 |
+
json_docs=$(printf '"%s",' "${docs[@]}")
|
| 148 |
+
json_docs="[${json_docs%,}]"
|
| 149 |
+
fi
|
| 150 |
+
|
| 151 |
+
printf '{"FEATURE_DIR":"%s","AVAILABLE_DOCS":%s}\n' "$FEATURE_DIR" "$json_docs"
|
| 152 |
+
else
|
| 153 |
+
# Text output
|
| 154 |
+
echo "FEATURE_DIR:$FEATURE_DIR"
|
| 155 |
+
echo "AVAILABLE_DOCS:"
|
| 156 |
+
|
| 157 |
+
# Show status of each potential document
|
| 158 |
+
check_file "$RESEARCH" "research.md"
|
| 159 |
+
check_file "$DATA_MODEL" "data-model.md"
|
| 160 |
+
check_dir "$CONTRACTS_DIR" "contracts/"
|
| 161 |
+
check_file "$QUICKSTART" "quickstart.md"
|
| 162 |
+
|
| 163 |
+
if $INCLUDE_TASKS; then
|
| 164 |
+
check_file "$TASKS" "tasks.md"
|
| 165 |
+
fi
|
| 166 |
+
fi
|
.specify/scripts/bash/common.sh
ADDED
|
@@ -0,0 +1,156 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env bash
|
| 2 |
+
# Common functions and variables for all scripts
|
| 3 |
+
|
| 4 |
+
# Get repository root, with fallback for non-git repositories
|
| 5 |
+
get_repo_root() {
|
| 6 |
+
if git rev-parse --show-toplevel >/dev/null 2>&1; then
|
| 7 |
+
git rev-parse --show-toplevel
|
| 8 |
+
else
|
| 9 |
+
# Fall back to script location for non-git repos
|
| 10 |
+
local script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
| 11 |
+
(cd "$script_dir/../../.." && pwd)
|
| 12 |
+
fi
|
| 13 |
+
}
|
| 14 |
+
|
| 15 |
+
# Get current branch, with fallback for non-git repositories
|
| 16 |
+
get_current_branch() {
|
| 17 |
+
# First check if SPECIFY_FEATURE environment variable is set
|
| 18 |
+
if [[ -n "${SPECIFY_FEATURE:-}" ]]; then
|
| 19 |
+
echo "$SPECIFY_FEATURE"
|
| 20 |
+
return
|
| 21 |
+
fi
|
| 22 |
+
|
| 23 |
+
# Then check git if available
|
| 24 |
+
if git rev-parse --abbrev-ref HEAD >/dev/null 2>&1; then
|
| 25 |
+
git rev-parse --abbrev-ref HEAD
|
| 26 |
+
return
|
| 27 |
+
fi
|
| 28 |
+
|
| 29 |
+
# For non-git repos, try to find the latest feature directory
|
| 30 |
+
local repo_root=$(get_repo_root)
|
| 31 |
+
local specs_dir="$repo_root/specs"
|
| 32 |
+
|
| 33 |
+
if [[ -d "$specs_dir" ]]; then
|
| 34 |
+
local latest_feature=""
|
| 35 |
+
local highest=0
|
| 36 |
+
|
| 37 |
+
for dir in "$specs_dir"/*; do
|
| 38 |
+
if [[ -d "$dir" ]]; then
|
| 39 |
+
local dirname=$(basename "$dir")
|
| 40 |
+
if [[ "$dirname" =~ ^([0-9]{3})- ]]; then
|
| 41 |
+
local number=${BASH_REMATCH[1]}
|
| 42 |
+
number=$((10#$number))
|
| 43 |
+
if [[ "$number" -gt "$highest" ]]; then
|
| 44 |
+
highest=$number
|
| 45 |
+
latest_feature=$dirname
|
| 46 |
+
fi
|
| 47 |
+
fi
|
| 48 |
+
fi
|
| 49 |
+
done
|
| 50 |
+
|
| 51 |
+
if [[ -n "$latest_feature" ]]; then
|
| 52 |
+
echo "$latest_feature"
|
| 53 |
+
return
|
| 54 |
+
fi
|
| 55 |
+
fi
|
| 56 |
+
|
| 57 |
+
echo "main" # Final fallback
|
| 58 |
+
}
|
| 59 |
+
|
| 60 |
+
# Check if we have git available
|
| 61 |
+
has_git() {
|
| 62 |
+
git rev-parse --show-toplevel >/dev/null 2>&1
|
| 63 |
+
}
|
| 64 |
+
|
| 65 |
+
check_feature_branch() {
|
| 66 |
+
local branch="$1"
|
| 67 |
+
local has_git_repo="$2"
|
| 68 |
+
|
| 69 |
+
# For non-git repos, we can't enforce branch naming but still provide output
|
| 70 |
+
if [[ "$has_git_repo" != "true" ]]; then
|
| 71 |
+
echo "[specify] Warning: Git repository not detected; skipped branch validation" >&2
|
| 72 |
+
return 0
|
| 73 |
+
fi
|
| 74 |
+
|
| 75 |
+
if [[ ! "$branch" =~ ^[0-9]{3}- ]]; then
|
| 76 |
+
echo "ERROR: Not on a feature branch. Current branch: $branch" >&2
|
| 77 |
+
echo "Feature branches should be named like: 001-feature-name" >&2
|
| 78 |
+
return 1
|
| 79 |
+
fi
|
| 80 |
+
|
| 81 |
+
return 0
|
| 82 |
+
}
|
| 83 |
+
|
| 84 |
+
get_feature_dir() { echo "$1/specs/$2"; }
|
| 85 |
+
|
| 86 |
+
# Find feature directory by numeric prefix instead of exact branch match
|
| 87 |
+
# This allows multiple branches to work on the same spec (e.g., 004-fix-bug, 004-add-feature)
|
| 88 |
+
find_feature_dir_by_prefix() {
|
| 89 |
+
local repo_root="$1"
|
| 90 |
+
local branch_name="$2"
|
| 91 |
+
local specs_dir="$repo_root/specs"
|
| 92 |
+
|
| 93 |
+
# Extract numeric prefix from branch (e.g., "004" from "004-whatever")
|
| 94 |
+
if [[ ! "$branch_name" =~ ^([0-9]{3})- ]]; then
|
| 95 |
+
# If branch doesn't have numeric prefix, fall back to exact match
|
| 96 |
+
echo "$specs_dir/$branch_name"
|
| 97 |
+
return
|
| 98 |
+
fi
|
| 99 |
+
|
| 100 |
+
local prefix="${BASH_REMATCH[1]}"
|
| 101 |
+
|
| 102 |
+
# Search for directories in specs/ that start with this prefix
|
| 103 |
+
local matches=()
|
| 104 |
+
if [[ -d "$specs_dir" ]]; then
|
| 105 |
+
for dir in "$specs_dir"/"$prefix"-*; do
|
| 106 |
+
if [[ -d "$dir" ]]; then
|
| 107 |
+
matches+=("$(basename "$dir")")
|
| 108 |
+
fi
|
| 109 |
+
done
|
| 110 |
+
fi
|
| 111 |
+
|
| 112 |
+
# Handle results
|
| 113 |
+
if [[ ${#matches[@]} -eq 0 ]]; then
|
| 114 |
+
# No match found - return the branch name path (will fail later with clear error)
|
| 115 |
+
echo "$specs_dir/$branch_name"
|
| 116 |
+
elif [[ ${#matches[@]} -eq 1 ]]; then
|
| 117 |
+
# Exactly one match - perfect!
|
| 118 |
+
echo "$specs_dir/${matches[0]}"
|
| 119 |
+
else
|
| 120 |
+
# Multiple matches - this shouldn't happen with proper naming convention
|
| 121 |
+
echo "ERROR: Multiple spec directories found with prefix '$prefix': ${matches[*]}" >&2
|
| 122 |
+
echo "Please ensure only one spec directory exists per numeric prefix." >&2
|
| 123 |
+
echo "$specs_dir/$branch_name" # Return something to avoid breaking the script
|
| 124 |
+
fi
|
| 125 |
+
}
|
| 126 |
+
|
| 127 |
+
get_feature_paths() {
|
| 128 |
+
local repo_root=$(get_repo_root)
|
| 129 |
+
local current_branch=$(get_current_branch)
|
| 130 |
+
local has_git_repo="false"
|
| 131 |
+
|
| 132 |
+
if has_git; then
|
| 133 |
+
has_git_repo="true"
|
| 134 |
+
fi
|
| 135 |
+
|
| 136 |
+
# Use prefix-based lookup to support multiple branches per spec
|
| 137 |
+
local feature_dir=$(find_feature_dir_by_prefix "$repo_root" "$current_branch")
|
| 138 |
+
|
| 139 |
+
cat <<EOF
|
| 140 |
+
REPO_ROOT='$repo_root'
|
| 141 |
+
CURRENT_BRANCH='$current_branch'
|
| 142 |
+
HAS_GIT='$has_git_repo'
|
| 143 |
+
FEATURE_DIR='$feature_dir'
|
| 144 |
+
FEATURE_SPEC='$feature_dir/spec.md'
|
| 145 |
+
IMPL_PLAN='$feature_dir/plan.md'
|
| 146 |
+
TASKS='$feature_dir/tasks.md'
|
| 147 |
+
RESEARCH='$feature_dir/research.md'
|
| 148 |
+
DATA_MODEL='$feature_dir/data-model.md'
|
| 149 |
+
QUICKSTART='$feature_dir/quickstart.md'
|
| 150 |
+
CONTRACTS_DIR='$feature_dir/contracts'
|
| 151 |
+
EOF
|
| 152 |
+
}
|
| 153 |
+
|
| 154 |
+
check_file() { [[ -f "$1" ]] && echo " ✓ $2" || echo " ✗ $2"; }
|
| 155 |
+
check_dir() { [[ -d "$1" && -n $(ls -A "$1" 2>/dev/null) ]] && echo " ✓ $2" || echo " ✗ $2"; }
|
| 156 |
+
|
.specify/scripts/bash/create-new-feature.sh
ADDED
|
@@ -0,0 +1,206 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env bash
|
| 2 |
+
|
| 3 |
+
set -e
|
| 4 |
+
|
| 5 |
+
JSON_MODE=false
|
| 6 |
+
SHORT_NAME=""
|
| 7 |
+
ARGS=()
|
| 8 |
+
i=1
|
| 9 |
+
while [ $i -le $# ]; do
|
| 10 |
+
arg="${!i}"
|
| 11 |
+
case "$arg" in
|
| 12 |
+
--json)
|
| 13 |
+
JSON_MODE=true
|
| 14 |
+
;;
|
| 15 |
+
--short-name)
|
| 16 |
+
if [ $((i + 1)) -gt $# ]; then
|
| 17 |
+
echo 'Error: --short-name requires a value' >&2
|
| 18 |
+
exit 1
|
| 19 |
+
fi
|
| 20 |
+
i=$((i + 1))
|
| 21 |
+
next_arg="${!i}"
|
| 22 |
+
# Check if the next argument is another option (starts with --)
|
| 23 |
+
if [[ "$next_arg" == --* ]]; then
|
| 24 |
+
echo 'Error: --short-name requires a value' >&2
|
| 25 |
+
exit 1
|
| 26 |
+
fi
|
| 27 |
+
SHORT_NAME="$next_arg"
|
| 28 |
+
;;
|
| 29 |
+
--help|-h)
|
| 30 |
+
echo "Usage: $0 [--json] [--short-name <name>] <feature_description>"
|
| 31 |
+
echo ""
|
| 32 |
+
echo "Options:"
|
| 33 |
+
echo " --json Output in JSON format"
|
| 34 |
+
echo " --short-name <name> Provide a custom short name (2-4 words) for the branch"
|
| 35 |
+
echo " --help, -h Show this help message"
|
| 36 |
+
echo ""
|
| 37 |
+
echo "Examples:"
|
| 38 |
+
echo " $0 'Add user authentication system' --short-name 'user-auth'"
|
| 39 |
+
echo " $0 'Implement OAuth2 integration for API'"
|
| 40 |
+
exit 0
|
| 41 |
+
;;
|
| 42 |
+
*)
|
| 43 |
+
ARGS+=("$arg")
|
| 44 |
+
;;
|
| 45 |
+
esac
|
| 46 |
+
i=$((i + 1))
|
| 47 |
+
done
|
| 48 |
+
|
| 49 |
+
FEATURE_DESCRIPTION="${ARGS[*]}"
|
| 50 |
+
if [ -z "$FEATURE_DESCRIPTION" ]; then
|
| 51 |
+
echo "Usage: $0 [--json] [--short-name <name>] <feature_description>" >&2
|
| 52 |
+
exit 1
|
| 53 |
+
fi
|
| 54 |
+
|
| 55 |
+
# Function to find the repository root by searching for existing project markers
|
| 56 |
+
find_repo_root() {
|
| 57 |
+
local dir="$1"
|
| 58 |
+
while [ "$dir" != "/" ]; do
|
| 59 |
+
if [ -d "$dir/.git" ] || [ -d "$dir/.specify" ]; then
|
| 60 |
+
echo "$dir"
|
| 61 |
+
return 0
|
| 62 |
+
fi
|
| 63 |
+
dir="$(dirname "$dir")"
|
| 64 |
+
done
|
| 65 |
+
return 1
|
| 66 |
+
}
|
| 67 |
+
|
| 68 |
+
# Resolve repository root. Prefer git information when available, but fall back
|
| 69 |
+
# to searching for repository markers so the workflow still functions in repositories that
|
| 70 |
+
# were initialised with --no-git.
|
| 71 |
+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
| 72 |
+
|
| 73 |
+
if git rev-parse --show-toplevel >/dev/null 2>&1; then
|
| 74 |
+
REPO_ROOT=$(git rev-parse --show-toplevel)
|
| 75 |
+
HAS_GIT=true
|
| 76 |
+
else
|
| 77 |
+
REPO_ROOT="$(find_repo_root "$SCRIPT_DIR")"
|
| 78 |
+
if [ -z "$REPO_ROOT" ]; then
|
| 79 |
+
echo "Error: Could not determine repository root. Please run this script from within the repository." >&2
|
| 80 |
+
exit 1
|
| 81 |
+
fi
|
| 82 |
+
HAS_GIT=false
|
| 83 |
+
fi
|
| 84 |
+
|
| 85 |
+
cd "$REPO_ROOT"
|
| 86 |
+
|
| 87 |
+
SPECS_DIR="$REPO_ROOT/specs"
|
| 88 |
+
mkdir -p "$SPECS_DIR"
|
| 89 |
+
|
| 90 |
+
HIGHEST=0
|
| 91 |
+
if [ -d "$SPECS_DIR" ]; then
|
| 92 |
+
for dir in "$SPECS_DIR"/*; do
|
| 93 |
+
[ -d "$dir" ] || continue
|
| 94 |
+
dirname=$(basename "$dir")
|
| 95 |
+
number=$(echo "$dirname" | grep -o '^[0-9]\+' || echo "0")
|
| 96 |
+
number=$((10#$number))
|
| 97 |
+
if [ "$number" -gt "$HIGHEST" ]; then HIGHEST=$number; fi
|
| 98 |
+
done
|
| 99 |
+
fi
|
| 100 |
+
|
| 101 |
+
NEXT=$((HIGHEST + 1))
|
| 102 |
+
FEATURE_NUM=$(printf "%03d" "$NEXT")
|
| 103 |
+
|
| 104 |
+
# Function to generate branch name with stop word filtering and length filtering
|
| 105 |
+
generate_branch_name() {
|
| 106 |
+
local description="$1"
|
| 107 |
+
|
| 108 |
+
# Common stop words to filter out
|
| 109 |
+
local stop_words="^(i|a|an|the|to|for|of|in|on|at|by|with|from|is|are|was|were|be|been|being|have|has|had|do|does|did|will|would|should|could|can|may|might|must|shall|this|that|these|those|my|your|our|their|want|need|add|get|set)$"
|
| 110 |
+
|
| 111 |
+
# Convert to lowercase and split into words
|
| 112 |
+
local clean_name=$(echo "$description" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/ /g')
|
| 113 |
+
|
| 114 |
+
# Filter words: remove stop words and words shorter than 3 chars (unless they're uppercase acronyms in original)
|
| 115 |
+
local meaningful_words=()
|
| 116 |
+
for word in $clean_name; do
|
| 117 |
+
# Skip empty words
|
| 118 |
+
[ -z "$word" ] && continue
|
| 119 |
+
|
| 120 |
+
# Keep words that are NOT stop words AND (length >= 3 OR are potential acronyms)
|
| 121 |
+
if ! echo "$word" | grep -qiE "$stop_words"; then
|
| 122 |
+
if [ ${#word} -ge 3 ]; then
|
| 123 |
+
meaningful_words+=("$word")
|
| 124 |
+
elif echo "$description" | grep -q "\b${word^^}\b"; then
|
| 125 |
+
# Keep short words if they appear as uppercase in original (likely acronyms)
|
| 126 |
+
meaningful_words+=("$word")
|
| 127 |
+
fi
|
| 128 |
+
fi
|
| 129 |
+
done
|
| 130 |
+
|
| 131 |
+
# If we have meaningful words, use first 3-4 of them
|
| 132 |
+
if [ ${#meaningful_words[@]} -gt 0 ]; then
|
| 133 |
+
local max_words=3
|
| 134 |
+
if [ ${#meaningful_words[@]} -eq 4 ]; then max_words=4; fi
|
| 135 |
+
|
| 136 |
+
local result=""
|
| 137 |
+
local count=0
|
| 138 |
+
for word in "${meaningful_words[@]}"; do
|
| 139 |
+
if [ $count -ge $max_words ]; then break; fi
|
| 140 |
+
if [ -n "$result" ]; then result="$result-"; fi
|
| 141 |
+
result="$result$word"
|
| 142 |
+
count=$((count + 1))
|
| 143 |
+
done
|
| 144 |
+
echo "$result"
|
| 145 |
+
else
|
| 146 |
+
# Fallback to original logic if no meaningful words found
|
| 147 |
+
echo "$description" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/-\+/-/g' | sed 's/^-//' | sed 's/-$//' | tr '-' '\n' | grep -v '^$' | head -3 | tr '\n' '-' | sed 's/-$//'
|
| 148 |
+
fi
|
| 149 |
+
}
|
| 150 |
+
|
| 151 |
+
# Generate branch name
|
| 152 |
+
if [ -n "$SHORT_NAME" ]; then
|
| 153 |
+
# Use provided short name, just clean it up
|
| 154 |
+
BRANCH_SUFFIX=$(echo "$SHORT_NAME" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/-\+/-/g' | sed 's/^-//' | sed 's/-$//')
|
| 155 |
+
else
|
| 156 |
+
# Generate from description with smart filtering
|
| 157 |
+
BRANCH_SUFFIX=$(generate_branch_name "$FEATURE_DESCRIPTION")
|
| 158 |
+
fi
|
| 159 |
+
|
| 160 |
+
BRANCH_NAME="${FEATURE_NUM}-${BRANCH_SUFFIX}"
|
| 161 |
+
|
| 162 |
+
# GitHub enforces a 244-byte limit on branch names
|
| 163 |
+
# Validate and truncate if necessary
|
| 164 |
+
MAX_BRANCH_LENGTH=244
|
| 165 |
+
if [ ${#BRANCH_NAME} -gt $MAX_BRANCH_LENGTH ]; then
|
| 166 |
+
# Calculate how much we need to trim from suffix
|
| 167 |
+
# Account for: feature number (3) + hyphen (1) = 4 chars
|
| 168 |
+
MAX_SUFFIX_LENGTH=$((MAX_BRANCH_LENGTH - 4))
|
| 169 |
+
|
| 170 |
+
# Truncate suffix at word boundary if possible
|
| 171 |
+
TRUNCATED_SUFFIX=$(echo "$BRANCH_SUFFIX" | cut -c1-$MAX_SUFFIX_LENGTH)
|
| 172 |
+
# Remove trailing hyphen if truncation created one
|
| 173 |
+
TRUNCATED_SUFFIX=$(echo "$TRUNCATED_SUFFIX" | sed 's/-$//')
|
| 174 |
+
|
| 175 |
+
ORIGINAL_BRANCH_NAME="$BRANCH_NAME"
|
| 176 |
+
BRANCH_NAME="${FEATURE_NUM}-${TRUNCATED_SUFFIX}"
|
| 177 |
+
|
| 178 |
+
>&2 echo "[specify] Warning: Branch name exceeded GitHub's 244-byte limit"
|
| 179 |
+
>&2 echo "[specify] Original: $ORIGINAL_BRANCH_NAME (${#ORIGINAL_BRANCH_NAME} bytes)"
|
| 180 |
+
>&2 echo "[specify] Truncated to: $BRANCH_NAME (${#BRANCH_NAME} bytes)"
|
| 181 |
+
fi
|
| 182 |
+
|
| 183 |
+
if [ "$HAS_GIT" = true ]; then
|
| 184 |
+
git checkout -b "$BRANCH_NAME"
|
| 185 |
+
else
|
| 186 |
+
>&2 echo "[specify] Warning: Git repository not detected; skipped branch creation for $BRANCH_NAME"
|
| 187 |
+
fi
|
| 188 |
+
|
| 189 |
+
FEATURE_DIR="$SPECS_DIR/$BRANCH_NAME"
|
| 190 |
+
mkdir -p "$FEATURE_DIR"
|
| 191 |
+
|
| 192 |
+
TEMPLATE="$REPO_ROOT/.specify/templates/spec-template.md"
|
| 193 |
+
SPEC_FILE="$FEATURE_DIR/spec.md"
|
| 194 |
+
if [ -f "$TEMPLATE" ]; then cp "$TEMPLATE" "$SPEC_FILE"; else touch "$SPEC_FILE"; fi
|
| 195 |
+
|
| 196 |
+
# Set the SPECIFY_FEATURE environment variable for the current session
|
| 197 |
+
export SPECIFY_FEATURE="$BRANCH_NAME"
|
| 198 |
+
|
| 199 |
+
if $JSON_MODE; then
|
| 200 |
+
printf '{"BRANCH_NAME":"%s","SPEC_FILE":"%s","FEATURE_NUM":"%s"}\n' "$BRANCH_NAME" "$SPEC_FILE" "$FEATURE_NUM"
|
| 201 |
+
else
|
| 202 |
+
echo "BRANCH_NAME: $BRANCH_NAME"
|
| 203 |
+
echo "SPEC_FILE: $SPEC_FILE"
|
| 204 |
+
echo "FEATURE_NUM: $FEATURE_NUM"
|
| 205 |
+
echo "SPECIFY_FEATURE environment variable set to: $BRANCH_NAME"
|
| 206 |
+
fi
|
.specify/scripts/bash/setup-plan.sh
ADDED
|
@@ -0,0 +1,61 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env bash
|
| 2 |
+
|
| 3 |
+
set -e
|
| 4 |
+
|
| 5 |
+
# Parse command line arguments
|
| 6 |
+
JSON_MODE=false
|
| 7 |
+
ARGS=()
|
| 8 |
+
|
| 9 |
+
for arg in "$@"; do
|
| 10 |
+
case "$arg" in
|
| 11 |
+
--json)
|
| 12 |
+
JSON_MODE=true
|
| 13 |
+
;;
|
| 14 |
+
--help|-h)
|
| 15 |
+
echo "Usage: $0 [--json]"
|
| 16 |
+
echo " --json Output results in JSON format"
|
| 17 |
+
echo " --help Show this help message"
|
| 18 |
+
exit 0
|
| 19 |
+
;;
|
| 20 |
+
*)
|
| 21 |
+
ARGS+=("$arg")
|
| 22 |
+
;;
|
| 23 |
+
esac
|
| 24 |
+
done
|
| 25 |
+
|
| 26 |
+
# Get script directory and load common functions
|
| 27 |
+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
| 28 |
+
source "$SCRIPT_DIR/common.sh"
|
| 29 |
+
|
| 30 |
+
# Get all paths and variables from common functions
|
| 31 |
+
eval $(get_feature_paths)
|
| 32 |
+
|
| 33 |
+
# Check if we're on a proper feature branch (only for git repos)
|
| 34 |
+
check_feature_branch "$CURRENT_BRANCH" "$HAS_GIT" || exit 1
|
| 35 |
+
|
| 36 |
+
# Ensure the feature directory exists
|
| 37 |
+
mkdir -p "$FEATURE_DIR"
|
| 38 |
+
|
| 39 |
+
# Copy plan template if it exists
|
| 40 |
+
TEMPLATE="$REPO_ROOT/.specify/templates/plan-template.md"
|
| 41 |
+
if [[ -f "$TEMPLATE" ]]; then
|
| 42 |
+
cp "$TEMPLATE" "$IMPL_PLAN"
|
| 43 |
+
echo "Copied plan template to $IMPL_PLAN"
|
| 44 |
+
else
|
| 45 |
+
echo "Warning: Plan template not found at $TEMPLATE"
|
| 46 |
+
# Create a basic plan file if template doesn't exist
|
| 47 |
+
touch "$IMPL_PLAN"
|
| 48 |
+
fi
|
| 49 |
+
|
| 50 |
+
# Output results
|
| 51 |
+
if $JSON_MODE; then
|
| 52 |
+
printf '{"FEATURE_SPEC":"%s","IMPL_PLAN":"%s","SPECS_DIR":"%s","BRANCH":"%s","HAS_GIT":"%s"}\n' \
|
| 53 |
+
"$FEATURE_SPEC" "$IMPL_PLAN" "$FEATURE_DIR" "$CURRENT_BRANCH" "$HAS_GIT"
|
| 54 |
+
else
|
| 55 |
+
echo "FEATURE_SPEC: $FEATURE_SPEC"
|
| 56 |
+
echo "IMPL_PLAN: $IMPL_PLAN"
|
| 57 |
+
echo "SPECS_DIR: $FEATURE_DIR"
|
| 58 |
+
echo "BRANCH: $CURRENT_BRANCH"
|
| 59 |
+
echo "HAS_GIT: $HAS_GIT"
|
| 60 |
+
fi
|
| 61 |
+
|
.specify/scripts/bash/update-agent-context.sh
ADDED
|
@@ -0,0 +1,772 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env bash
|
| 2 |
+
|
| 3 |
+
# Update agent context files with information from plan.md
|
| 4 |
+
#
|
| 5 |
+
# This script maintains AI agent context files by parsing feature specifications
|
| 6 |
+
# and updating agent-specific configuration files with project information.
|
| 7 |
+
#
|
| 8 |
+
# MAIN FUNCTIONS:
|
| 9 |
+
# 1. Environment Validation
|
| 10 |
+
# - Verifies git repository structure and branch information
|
| 11 |
+
# - Checks for required plan.md files and templates
|
| 12 |
+
# - Validates file permissions and accessibility
|
| 13 |
+
#
|
| 14 |
+
# 2. Plan Data Extraction
|
| 15 |
+
# - Parses plan.md files to extract project metadata
|
| 16 |
+
# - Identifies language/version, frameworks, databases, and project types
|
| 17 |
+
# - Handles missing or incomplete specification data gracefully
|
| 18 |
+
#
|
| 19 |
+
# 3. Agent File Management
|
| 20 |
+
# - Creates new agent context files from templates when needed
|
| 21 |
+
# - Updates existing agent files with new project information
|
| 22 |
+
# - Preserves manual additions and custom configurations
|
| 23 |
+
# - Supports multiple AI agent formats and directory structures
|
| 24 |
+
#
|
| 25 |
+
# 4. Content Generation
|
| 26 |
+
# - Generates language-specific build/test commands
|
| 27 |
+
# - Creates appropriate project directory structures
|
| 28 |
+
# - Updates technology stacks and recent changes sections
|
| 29 |
+
# - Maintains consistent formatting and timestamps
|
| 30 |
+
#
|
| 31 |
+
# 5. Multi-Agent Support
|
| 32 |
+
# - Handles agent-specific file paths and naming conventions
|
| 33 |
+
# - Supports: Claude, Gemini, Copilot, Cursor, Qwen, opencode, Codex, Windsurf, Kilo Code, Auggie CLI, Roo Code, CodeBuddy CLI, Amp, or Amazon Q Developer CLI
|
| 34 |
+
# - Can update single agents or all existing agent files
|
| 35 |
+
# - Creates default Claude file if no agent files exist
|
| 36 |
+
#
|
| 37 |
+
# Usage: ./update-agent-context.sh [agent_type]
|
| 38 |
+
# Agent types: claude|gemini|copilot|cursor-agent|qwen|opencode|codex|windsurf|kilocode|auggie|q
|
| 39 |
+
# Leave empty to update all existing agent files
|
| 40 |
+
|
| 41 |
+
set -e
|
| 42 |
+
|
| 43 |
+
# Enable strict error handling
|
| 44 |
+
set -u
|
| 45 |
+
set -o pipefail
|
| 46 |
+
|
| 47 |
+
#==============================================================================
|
| 48 |
+
# Configuration and Global Variables
|
| 49 |
+
#==============================================================================
|
| 50 |
+
|
| 51 |
+
# Get script directory and load common functions
|
| 52 |
+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
| 53 |
+
source "$SCRIPT_DIR/common.sh"
|
| 54 |
+
|
| 55 |
+
# Get all paths and variables from common functions
|
| 56 |
+
eval $(get_feature_paths)
|
| 57 |
+
|
| 58 |
+
NEW_PLAN="$IMPL_PLAN" # Alias for compatibility with existing code
|
| 59 |
+
AGENT_TYPE="${1:-}"
|
| 60 |
+
|
| 61 |
+
# Agent-specific file paths
|
| 62 |
+
CLAUDE_FILE="$REPO_ROOT/CLAUDE.md"
|
| 63 |
+
GEMINI_FILE="$REPO_ROOT/GEMINI.md"
|
| 64 |
+
COPILOT_FILE="$REPO_ROOT/.github/copilot-instructions.md"
|
| 65 |
+
CURSOR_FILE="$REPO_ROOT/.cursor/rules/specify-rules.mdc"
|
| 66 |
+
QWEN_FILE="$REPO_ROOT/QWEN.md"
|
| 67 |
+
AGENTS_FILE="$REPO_ROOT/AGENTS.md"
|
| 68 |
+
WINDSURF_FILE="$REPO_ROOT/.windsurf/rules/specify-rules.md"
|
| 69 |
+
KILOCODE_FILE="$REPO_ROOT/.kilocode/rules/specify-rules.md"
|
| 70 |
+
AUGGIE_FILE="$REPO_ROOT/.augment/rules/specify-rules.md"
|
| 71 |
+
ROO_FILE="$REPO_ROOT/.roo/rules/specify-rules.md"
|
| 72 |
+
CODEBUDDY_FILE="$REPO_ROOT/CODEBUDDY.md"
|
| 73 |
+
AMP_FILE="$REPO_ROOT/AGENTS.md"
|
| 74 |
+
Q_FILE="$REPO_ROOT/AGENTS.md"
|
| 75 |
+
|
| 76 |
+
# Template file
|
| 77 |
+
TEMPLATE_FILE="$REPO_ROOT/.specify/templates/agent-file-template.md"
|
| 78 |
+
|
| 79 |
+
# Global variables for parsed plan data
|
| 80 |
+
NEW_LANG=""
|
| 81 |
+
NEW_FRAMEWORK=""
|
| 82 |
+
NEW_DB=""
|
| 83 |
+
NEW_PROJECT_TYPE=""
|
| 84 |
+
|
| 85 |
+
#==============================================================================
|
| 86 |
+
# Utility Functions
|
| 87 |
+
#==============================================================================
|
| 88 |
+
|
| 89 |
+
log_info() {
|
| 90 |
+
echo "INFO: $1"
|
| 91 |
+
}
|
| 92 |
+
|
| 93 |
+
log_success() {
|
| 94 |
+
echo "✓ $1"
|
| 95 |
+
}
|
| 96 |
+
|
| 97 |
+
log_error() {
|
| 98 |
+
echo "ERROR: $1" >&2
|
| 99 |
+
}
|
| 100 |
+
|
| 101 |
+
log_warning() {
|
| 102 |
+
echo "WARNING: $1" >&2
|
| 103 |
+
}
|
| 104 |
+
|
| 105 |
+
# Cleanup function for temporary files
|
| 106 |
+
cleanup() {
|
| 107 |
+
local exit_code=$?
|
| 108 |
+
rm -f /tmp/agent_update_*_$$
|
| 109 |
+
rm -f /tmp/manual_additions_$$
|
| 110 |
+
exit $exit_code
|
| 111 |
+
}
|
| 112 |
+
|
| 113 |
+
# Set up cleanup trap
|
| 114 |
+
trap cleanup EXIT INT TERM
|
| 115 |
+
|
| 116 |
+
#==============================================================================
|
| 117 |
+
# Validation Functions
|
| 118 |
+
#==============================================================================
|
| 119 |
+
|
| 120 |
+
validate_environment() {
|
| 121 |
+
# Check if we have a current branch/feature (git or non-git)
|
| 122 |
+
if [[ -z "$CURRENT_BRANCH" ]]; then
|
| 123 |
+
log_error "Unable to determine current feature"
|
| 124 |
+
if [[ "$HAS_GIT" == "true" ]]; then
|
| 125 |
+
log_info "Make sure you're on a feature branch"
|
| 126 |
+
else
|
| 127 |
+
log_info "Set SPECIFY_FEATURE environment variable or create a feature first"
|
| 128 |
+
fi
|
| 129 |
+
exit 1
|
| 130 |
+
fi
|
| 131 |
+
|
| 132 |
+
# Check if plan.md exists
|
| 133 |
+
if [[ ! -f "$NEW_PLAN" ]]; then
|
| 134 |
+
log_error "No plan.md found at $NEW_PLAN"
|
| 135 |
+
log_info "Make sure you're working on a feature with a corresponding spec directory"
|
| 136 |
+
if [[ "$HAS_GIT" != "true" ]]; then
|
| 137 |
+
log_info "Use: export SPECIFY_FEATURE=your-feature-name or create a new feature first"
|
| 138 |
+
fi
|
| 139 |
+
exit 1
|
| 140 |
+
fi
|
| 141 |
+
|
| 142 |
+
# Check if template exists (needed for new files)
|
| 143 |
+
if [[ ! -f "$TEMPLATE_FILE" ]]; then
|
| 144 |
+
log_warning "Template file not found at $TEMPLATE_FILE"
|
| 145 |
+
log_warning "Creating new agent files will fail"
|
| 146 |
+
fi
|
| 147 |
+
}
|
| 148 |
+
|
| 149 |
+
#==============================================================================
|
| 150 |
+
# Plan Parsing Functions
|
| 151 |
+
#==============================================================================
|
| 152 |
+
|
| 153 |
+
extract_plan_field() {
|
| 154 |
+
local field_pattern="$1"
|
| 155 |
+
local plan_file="$2"
|
| 156 |
+
|
| 157 |
+
grep "^\*\*${field_pattern}\*\*: " "$plan_file" 2>/dev/null | \
|
| 158 |
+
head -1 | \
|
| 159 |
+
sed "s|^\*\*${field_pattern}\*\*: ||" | \
|
| 160 |
+
sed 's/^[ \t]*//;s/[ \t]*$//' | \
|
| 161 |
+
grep -v "NEEDS CLARIFICATION" | \
|
| 162 |
+
grep -v "^N/A$" || echo ""
|
| 163 |
+
}
|
| 164 |
+
|
| 165 |
+
parse_plan_data() {
|
| 166 |
+
local plan_file="$1"
|
| 167 |
+
|
| 168 |
+
if [[ ! -f "$plan_file" ]]; then
|
| 169 |
+
log_error "Plan file not found: $plan_file"
|
| 170 |
+
return 1
|
| 171 |
+
fi
|
| 172 |
+
|
| 173 |
+
if [[ ! -r "$plan_file" ]]; then
|
| 174 |
+
log_error "Plan file is not readable: $plan_file"
|
| 175 |
+
return 1
|
| 176 |
+
fi
|
| 177 |
+
|
| 178 |
+
log_info "Parsing plan data from $plan_file"
|
| 179 |
+
|
| 180 |
+
NEW_LANG=$(extract_plan_field "Language/Version" "$plan_file")
|
| 181 |
+
NEW_FRAMEWORK=$(extract_plan_field "Primary Dependencies" "$plan_file")
|
| 182 |
+
NEW_DB=$(extract_plan_field "Storage" "$plan_file")
|
| 183 |
+
NEW_PROJECT_TYPE=$(extract_plan_field "Project Type" "$plan_file")
|
| 184 |
+
|
| 185 |
+
# Log what we found
|
| 186 |
+
if [[ -n "$NEW_LANG" ]]; then
|
| 187 |
+
log_info "Found language: $NEW_LANG"
|
| 188 |
+
else
|
| 189 |
+
log_warning "No language information found in plan"
|
| 190 |
+
fi
|
| 191 |
+
|
| 192 |
+
if [[ -n "$NEW_FRAMEWORK" ]]; then
|
| 193 |
+
log_info "Found framework: $NEW_FRAMEWORK"
|
| 194 |
+
fi
|
| 195 |
+
|
| 196 |
+
if [[ -n "$NEW_DB" ]] && [[ "$NEW_DB" != "N/A" ]]; then
|
| 197 |
+
log_info "Found database: $NEW_DB"
|
| 198 |
+
fi
|
| 199 |
+
|
| 200 |
+
if [[ -n "$NEW_PROJECT_TYPE" ]]; then
|
| 201 |
+
log_info "Found project type: $NEW_PROJECT_TYPE"
|
| 202 |
+
fi
|
| 203 |
+
}
|
| 204 |
+
|
| 205 |
+
format_technology_stack() {
|
| 206 |
+
local lang="$1"
|
| 207 |
+
local framework="$2"
|
| 208 |
+
local parts=()
|
| 209 |
+
|
| 210 |
+
# Add non-empty parts
|
| 211 |
+
[[ -n "$lang" && "$lang" != "NEEDS CLARIFICATION" ]] && parts+=("$lang")
|
| 212 |
+
[[ -n "$framework" && "$framework" != "NEEDS CLARIFICATION" && "$framework" != "N/A" ]] && parts+=("$framework")
|
| 213 |
+
|
| 214 |
+
# Join with proper formatting
|
| 215 |
+
if [[ ${#parts[@]} -eq 0 ]]; then
|
| 216 |
+
echo ""
|
| 217 |
+
elif [[ ${#parts[@]} -eq 1 ]]; then
|
| 218 |
+
echo "${parts[0]}"
|
| 219 |
+
else
|
| 220 |
+
# Join multiple parts with " + "
|
| 221 |
+
local result="${parts[0]}"
|
| 222 |
+
for ((i=1; i<${#parts[@]}; i++)); do
|
| 223 |
+
result="$result + ${parts[i]}"
|
| 224 |
+
done
|
| 225 |
+
echo "$result"
|
| 226 |
+
fi
|
| 227 |
+
}
|
| 228 |
+
|
| 229 |
+
#==============================================================================
|
| 230 |
+
# Template and Content Generation Functions
|
| 231 |
+
#==============================================================================
|
| 232 |
+
|
| 233 |
+
get_project_structure() {
|
| 234 |
+
local project_type="$1"
|
| 235 |
+
|
| 236 |
+
if [[ "$project_type" == *"web"* ]]; then
|
| 237 |
+
echo "backend/\\nfrontend/\\ntests/"
|
| 238 |
+
else
|
| 239 |
+
echo "src/\\ntests/"
|
| 240 |
+
fi
|
| 241 |
+
}
|
| 242 |
+
|
| 243 |
+
get_commands_for_language() {
|
| 244 |
+
local lang="$1"
|
| 245 |
+
|
| 246 |
+
case "$lang" in
|
| 247 |
+
*"Python"*)
|
| 248 |
+
echo "cd src && pytest && ruff check ."
|
| 249 |
+
;;
|
| 250 |
+
*"Rust"*)
|
| 251 |
+
echo "cargo test && cargo clippy"
|
| 252 |
+
;;
|
| 253 |
+
*"JavaScript"*|*"TypeScript"*)
|
| 254 |
+
echo "npm test \\&\\& npm run lint"
|
| 255 |
+
;;
|
| 256 |
+
*)
|
| 257 |
+
echo "# Add commands for $lang"
|
| 258 |
+
;;
|
| 259 |
+
esac
|
| 260 |
+
}
|
| 261 |
+
|
| 262 |
+
get_language_conventions() {
|
| 263 |
+
local lang="$1"
|
| 264 |
+
echo "$lang: Follow standard conventions"
|
| 265 |
+
}
|
| 266 |
+
|
| 267 |
+
create_new_agent_file() {
|
| 268 |
+
local target_file="$1"
|
| 269 |
+
local temp_file="$2"
|
| 270 |
+
local project_name="$3"
|
| 271 |
+
local current_date="$4"
|
| 272 |
+
|
| 273 |
+
if [[ ! -f "$TEMPLATE_FILE" ]]; then
|
| 274 |
+
log_error "Template not found at $TEMPLATE_FILE"
|
| 275 |
+
return 1
|
| 276 |
+
fi
|
| 277 |
+
|
| 278 |
+
if [[ ! -r "$TEMPLATE_FILE" ]]; then
|
| 279 |
+
log_error "Template file is not readable: $TEMPLATE_FILE"
|
| 280 |
+
return 1
|
| 281 |
+
fi
|
| 282 |
+
|
| 283 |
+
log_info "Creating new agent context file from template..."
|
| 284 |
+
|
| 285 |
+
if ! cp "$TEMPLATE_FILE" "$temp_file"; then
|
| 286 |
+
log_error "Failed to copy template file"
|
| 287 |
+
return 1
|
| 288 |
+
fi
|
| 289 |
+
|
| 290 |
+
# Replace template placeholders
|
| 291 |
+
local project_structure
|
| 292 |
+
project_structure=$(get_project_structure "$NEW_PROJECT_TYPE")
|
| 293 |
+
|
| 294 |
+
local commands
|
| 295 |
+
commands=$(get_commands_for_language "$NEW_LANG")
|
| 296 |
+
|
| 297 |
+
local language_conventions
|
| 298 |
+
language_conventions=$(get_language_conventions "$NEW_LANG")
|
| 299 |
+
|
| 300 |
+
# Perform substitutions with error checking using safer approach
|
| 301 |
+
# Escape special characters for sed by using a different delimiter or escaping
|
| 302 |
+
local escaped_lang=$(printf '%s\n' "$NEW_LANG" | sed 's/[\[\.*^$()+{}|]/\\&/g')
|
| 303 |
+
local escaped_framework=$(printf '%s\n' "$NEW_FRAMEWORK" | sed 's/[\[\.*^$()+{}|]/\\&/g')
|
| 304 |
+
local escaped_branch=$(printf '%s\n' "$CURRENT_BRANCH" | sed 's/[\[\.*^$()+{}|]/\\&/g')
|
| 305 |
+
|
| 306 |
+
# Build technology stack and recent change strings conditionally
|
| 307 |
+
local tech_stack
|
| 308 |
+
if [[ -n "$escaped_lang" && -n "$escaped_framework" ]]; then
|
| 309 |
+
tech_stack="- $escaped_lang + $escaped_framework ($escaped_branch)"
|
| 310 |
+
elif [[ -n "$escaped_lang" ]]; then
|
| 311 |
+
tech_stack="- $escaped_lang ($escaped_branch)"
|
| 312 |
+
elif [[ -n "$escaped_framework" ]]; then
|
| 313 |
+
tech_stack="- $escaped_framework ($escaped_branch)"
|
| 314 |
+
else
|
| 315 |
+
tech_stack="- ($escaped_branch)"
|
| 316 |
+
fi
|
| 317 |
+
|
| 318 |
+
local recent_change
|
| 319 |
+
if [[ -n "$escaped_lang" && -n "$escaped_framework" ]]; then
|
| 320 |
+
recent_change="- $escaped_branch: Added $escaped_lang + $escaped_framework"
|
| 321 |
+
elif [[ -n "$escaped_lang" ]]; then
|
| 322 |
+
recent_change="- $escaped_branch: Added $escaped_lang"
|
| 323 |
+
elif [[ -n "$escaped_framework" ]]; then
|
| 324 |
+
recent_change="- $escaped_branch: Added $escaped_framework"
|
| 325 |
+
else
|
| 326 |
+
recent_change="- $escaped_branch: Added"
|
| 327 |
+
fi
|
| 328 |
+
|
| 329 |
+
local substitutions=(
|
| 330 |
+
"s|\[PROJECT NAME\]|$project_name|"
|
| 331 |
+
"s|\[DATE\]|$current_date|"
|
| 332 |
+
"s|\[EXTRACTED FROM ALL PLAN.MD FILES\]|$tech_stack|"
|
| 333 |
+
"s|\[ACTUAL STRUCTURE FROM PLANS\]|$project_structure|g"
|
| 334 |
+
"s|\[ONLY COMMANDS FOR ACTIVE TECHNOLOGIES\]|$commands|"
|
| 335 |
+
"s|\[LANGUAGE-SPECIFIC, ONLY FOR LANGUAGES IN USE\]|$language_conventions|"
|
| 336 |
+
"s|\[LAST 3 FEATURES AND WHAT THEY ADDED\]|$recent_change|"
|
| 337 |
+
)
|
| 338 |
+
|
| 339 |
+
for substitution in "${substitutions[@]}"; do
|
| 340 |
+
if ! sed -i.bak -e "$substitution" "$temp_file"; then
|
| 341 |
+
log_error "Failed to perform substitution: $substitution"
|
| 342 |
+
rm -f "$temp_file" "$temp_file.bak"
|
| 343 |
+
return 1
|
| 344 |
+
fi
|
| 345 |
+
done
|
| 346 |
+
|
| 347 |
+
# Convert \n sequences to actual newlines
|
| 348 |
+
newline=$(printf '\n')
|
| 349 |
+
sed -i.bak2 "s/\\\\n/${newline}/g" "$temp_file"
|
| 350 |
+
|
| 351 |
+
# Clean up backup files
|
| 352 |
+
rm -f "$temp_file.bak" "$temp_file.bak2"
|
| 353 |
+
|
| 354 |
+
return 0
|
| 355 |
+
}
|
| 356 |
+
|
| 357 |
+
|
| 358 |
+
|
| 359 |
+
|
| 360 |
+
update_existing_agent_file() {
|
| 361 |
+
local target_file="$1"
|
| 362 |
+
local current_date="$2"
|
| 363 |
+
|
| 364 |
+
log_info "Updating existing agent context file..."
|
| 365 |
+
|
| 366 |
+
# Use a single temporary file for atomic update
|
| 367 |
+
local temp_file
|
| 368 |
+
temp_file=$(mktemp) || {
|
| 369 |
+
log_error "Failed to create temporary file"
|
| 370 |
+
return 1
|
| 371 |
+
}
|
| 372 |
+
|
| 373 |
+
# Process the file in one pass
|
| 374 |
+
local tech_stack=$(format_technology_stack "$NEW_LANG" "$NEW_FRAMEWORK")
|
| 375 |
+
local new_tech_entries=()
|
| 376 |
+
local new_change_entry=""
|
| 377 |
+
|
| 378 |
+
# Prepare new technology entries
|
| 379 |
+
if [[ -n "$tech_stack" ]] && ! grep -q "$tech_stack" "$target_file"; then
|
| 380 |
+
new_tech_entries+=("- $tech_stack ($CURRENT_BRANCH)")
|
| 381 |
+
fi
|
| 382 |
+
|
| 383 |
+
if [[ -n "$NEW_DB" ]] && [[ "$NEW_DB" != "N/A" ]] && [[ "$NEW_DB" != "NEEDS CLARIFICATION" ]] && ! grep -q "$NEW_DB" "$target_file"; then
|
| 384 |
+
new_tech_entries+=("- $NEW_DB ($CURRENT_BRANCH)")
|
| 385 |
+
fi
|
| 386 |
+
|
| 387 |
+
# Prepare new change entry
|
| 388 |
+
if [[ -n "$tech_stack" ]]; then
|
| 389 |
+
new_change_entry="- $CURRENT_BRANCH: Added $tech_stack"
|
| 390 |
+
elif [[ -n "$NEW_DB" ]] && [[ "$NEW_DB" != "N/A" ]] && [[ "$NEW_DB" != "NEEDS CLARIFICATION" ]]; then
|
| 391 |
+
new_change_entry="- $CURRENT_BRANCH: Added $NEW_DB"
|
| 392 |
+
fi
|
| 393 |
+
|
| 394 |
+
# Check if sections exist in the file
|
| 395 |
+
local has_active_technologies=0
|
| 396 |
+
local has_recent_changes=0
|
| 397 |
+
|
| 398 |
+
if grep -q "^## Active Technologies" "$target_file" 2>/dev/null; then
|
| 399 |
+
has_active_technologies=1
|
| 400 |
+
fi
|
| 401 |
+
|
| 402 |
+
if grep -q "^## Recent Changes" "$target_file" 2>/dev/null; then
|
| 403 |
+
has_recent_changes=1
|
| 404 |
+
fi
|
| 405 |
+
|
| 406 |
+
# Process file line by line
|
| 407 |
+
local in_tech_section=false
|
| 408 |
+
local in_changes_section=false
|
| 409 |
+
local tech_entries_added=false
|
| 410 |
+
local changes_entries_added=false
|
| 411 |
+
local existing_changes_count=0
|
| 412 |
+
local file_ended=false
|
| 413 |
+
|
| 414 |
+
while IFS= read -r line || [[ -n "$line" ]]; do
|
| 415 |
+
# Handle Active Technologies section
|
| 416 |
+
if [[ "$line" == "## Active Technologies" ]]; then
|
| 417 |
+
echo "$line" >> "$temp_file"
|
| 418 |
+
in_tech_section=true
|
| 419 |
+
continue
|
| 420 |
+
elif [[ $in_tech_section == true ]] && [[ "$line" =~ ^##[[:space:]] ]]; then
|
| 421 |
+
# Add new tech entries before closing the section
|
| 422 |
+
if [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then
|
| 423 |
+
printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file"
|
| 424 |
+
tech_entries_added=true
|
| 425 |
+
fi
|
| 426 |
+
echo "$line" >> "$temp_file"
|
| 427 |
+
in_tech_section=false
|
| 428 |
+
continue
|
| 429 |
+
elif [[ $in_tech_section == true ]] && [[ -z "$line" ]]; then
|
| 430 |
+
# Add new tech entries before empty line in tech section
|
| 431 |
+
if [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then
|
| 432 |
+
printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file"
|
| 433 |
+
tech_entries_added=true
|
| 434 |
+
fi
|
| 435 |
+
echo "$line" >> "$temp_file"
|
| 436 |
+
continue
|
| 437 |
+
fi
|
| 438 |
+
|
| 439 |
+
# Handle Recent Changes section
|
| 440 |
+
if [[ "$line" == "## Recent Changes" ]]; then
|
| 441 |
+
echo "$line" >> "$temp_file"
|
| 442 |
+
# Add new change entry right after the heading
|
| 443 |
+
if [[ -n "$new_change_entry" ]]; then
|
| 444 |
+
echo "$new_change_entry" >> "$temp_file"
|
| 445 |
+
fi
|
| 446 |
+
in_changes_section=true
|
| 447 |
+
changes_entries_added=true
|
| 448 |
+
continue
|
| 449 |
+
elif [[ $in_changes_section == true ]] && [[ "$line" =~ ^##[[:space:]] ]]; then
|
| 450 |
+
echo "$line" >> "$temp_file"
|
| 451 |
+
in_changes_section=false
|
| 452 |
+
continue
|
| 453 |
+
elif [[ $in_changes_section == true ]] && [[ "$line" == "- "* ]]; then
|
| 454 |
+
# Keep only first 2 existing changes
|
| 455 |
+
if [[ $existing_changes_count -lt 2 ]]; then
|
| 456 |
+
echo "$line" >> "$temp_file"
|
| 457 |
+
((existing_changes_count++))
|
| 458 |
+
fi
|
| 459 |
+
continue
|
| 460 |
+
fi
|
| 461 |
+
|
| 462 |
+
# Update timestamp
|
| 463 |
+
if [[ "$line" =~ \*\*Last\ updated\*\*:.*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] ]]; then
|
| 464 |
+
echo "$line" | sed "s/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/$current_date/" >> "$temp_file"
|
| 465 |
+
else
|
| 466 |
+
echo "$line" >> "$temp_file"
|
| 467 |
+
fi
|
| 468 |
+
done < "$target_file"
|
| 469 |
+
|
| 470 |
+
# Post-loop check: if we're still in the Active Technologies section and haven't added new entries
|
| 471 |
+
if [[ $in_tech_section == true ]] && [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then
|
| 472 |
+
printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file"
|
| 473 |
+
tech_entries_added=true
|
| 474 |
+
fi
|
| 475 |
+
|
| 476 |
+
# If sections don't exist, add them at the end of the file
|
| 477 |
+
if [[ $has_active_technologies -eq 0 ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then
|
| 478 |
+
echo "" >> "$temp_file"
|
| 479 |
+
echo "## Active Technologies" >> "$temp_file"
|
| 480 |
+
printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file"
|
| 481 |
+
tech_entries_added=true
|
| 482 |
+
fi
|
| 483 |
+
|
| 484 |
+
if [[ $has_recent_changes -eq 0 ]] && [[ -n "$new_change_entry" ]]; then
|
| 485 |
+
echo "" >> "$temp_file"
|
| 486 |
+
echo "## Recent Changes" >> "$temp_file"
|
| 487 |
+
echo "$new_change_entry" >> "$temp_file"
|
| 488 |
+
changes_entries_added=true
|
| 489 |
+
fi
|
| 490 |
+
|
| 491 |
+
# Move temp file to target atomically
|
| 492 |
+
if ! mv "$temp_file" "$target_file"; then
|
| 493 |
+
log_error "Failed to update target file"
|
| 494 |
+
rm -f "$temp_file"
|
| 495 |
+
return 1
|
| 496 |
+
fi
|
| 497 |
+
|
| 498 |
+
return 0
|
| 499 |
+
}
|
| 500 |
+
#==============================================================================
|
| 501 |
+
# Main Agent File Update Function
|
| 502 |
+
#==============================================================================
|
| 503 |
+
|
| 504 |
+
update_agent_file() {
|
| 505 |
+
local target_file="$1"
|
| 506 |
+
local agent_name="$2"
|
| 507 |
+
|
| 508 |
+
if [[ -z "$target_file" ]] || [[ -z "$agent_name" ]]; then
|
| 509 |
+
log_error "update_agent_file requires target_file and agent_name parameters"
|
| 510 |
+
return 1
|
| 511 |
+
fi
|
| 512 |
+
|
| 513 |
+
log_info "Updating $agent_name context file: $target_file"
|
| 514 |
+
|
| 515 |
+
local project_name
|
| 516 |
+
project_name=$(basename "$REPO_ROOT")
|
| 517 |
+
local current_date
|
| 518 |
+
current_date=$(date +%Y-%m-%d)
|
| 519 |
+
|
| 520 |
+
# Create directory if it doesn't exist
|
| 521 |
+
local target_dir
|
| 522 |
+
target_dir=$(dirname "$target_file")
|
| 523 |
+
if [[ ! -d "$target_dir" ]]; then
|
| 524 |
+
if ! mkdir -p "$target_dir"; then
|
| 525 |
+
log_error "Failed to create directory: $target_dir"
|
| 526 |
+
return 1
|
| 527 |
+
fi
|
| 528 |
+
fi
|
| 529 |
+
|
| 530 |
+
if [[ ! -f "$target_file" ]]; then
|
| 531 |
+
# Create new file from template
|
| 532 |
+
local temp_file
|
| 533 |
+
temp_file=$(mktemp) || {
|
| 534 |
+
log_error "Failed to create temporary file"
|
| 535 |
+
return 1
|
| 536 |
+
}
|
| 537 |
+
|
| 538 |
+
if create_new_agent_file "$target_file" "$temp_file" "$project_name" "$current_date"; then
|
| 539 |
+
if mv "$temp_file" "$target_file"; then
|
| 540 |
+
log_success "Created new $agent_name context file"
|
| 541 |
+
else
|
| 542 |
+
log_error "Failed to move temporary file to $target_file"
|
| 543 |
+
rm -f "$temp_file"
|
| 544 |
+
return 1
|
| 545 |
+
fi
|
| 546 |
+
else
|
| 547 |
+
log_error "Failed to create new agent file"
|
| 548 |
+
rm -f "$temp_file"
|
| 549 |
+
return 1
|
| 550 |
+
fi
|
| 551 |
+
else
|
| 552 |
+
# Update existing file
|
| 553 |
+
if [[ ! -r "$target_file" ]]; then
|
| 554 |
+
log_error "Cannot read existing file: $target_file"
|
| 555 |
+
return 1
|
| 556 |
+
fi
|
| 557 |
+
|
| 558 |
+
if [[ ! -w "$target_file" ]]; then
|
| 559 |
+
log_error "Cannot write to existing file: $target_file"
|
| 560 |
+
return 1
|
| 561 |
+
fi
|
| 562 |
+
|
| 563 |
+
if update_existing_agent_file "$target_file" "$current_date"; then
|
| 564 |
+
log_success "Updated existing $agent_name context file"
|
| 565 |
+
else
|
| 566 |
+
log_error "Failed to update existing agent file"
|
| 567 |
+
return 1
|
| 568 |
+
fi
|
| 569 |
+
fi
|
| 570 |
+
|
| 571 |
+
return 0
|
| 572 |
+
}
|
| 573 |
+
|
| 574 |
+
#==============================================================================
|
| 575 |
+
# Agent Selection and Processing
|
| 576 |
+
#==============================================================================
|
| 577 |
+
|
| 578 |
+
update_specific_agent() {
|
| 579 |
+
local agent_type="$1"
|
| 580 |
+
|
| 581 |
+
case "$agent_type" in
|
| 582 |
+
claude)
|
| 583 |
+
update_agent_file "$CLAUDE_FILE" "Claude Code"
|
| 584 |
+
;;
|
| 585 |
+
gemini)
|
| 586 |
+
update_agent_file "$GEMINI_FILE" "Gemini CLI"
|
| 587 |
+
;;
|
| 588 |
+
copilot)
|
| 589 |
+
update_agent_file "$COPILOT_FILE" "GitHub Copilot"
|
| 590 |
+
;;
|
| 591 |
+
cursor-agent)
|
| 592 |
+
update_agent_file "$CURSOR_FILE" "Cursor IDE"
|
| 593 |
+
;;
|
| 594 |
+
qwen)
|
| 595 |
+
update_agent_file "$QWEN_FILE" "Qwen Code"
|
| 596 |
+
;;
|
| 597 |
+
opencode)
|
| 598 |
+
update_agent_file "$AGENTS_FILE" "opencode"
|
| 599 |
+
;;
|
| 600 |
+
codex)
|
| 601 |
+
update_agent_file "$AGENTS_FILE" "Codex CLI"
|
| 602 |
+
;;
|
| 603 |
+
windsurf)
|
| 604 |
+
update_agent_file "$WINDSURF_FILE" "Windsurf"
|
| 605 |
+
;;
|
| 606 |
+
kilocode)
|
| 607 |
+
update_agent_file "$KILOCODE_FILE" "Kilo Code"
|
| 608 |
+
;;
|
| 609 |
+
auggie)
|
| 610 |
+
update_agent_file "$AUGGIE_FILE" "Auggie CLI"
|
| 611 |
+
;;
|
| 612 |
+
roo)
|
| 613 |
+
update_agent_file "$ROO_FILE" "Roo Code"
|
| 614 |
+
;;
|
| 615 |
+
codebuddy)
|
| 616 |
+
update_agent_file "$CODEBUDDY_FILE" "CodeBuddy CLI"
|
| 617 |
+
;;
|
| 618 |
+
amp)
|
| 619 |
+
update_agent_file "$AMP_FILE" "Amp"
|
| 620 |
+
;;
|
| 621 |
+
q)
|
| 622 |
+
update_agent_file "$Q_FILE" "Amazon Q Developer CLI"
|
| 623 |
+
;;
|
| 624 |
+
*)
|
| 625 |
+
log_error "Unknown agent type '$agent_type'"
|
| 626 |
+
log_error "Expected: claude|gemini|copilot|cursor-agent|qwen|opencode|codex|windsurf|kilocode|auggie|roo|amp|q"
|
| 627 |
+
exit 1
|
| 628 |
+
;;
|
| 629 |
+
esac
|
| 630 |
+
}
|
| 631 |
+
|
| 632 |
+
update_all_existing_agents() {
|
| 633 |
+
local found_agent=false
|
| 634 |
+
|
| 635 |
+
# Check each possible agent file and update if it exists
|
| 636 |
+
if [[ -f "$CLAUDE_FILE" ]]; then
|
| 637 |
+
update_agent_file "$CLAUDE_FILE" "Claude Code"
|
| 638 |
+
found_agent=true
|
| 639 |
+
fi
|
| 640 |
+
|
| 641 |
+
if [[ -f "$GEMINI_FILE" ]]; then
|
| 642 |
+
update_agent_file "$GEMINI_FILE" "Gemini CLI"
|
| 643 |
+
found_agent=true
|
| 644 |
+
fi
|
| 645 |
+
|
| 646 |
+
if [[ -f "$COPILOT_FILE" ]]; then
|
| 647 |
+
update_agent_file "$COPILOT_FILE" "GitHub Copilot"
|
| 648 |
+
found_agent=true
|
| 649 |
+
fi
|
| 650 |
+
|
| 651 |
+
if [[ -f "$CURSOR_FILE" ]]; then
|
| 652 |
+
update_agent_file "$CURSOR_FILE" "Cursor IDE"
|
| 653 |
+
found_agent=true
|
| 654 |
+
fi
|
| 655 |
+
|
| 656 |
+
if [[ -f "$QWEN_FILE" ]]; then
|
| 657 |
+
update_agent_file "$QWEN_FILE" "Qwen Code"
|
| 658 |
+
found_agent=true
|
| 659 |
+
fi
|
| 660 |
+
|
| 661 |
+
if [[ -f "$AGENTS_FILE" ]]; then
|
| 662 |
+
update_agent_file "$AGENTS_FILE" "Codex/opencode"
|
| 663 |
+
found_agent=true
|
| 664 |
+
fi
|
| 665 |
+
|
| 666 |
+
if [[ -f "$WINDSURF_FILE" ]]; then
|
| 667 |
+
update_agent_file "$WINDSURF_FILE" "Windsurf"
|
| 668 |
+
found_agent=true
|
| 669 |
+
fi
|
| 670 |
+
|
| 671 |
+
if [[ -f "$KILOCODE_FILE" ]]; then
|
| 672 |
+
update_agent_file "$KILOCODE_FILE" "Kilo Code"
|
| 673 |
+
found_agent=true
|
| 674 |
+
fi
|
| 675 |
+
|
| 676 |
+
if [[ -f "$AUGGIE_FILE" ]]; then
|
| 677 |
+
update_agent_file "$AUGGIE_FILE" "Auggie CLI"
|
| 678 |
+
found_agent=true
|
| 679 |
+
fi
|
| 680 |
+
|
| 681 |
+
if [[ -f "$ROO_FILE" ]]; then
|
| 682 |
+
update_agent_file "$ROO_FILE" "Roo Code"
|
| 683 |
+
found_agent=true
|
| 684 |
+
fi
|
| 685 |
+
|
| 686 |
+
if [[ -f "$CODEBUDDY_FILE" ]]; then
|
| 687 |
+
update_agent_file "$CODEBUDDY_FILE" "CodeBuddy CLI"
|
| 688 |
+
found_agent=true
|
| 689 |
+
fi
|
| 690 |
+
|
| 691 |
+
if [[ -f "$Q_FILE" ]]; then
|
| 692 |
+
update_agent_file "$Q_FILE" "Amazon Q Developer CLI"
|
| 693 |
+
found_agent=true
|
| 694 |
+
fi
|
| 695 |
+
|
| 696 |
+
# If no agent files exist, create a default Claude file
|
| 697 |
+
if [[ "$found_agent" == false ]]; then
|
| 698 |
+
log_info "No existing agent files found, creating default Claude file..."
|
| 699 |
+
update_agent_file "$CLAUDE_FILE" "Claude Code"
|
| 700 |
+
fi
|
| 701 |
+
}
|
| 702 |
+
print_summary() {
|
| 703 |
+
echo
|
| 704 |
+
log_info "Summary of changes:"
|
| 705 |
+
|
| 706 |
+
if [[ -n "$NEW_LANG" ]]; then
|
| 707 |
+
echo " - Added language: $NEW_LANG"
|
| 708 |
+
fi
|
| 709 |
+
|
| 710 |
+
if [[ -n "$NEW_FRAMEWORK" ]]; then
|
| 711 |
+
echo " - Added framework: $NEW_FRAMEWORK"
|
| 712 |
+
fi
|
| 713 |
+
|
| 714 |
+
if [[ -n "$NEW_DB" ]] && [[ "$NEW_DB" != "N/A" ]]; then
|
| 715 |
+
echo " - Added database: $NEW_DB"
|
| 716 |
+
fi
|
| 717 |
+
|
| 718 |
+
echo
|
| 719 |
+
|
| 720 |
+
log_info "Usage: $0 [claude|gemini|copilot|cursor-agent|qwen|opencode|codex|windsurf|kilocode|auggie|codebuddy|q]"
|
| 721 |
+
}
|
| 722 |
+
|
| 723 |
+
#==============================================================================
|
| 724 |
+
# Main Execution
|
| 725 |
+
#==============================================================================
|
| 726 |
+
|
| 727 |
+
main() {
|
| 728 |
+
# Validate environment before proceeding
|
| 729 |
+
validate_environment
|
| 730 |
+
|
| 731 |
+
log_info "=== Updating agent context files for feature $CURRENT_BRANCH ==="
|
| 732 |
+
|
| 733 |
+
# Parse the plan file to extract project information
|
| 734 |
+
if ! parse_plan_data "$NEW_PLAN"; then
|
| 735 |
+
log_error "Failed to parse plan data"
|
| 736 |
+
exit 1
|
| 737 |
+
fi
|
| 738 |
+
|
| 739 |
+
# Process based on agent type argument
|
| 740 |
+
local success=true
|
| 741 |
+
|
| 742 |
+
if [[ -z "$AGENT_TYPE" ]]; then
|
| 743 |
+
# No specific agent provided - update all existing agent files
|
| 744 |
+
log_info "No agent specified, updating all existing agent files..."
|
| 745 |
+
if ! update_all_existing_agents; then
|
| 746 |
+
success=false
|
| 747 |
+
fi
|
| 748 |
+
else
|
| 749 |
+
# Specific agent provided - update only that agent
|
| 750 |
+
log_info "Updating specific agent: $AGENT_TYPE"
|
| 751 |
+
if ! update_specific_agent "$AGENT_TYPE"; then
|
| 752 |
+
success=false
|
| 753 |
+
fi
|
| 754 |
+
fi
|
| 755 |
+
|
| 756 |
+
# Print summary
|
| 757 |
+
print_summary
|
| 758 |
+
|
| 759 |
+
if [[ "$success" == true ]]; then
|
| 760 |
+
log_success "Agent context update completed successfully"
|
| 761 |
+
exit 0
|
| 762 |
+
else
|
| 763 |
+
log_error "Agent context update completed with errors"
|
| 764 |
+
exit 1
|
| 765 |
+
fi
|
| 766 |
+
}
|
| 767 |
+
|
| 768 |
+
# Execute main function if script is run directly
|
| 769 |
+
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
|
| 770 |
+
main "$@"
|
| 771 |
+
fi
|
| 772 |
+
|
.specify/templates/agent-file-template.md
ADDED
|
@@ -0,0 +1,28 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# [PROJECT NAME] Development Guidelines
|
| 2 |
+
|
| 3 |
+
Auto-generated from all feature plans. Last updated: [DATE]
|
| 4 |
+
|
| 5 |
+
## Active Technologies
|
| 6 |
+
|
| 7 |
+
[EXTRACTED FROM ALL PLAN.MD FILES]
|
| 8 |
+
|
| 9 |
+
## Project Structure
|
| 10 |
+
|
| 11 |
+
```text
|
| 12 |
+
[ACTUAL STRUCTURE FROM PLANS]
|
| 13 |
+
```
|
| 14 |
+
|
| 15 |
+
## Commands
|
| 16 |
+
|
| 17 |
+
[ONLY COMMANDS FOR ACTIVE TECHNOLOGIES]
|
| 18 |
+
|
| 19 |
+
## Code Style
|
| 20 |
+
|
| 21 |
+
[LANGUAGE-SPECIFIC, ONLY FOR LANGUAGES IN USE]
|
| 22 |
+
|
| 23 |
+
## Recent Changes
|
| 24 |
+
|
| 25 |
+
[LAST 3 FEATURES AND WHAT THEY ADDED]
|
| 26 |
+
|
| 27 |
+
<!-- MANUAL ADDITIONS START -->
|
| 28 |
+
<!-- MANUAL ADDITIONS END -->
|
.specify/templates/checklist-template.md
ADDED
|
@@ -0,0 +1,40 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# [CHECKLIST TYPE] Checklist: [FEATURE NAME]
|
| 2 |
+
|
| 3 |
+
**Purpose**: [Brief description of what this checklist covers]
|
| 4 |
+
**Created**: [DATE]
|
| 5 |
+
**Feature**: [Link to spec.md or relevant documentation]
|
| 6 |
+
|
| 7 |
+
**Note**: This checklist is generated by the `/speckit.checklist` command based on feature context and requirements.
|
| 8 |
+
|
| 9 |
+
<!--
|
| 10 |
+
============================================================================
|
| 11 |
+
IMPORTANT: The checklist items below are SAMPLE ITEMS for illustration only.
|
| 12 |
+
|
| 13 |
+
The /speckit.checklist command MUST replace these with actual items based on:
|
| 14 |
+
- User's specific checklist request
|
| 15 |
+
- Feature requirements from spec.md
|
| 16 |
+
- Technical context from plan.md
|
| 17 |
+
- Implementation details from tasks.md
|
| 18 |
+
|
| 19 |
+
DO NOT keep these sample items in the generated checklist file.
|
| 20 |
+
============================================================================
|
| 21 |
+
-->
|
| 22 |
+
|
| 23 |
+
## [Category 1]
|
| 24 |
+
|
| 25 |
+
- [ ] CHK001 First checklist item with clear action
|
| 26 |
+
- [ ] CHK002 Second checklist item
|
| 27 |
+
- [ ] CHK003 Third checklist item
|
| 28 |
+
|
| 29 |
+
## [Category 2]
|
| 30 |
+
|
| 31 |
+
- [ ] CHK004 Another category item
|
| 32 |
+
- [ ] CHK005 Item with specific criteria
|
| 33 |
+
- [ ] CHK006 Final item in this category
|
| 34 |
+
|
| 35 |
+
## Notes
|
| 36 |
+
|
| 37 |
+
- Check items off as completed: `[x]`
|
| 38 |
+
- Add comments or findings inline
|
| 39 |
+
- Link to relevant resources or documentation
|
| 40 |
+
- Items are numbered sequentially for easy reference
|
.specify/templates/plan-template.md
ADDED
|
@@ -0,0 +1,104 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Implementation Plan: [FEATURE]
|
| 2 |
+
|
| 3 |
+
**Branch**: `[###-feature-name]` | **Date**: [DATE] | **Spec**: [link]
|
| 4 |
+
**Input**: Feature specification from `/specs/[###-feature-name]/spec.md`
|
| 5 |
+
|
| 6 |
+
**Note**: This template is filled in by the `/speckit.plan` command. See `.specify/templates/commands/plan.md` for the execution workflow.
|
| 7 |
+
|
| 8 |
+
## Summary
|
| 9 |
+
|
| 10 |
+
[Extract from feature spec: primary requirement + technical approach from research]
|
| 11 |
+
|
| 12 |
+
## Technical Context
|
| 13 |
+
|
| 14 |
+
<!--
|
| 15 |
+
ACTION REQUIRED: Replace the content in this section with the technical details
|
| 16 |
+
for the project. The structure here is presented in advisory capacity to guide
|
| 17 |
+
the iteration process.
|
| 18 |
+
-->
|
| 19 |
+
|
| 20 |
+
**Language/Version**: [e.g., Python 3.11, Swift 5.9, Rust 1.75 or NEEDS CLARIFICATION]
|
| 21 |
+
**Primary Dependencies**: [e.g., FastAPI, UIKit, LLVM or NEEDS CLARIFICATION]
|
| 22 |
+
**Storage**: [if applicable, e.g., PostgreSQL, CoreData, files or N/A]
|
| 23 |
+
**Testing**: [e.g., pytest, XCTest, cargo test or NEEDS CLARIFICATION]
|
| 24 |
+
**Target Platform**: [e.g., Linux server, iOS 15+, WASM or NEEDS CLARIFICATION]
|
| 25 |
+
**Project Type**: [single/web/mobile - determines source structure]
|
| 26 |
+
**Performance Goals**: [domain-specific, e.g., 1000 req/s, 10k lines/sec, 60 fps or NEEDS CLARIFICATION]
|
| 27 |
+
**Constraints**: [domain-specific, e.g., <200ms p95, <100MB memory, offline-capable or NEEDS CLARIFICATION]
|
| 28 |
+
**Scale/Scope**: [domain-specific, e.g., 10k users, 1M LOC, 50 screens or NEEDS CLARIFICATION]
|
| 29 |
+
|
| 30 |
+
## Constitution Check
|
| 31 |
+
|
| 32 |
+
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
|
| 33 |
+
|
| 34 |
+
[Gates determined based on constitution file]
|
| 35 |
+
|
| 36 |
+
## Project Structure
|
| 37 |
+
|
| 38 |
+
### Documentation (this feature)
|
| 39 |
+
|
| 40 |
+
```text
|
| 41 |
+
specs/[###-feature]/
|
| 42 |
+
├── plan.md # This file (/speckit.plan command output)
|
| 43 |
+
├── research.md # Phase 0 output (/speckit.plan command)
|
| 44 |
+
├── data-model.md # Phase 1 output (/speckit.plan command)
|
| 45 |
+
├── quickstart.md # Phase 1 output (/speckit.plan command)
|
| 46 |
+
├── contracts/ # Phase 1 output (/speckit.plan command)
|
| 47 |
+
└── tasks.md # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
|
| 48 |
+
```
|
| 49 |
+
|
| 50 |
+
### Source Code (repository root)
|
| 51 |
+
<!--
|
| 52 |
+
ACTION REQUIRED: Replace the placeholder tree below with the concrete layout
|
| 53 |
+
for this feature. Delete unused options and expand the chosen structure with
|
| 54 |
+
real paths (e.g., apps/admin, packages/something). The delivered plan must
|
| 55 |
+
not include Option labels.
|
| 56 |
+
-->
|
| 57 |
+
|
| 58 |
+
```text
|
| 59 |
+
# [REMOVE IF UNUSED] Option 1: Single project (DEFAULT)
|
| 60 |
+
src/
|
| 61 |
+
├── models/
|
| 62 |
+
├── services/
|
| 63 |
+
├── cli/
|
| 64 |
+
└── lib/
|
| 65 |
+
|
| 66 |
+
tests/
|
| 67 |
+
├── contract/
|
| 68 |
+
├── integration/
|
| 69 |
+
└── unit/
|
| 70 |
+
|
| 71 |
+
# [REMOVE IF UNUSED] Option 2: Web application (when "frontend" + "backend" detected)
|
| 72 |
+
backend/
|
| 73 |
+
├── src/
|
| 74 |
+
│ ├── models/
|
| 75 |
+
│ ├── services/
|
| 76 |
+
│ └── api/
|
| 77 |
+
└── tests/
|
| 78 |
+
|
| 79 |
+
frontend/
|
| 80 |
+
├── src/
|
| 81 |
+
│ ├── components/
|
| 82 |
+
│ ├── pages/
|
| 83 |
+
│ └── services/
|
| 84 |
+
└── tests/
|
| 85 |
+
|
| 86 |
+
# [REMOVE IF UNUSED] Option 3: Mobile + API (when "iOS/Android" detected)
|
| 87 |
+
api/
|
| 88 |
+
└── [same as backend above]
|
| 89 |
+
|
| 90 |
+
ios/ or android/
|
| 91 |
+
└── [platform-specific structure: feature modules, UI flows, platform tests]
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
**Structure Decision**: [Document the selected structure and reference the real
|
| 95 |
+
directories captured above]
|
| 96 |
+
|
| 97 |
+
## Complexity Tracking
|
| 98 |
+
|
| 99 |
+
> **Fill ONLY if Constitution Check has violations that must be justified**
|
| 100 |
+
|
| 101 |
+
| Violation | Why Needed | Simpler Alternative Rejected Because |
|
| 102 |
+
|-----------|------------|-------------------------------------|
|
| 103 |
+
| [e.g., 4th project] | [current need] | [why 3 projects insufficient] |
|
| 104 |
+
| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient] |
|
.specify/templates/spec-template.md
ADDED
|
@@ -0,0 +1,115 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Feature Specification: [FEATURE NAME]
|
| 2 |
+
|
| 3 |
+
**Feature Branch**: `[###-feature-name]`
|
| 4 |
+
**Created**: [DATE]
|
| 5 |
+
**Status**: Draft
|
| 6 |
+
**Input**: User description: "$ARGUMENTS"
|
| 7 |
+
|
| 8 |
+
## User Scenarios & Testing *(mandatory)*
|
| 9 |
+
|
| 10 |
+
<!--
|
| 11 |
+
IMPORTANT: User stories should be PRIORITIZED as user journeys ordered by importance.
|
| 12 |
+
Each user story/journey must be INDEPENDENTLY TESTABLE - meaning if you implement just ONE of them,
|
| 13 |
+
you should still have a viable MVP (Minimum Viable Product) that delivers value.
|
| 14 |
+
|
| 15 |
+
Assign priorities (P1, P2, P3, etc.) to each story, where P1 is the most critical.
|
| 16 |
+
Think of each story as a standalone slice of functionality that can be:
|
| 17 |
+
- Developed independently
|
| 18 |
+
- Tested independently
|
| 19 |
+
- Deployed independently
|
| 20 |
+
- Demonstrated to users independently
|
| 21 |
+
-->
|
| 22 |
+
|
| 23 |
+
### User Story 1 - [Brief Title] (Priority: P1)
|
| 24 |
+
|
| 25 |
+
[Describe this user journey in plain language]
|
| 26 |
+
|
| 27 |
+
**Why this priority**: [Explain the value and why it has this priority level]
|
| 28 |
+
|
| 29 |
+
**Independent Test**: [Describe how this can be tested independently - e.g., "Can be fully tested by [specific action] and delivers [specific value]"]
|
| 30 |
+
|
| 31 |
+
**Acceptance Scenarios**:
|
| 32 |
+
|
| 33 |
+
1. **Given** [initial state], **When** [action], **Then** [expected outcome]
|
| 34 |
+
2. **Given** [initial state], **When** [action], **Then** [expected outcome]
|
| 35 |
+
|
| 36 |
+
---
|
| 37 |
+
|
| 38 |
+
### User Story 2 - [Brief Title] (Priority: P2)
|
| 39 |
+
|
| 40 |
+
[Describe this user journey in plain language]
|
| 41 |
+
|
| 42 |
+
**Why this priority**: [Explain the value and why it has this priority level]
|
| 43 |
+
|
| 44 |
+
**Independent Test**: [Describe how this can be tested independently]
|
| 45 |
+
|
| 46 |
+
**Acceptance Scenarios**:
|
| 47 |
+
|
| 48 |
+
1. **Given** [initial state], **When** [action], **Then** [expected outcome]
|
| 49 |
+
|
| 50 |
+
---
|
| 51 |
+
|
| 52 |
+
### User Story 3 - [Brief Title] (Priority: P3)
|
| 53 |
+
|
| 54 |
+
[Describe this user journey in plain language]
|
| 55 |
+
|
| 56 |
+
**Why this priority**: [Explain the value and why it has this priority level]
|
| 57 |
+
|
| 58 |
+
**Independent Test**: [Describe how this can be tested independently]
|
| 59 |
+
|
| 60 |
+
**Acceptance Scenarios**:
|
| 61 |
+
|
| 62 |
+
1. **Given** [initial state], **When** [action], **Then** [expected outcome]
|
| 63 |
+
|
| 64 |
+
---
|
| 65 |
+
|
| 66 |
+
[Add more user stories as needed, each with an assigned priority]
|
| 67 |
+
|
| 68 |
+
### Edge Cases
|
| 69 |
+
|
| 70 |
+
<!--
|
| 71 |
+
ACTION REQUIRED: The content in this section represents placeholders.
|
| 72 |
+
Fill them out with the right edge cases.
|
| 73 |
+
-->
|
| 74 |
+
|
| 75 |
+
- What happens when [boundary condition]?
|
| 76 |
+
- How does system handle [error scenario]?
|
| 77 |
+
|
| 78 |
+
## Requirements *(mandatory)*
|
| 79 |
+
|
| 80 |
+
<!--
|
| 81 |
+
ACTION REQUIRED: The content in this section represents placeholders.
|
| 82 |
+
Fill them out with the right functional requirements.
|
| 83 |
+
-->
|
| 84 |
+
|
| 85 |
+
### Functional Requirements
|
| 86 |
+
|
| 87 |
+
- **FR-001**: System MUST [specific capability, e.g., "allow users to create accounts"]
|
| 88 |
+
- **FR-002**: System MUST [specific capability, e.g., "validate email addresses"]
|
| 89 |
+
- **FR-003**: Users MUST be able to [key interaction, e.g., "reset their password"]
|
| 90 |
+
- **FR-004**: System MUST [data requirement, e.g., "persist user preferences"]
|
| 91 |
+
- **FR-005**: System MUST [behavior, e.g., "log all security events"]
|
| 92 |
+
|
| 93 |
+
*Example of marking unclear requirements:*
|
| 94 |
+
|
| 95 |
+
- **FR-006**: System MUST authenticate users via [NEEDS CLARIFICATION: auth method not specified - email/password, SSO, OAuth?]
|
| 96 |
+
- **FR-007**: System MUST retain user data for [NEEDS CLARIFICATION: retention period not specified]
|
| 97 |
+
|
| 98 |
+
### Key Entities *(include if feature involves data)*
|
| 99 |
+
|
| 100 |
+
- **[Entity 1]**: [What it represents, key attributes without implementation]
|
| 101 |
+
- **[Entity 2]**: [What it represents, relationships to other entities]
|
| 102 |
+
|
| 103 |
+
## Success Criteria *(mandatory)*
|
| 104 |
+
|
| 105 |
+
<!--
|
| 106 |
+
ACTION REQUIRED: Define measurable success criteria.
|
| 107 |
+
These must be technology-agnostic and measurable.
|
| 108 |
+
-->
|
| 109 |
+
|
| 110 |
+
### Measurable Outcomes
|
| 111 |
+
|
| 112 |
+
- **SC-001**: [Measurable metric, e.g., "Users can complete account creation in under 2 minutes"]
|
| 113 |
+
- **SC-002**: [Measurable metric, e.g., "System handles 1000 concurrent users without degradation"]
|
| 114 |
+
- **SC-003**: [User satisfaction metric, e.g., "90% of users successfully complete primary task on first attempt"]
|
| 115 |
+
- **SC-004**: [Business metric, e.g., "Reduce support tickets related to [X] by 50%"]
|
.specify/templates/tasks-template.md
ADDED
|
@@ -0,0 +1,251 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
|
| 3 |
+
description: "Task list template for feature implementation"
|
| 4 |
+
---
|
| 5 |
+
|
| 6 |
+
# Tasks: [FEATURE NAME]
|
| 7 |
+
|
| 8 |
+
**Input**: Design documents from `/specs/[###-feature-name]/`
|
| 9 |
+
**Prerequisites**: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/
|
| 10 |
+
|
| 11 |
+
**Tests**: The examples below include test tasks. Tests are OPTIONAL - only include them if explicitly requested in the feature specification.
|
| 12 |
+
|
| 13 |
+
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
|
| 14 |
+
|
| 15 |
+
## Format: `[ID] [P?] [Story] Description`
|
| 16 |
+
|
| 17 |
+
- **[P]**: Can run in parallel (different files, no dependencies)
|
| 18 |
+
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
|
| 19 |
+
- Include exact file paths in descriptions
|
| 20 |
+
|
| 21 |
+
## Path Conventions
|
| 22 |
+
|
| 23 |
+
- **Single project**: `src/`, `tests/` at repository root
|
| 24 |
+
- **Web app**: `backend/src/`, `frontend/src/`
|
| 25 |
+
- **Mobile**: `api/src/`, `ios/src/` or `android/src/`
|
| 26 |
+
- Paths shown below assume single project - adjust based on plan.md structure
|
| 27 |
+
|
| 28 |
+
<!--
|
| 29 |
+
============================================================================
|
| 30 |
+
IMPORTANT: The tasks below are SAMPLE TASKS for illustration purposes only.
|
| 31 |
+
|
| 32 |
+
The /speckit.tasks command MUST replace these with actual tasks based on:
|
| 33 |
+
- User stories from spec.md (with their priorities P1, P2, P3...)
|
| 34 |
+
- Feature requirements from plan.md
|
| 35 |
+
- Entities from data-model.md
|
| 36 |
+
- Endpoints from contracts/
|
| 37 |
+
|
| 38 |
+
Tasks MUST be organized by user story so each story can be:
|
| 39 |
+
- Implemented independently
|
| 40 |
+
- Tested independently
|
| 41 |
+
- Delivered as an MVP increment
|
| 42 |
+
|
| 43 |
+
DO NOT keep these sample tasks in the generated tasks.md file.
|
| 44 |
+
============================================================================
|
| 45 |
+
-->
|
| 46 |
+
|
| 47 |
+
## Phase 1: Setup (Shared Infrastructure)
|
| 48 |
+
|
| 49 |
+
**Purpose**: Project initialization and basic structure
|
| 50 |
+
|
| 51 |
+
- [ ] T001 Create project structure per implementation plan
|
| 52 |
+
- [ ] T002 Initialize [language] project with [framework] dependencies
|
| 53 |
+
- [ ] T003 [P] Configure linting and formatting tools
|
| 54 |
+
|
| 55 |
+
---
|
| 56 |
+
|
| 57 |
+
## Phase 2: Foundational (Blocking Prerequisites)
|
| 58 |
+
|
| 59 |
+
**Purpose**: Core infrastructure that MUST be complete before ANY user story can be implemented
|
| 60 |
+
|
| 61 |
+
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
|
| 62 |
+
|
| 63 |
+
Examples of foundational tasks (adjust based on your project):
|
| 64 |
+
|
| 65 |
+
- [ ] T004 Setup database schema and migrations framework
|
| 66 |
+
- [ ] T005 [P] Implement authentication/authorization framework
|
| 67 |
+
- [ ] T006 [P] Setup API routing and middleware structure
|
| 68 |
+
- [ ] T007 Create base models/entities that all stories depend on
|
| 69 |
+
- [ ] T008 Configure error handling and logging infrastructure
|
| 70 |
+
- [ ] T009 Setup environment configuration management
|
| 71 |
+
|
| 72 |
+
**Checkpoint**: Foundation ready - user story implementation can now begin in parallel
|
| 73 |
+
|
| 74 |
+
---
|
| 75 |
+
|
| 76 |
+
## Phase 3: User Story 1 - [Title] (Priority: P1) 🎯 MVP
|
| 77 |
+
|
| 78 |
+
**Goal**: [Brief description of what this story delivers]
|
| 79 |
+
|
| 80 |
+
**Independent Test**: [How to verify this story works on its own]
|
| 81 |
+
|
| 82 |
+
### Tests for User Story 1 (OPTIONAL - only if tests requested) ⚠️
|
| 83 |
+
|
| 84 |
+
> **NOTE: Write these tests FIRST, ensure they FAIL before implementation**
|
| 85 |
+
|
| 86 |
+
- [ ] T010 [P] [US1] Contract test for [endpoint] in tests/contract/test_[name].py
|
| 87 |
+
- [ ] T011 [P] [US1] Integration test for [user journey] in tests/integration/test_[name].py
|
| 88 |
+
|
| 89 |
+
### Implementation for User Story 1
|
| 90 |
+
|
| 91 |
+
- [ ] T012 [P] [US1] Create [Entity1] model in src/models/[entity1].py
|
| 92 |
+
- [ ] T013 [P] [US1] Create [Entity2] model in src/models/[entity2].py
|
| 93 |
+
- [ ] T014 [US1] Implement [Service] in src/services/[service].py (depends on T012, T013)
|
| 94 |
+
- [ ] T015 [US1] Implement [endpoint/feature] in src/[location]/[file].py
|
| 95 |
+
- [ ] T016 [US1] Add validation and error handling
|
| 96 |
+
- [ ] T017 [US1] Add logging for user story 1 operations
|
| 97 |
+
|
| 98 |
+
**Checkpoint**: At this point, User Story 1 should be fully functional and testable independently
|
| 99 |
+
|
| 100 |
+
---
|
| 101 |
+
|
| 102 |
+
## Phase 4: User Story 2 - [Title] (Priority: P2)
|
| 103 |
+
|
| 104 |
+
**Goal**: [Brief description of what this story delivers]
|
| 105 |
+
|
| 106 |
+
**Independent Test**: [How to verify this story works on its own]
|
| 107 |
+
|
| 108 |
+
### Tests for User Story 2 (OPTIONAL - only if tests requested) ⚠️
|
| 109 |
+
|
| 110 |
+
- [ ] T018 [P] [US2] Contract test for [endpoint] in tests/contract/test_[name].py
|
| 111 |
+
- [ ] T019 [P] [US2] Integration test for [user journey] in tests/integration/test_[name].py
|
| 112 |
+
|
| 113 |
+
### Implementation for User Story 2
|
| 114 |
+
|
| 115 |
+
- [ ] T020 [P] [US2] Create [Entity] model in src/models/[entity].py
|
| 116 |
+
- [ ] T021 [US2] Implement [Service] in src/services/[service].py
|
| 117 |
+
- [ ] T022 [US2] Implement [endpoint/feature] in src/[location]/[file].py
|
| 118 |
+
- [ ] T023 [US2] Integrate with User Story 1 components (if needed)
|
| 119 |
+
|
| 120 |
+
**Checkpoint**: At this point, User Stories 1 AND 2 should both work independently
|
| 121 |
+
|
| 122 |
+
---
|
| 123 |
+
|
| 124 |
+
## Phase 5: User Story 3 - [Title] (Priority: P3)
|
| 125 |
+
|
| 126 |
+
**Goal**: [Brief description of what this story delivers]
|
| 127 |
+
|
| 128 |
+
**Independent Test**: [How to verify this story works on its own]
|
| 129 |
+
|
| 130 |
+
### Tests for User Story 3 (OPTIONAL - only if tests requested) ⚠️
|
| 131 |
+
|
| 132 |
+
- [ ] T024 [P] [US3] Contract test for [endpoint] in tests/contract/test_[name].py
|
| 133 |
+
- [ ] T025 [P] [US3] Integration test for [user journey] in tests/integration/test_[name].py
|
| 134 |
+
|
| 135 |
+
### Implementation for User Story 3
|
| 136 |
+
|
| 137 |
+
- [ ] T026 [P] [US3] Create [Entity] model in src/models/[entity].py
|
| 138 |
+
- [ ] T027 [US3] Implement [Service] in src/services/[service].py
|
| 139 |
+
- [ ] T028 [US3] Implement [endpoint/feature] in src/[location]/[file].py
|
| 140 |
+
|
| 141 |
+
**Checkpoint**: All user stories should now be independently functional
|
| 142 |
+
|
| 143 |
+
---
|
| 144 |
+
|
| 145 |
+
[Add more user story phases as needed, following the same pattern]
|
| 146 |
+
|
| 147 |
+
---
|
| 148 |
+
|
| 149 |
+
## Phase N: Polish & Cross-Cutting Concerns
|
| 150 |
+
|
| 151 |
+
**Purpose**: Improvements that affect multiple user stories
|
| 152 |
+
|
| 153 |
+
- [ ] TXXX [P] Documentation updates in docs/
|
| 154 |
+
- [ ] TXXX Code cleanup and refactoring
|
| 155 |
+
- [ ] TXXX Performance optimization across all stories
|
| 156 |
+
- [ ] TXXX [P] Additional unit tests (if requested) in tests/unit/
|
| 157 |
+
- [ ] TXXX Security hardening
|
| 158 |
+
- [ ] TXXX Run quickstart.md validation
|
| 159 |
+
|
| 160 |
+
---
|
| 161 |
+
|
| 162 |
+
## Dependencies & Execution Order
|
| 163 |
+
|
| 164 |
+
### Phase Dependencies
|
| 165 |
+
|
| 166 |
+
- **Setup (Phase 1)**: No dependencies - can start immediately
|
| 167 |
+
- **Foundational (Phase 2)**: Depends on Setup completion - BLOCKS all user stories
|
| 168 |
+
- **User Stories (Phase 3+)**: All depend on Foundational phase completion
|
| 169 |
+
- User stories can then proceed in parallel (if staffed)
|
| 170 |
+
- Or sequentially in priority order (P1 → P2 → P3)
|
| 171 |
+
- **Polish (Final Phase)**: Depends on all desired user stories being complete
|
| 172 |
+
|
| 173 |
+
### User Story Dependencies
|
| 174 |
+
|
| 175 |
+
- **User Story 1 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
|
| 176 |
+
- **User Story 2 (P2)**: Can start after Foundational (Phase 2) - May integrate with US1 but should be independently testable
|
| 177 |
+
- **User Story 3 (P3)**: Can start after Foundational (Phase 2) - May integrate with US1/US2 but should be independently testable
|
| 178 |
+
|
| 179 |
+
### Within Each User Story
|
| 180 |
+
|
| 181 |
+
- Tests (if included) MUST be written and FAIL before implementation
|
| 182 |
+
- Models before services
|
| 183 |
+
- Services before endpoints
|
| 184 |
+
- Core implementation before integration
|
| 185 |
+
- Story complete before moving to next priority
|
| 186 |
+
|
| 187 |
+
### Parallel Opportunities
|
| 188 |
+
|
| 189 |
+
- All Setup tasks marked [P] can run in parallel
|
| 190 |
+
- All Foundational tasks marked [P] can run in parallel (within Phase 2)
|
| 191 |
+
- Once Foundational phase completes, all user stories can start in parallel (if team capacity allows)
|
| 192 |
+
- All tests for a user story marked [P] can run in parallel
|
| 193 |
+
- Models within a story marked [P] can run in parallel
|
| 194 |
+
- Different user stories can be worked on in parallel by different team members
|
| 195 |
+
|
| 196 |
+
---
|
| 197 |
+
|
| 198 |
+
## Parallel Example: User Story 1
|
| 199 |
+
|
| 200 |
+
```bash
|
| 201 |
+
# Launch all tests for User Story 1 together (if tests requested):
|
| 202 |
+
Task: "Contract test for [endpoint] in tests/contract/test_[name].py"
|
| 203 |
+
Task: "Integration test for [user journey] in tests/integration/test_[name].py"
|
| 204 |
+
|
| 205 |
+
# Launch all models for User Story 1 together:
|
| 206 |
+
Task: "Create [Entity1] model in src/models/[entity1].py"
|
| 207 |
+
Task: "Create [Entity2] model in src/models/[entity2].py"
|
| 208 |
+
```
|
| 209 |
+
|
| 210 |
+
---
|
| 211 |
+
|
| 212 |
+
## Implementation Strategy
|
| 213 |
+
|
| 214 |
+
### MVP First (User Story 1 Only)
|
| 215 |
+
|
| 216 |
+
1. Complete Phase 1: Setup
|
| 217 |
+
2. Complete Phase 2: Foundational (CRITICAL - blocks all stories)
|
| 218 |
+
3. Complete Phase 3: User Story 1
|
| 219 |
+
4. **STOP and VALIDATE**: Test User Story 1 independently
|
| 220 |
+
5. Deploy/demo if ready
|
| 221 |
+
|
| 222 |
+
### Incremental Delivery
|
| 223 |
+
|
| 224 |
+
1. Complete Setup + Foundational → Foundation ready
|
| 225 |
+
2. Add User Story 1 → Test independently → Deploy/Demo (MVP!)
|
| 226 |
+
3. Add User Story 2 → Test independently → Deploy/Demo
|
| 227 |
+
4. Add User Story 3 → Test independently → Deploy/Demo
|
| 228 |
+
5. Each story adds value without breaking previous stories
|
| 229 |
+
|
| 230 |
+
### Parallel Team Strategy
|
| 231 |
+
|
| 232 |
+
With multiple developers:
|
| 233 |
+
|
| 234 |
+
1. Team completes Setup + Foundational together
|
| 235 |
+
2. Once Foundational is done:
|
| 236 |
+
- Developer A: User Story 1
|
| 237 |
+
- Developer B: User Story 2
|
| 238 |
+
- Developer C: User Story 3
|
| 239 |
+
3. Stories complete and integrate independently
|
| 240 |
+
|
| 241 |
+
---
|
| 242 |
+
|
| 243 |
+
## Notes
|
| 244 |
+
|
| 245 |
+
- [P] tasks = different files, no dependencies
|
| 246 |
+
- [Story] label maps task to specific user story for traceability
|
| 247 |
+
- Each user story should be independently completable and testable
|
| 248 |
+
- Verify tests fail before implementing
|
| 249 |
+
- Commit after each task or logical group
|
| 250 |
+
- Stop at any checkpoint to validate story independently
|
| 251 |
+
- Avoid: vague tasks, same file conflicts, cross-story dependencies that break independence
|
CONTRIBUTING.md
ADDED
|
@@ -0,0 +1,51 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Contributing to AI-Me
|
| 2 |
+
|
| 3 |
+
Welcome! This document outlines the process for contributing to the AI-Me project.
|
| 4 |
+
|
| 5 |
+
## Prerequisites
|
| 6 |
+
|
| 7 |
+
- Python 3.12+ (managed by `uv`)
|
| 8 |
+
- Git with GPG signing configured
|
| 9 |
+
- Basic understanding of async Python and RAG concepts (see `.specify/memory/constitution.md`)
|
| 10 |
+
|
| 11 |
+
## Setup
|
| 12 |
+
|
| 13 |
+
### 1. Clone and Install Dependencies
|
| 14 |
+
|
| 15 |
+
```bash
|
| 16 |
+
git clone https://github.com/byoung/ai-me.git
|
| 17 |
+
cd ai-me
|
| 18 |
+
uv sync
|
| 19 |
+
```
|
| 20 |
+
|
| 21 |
+
### 2. Environment Configuration
|
| 22 |
+
|
| 23 |
+
Create a `.env` file in the project root with required keys:
|
| 24 |
+
|
| 25 |
+
```bash
|
| 26 |
+
# LLM Configuration
|
| 27 |
+
OPENAI_API_KEY=sk-...
|
| 28 |
+
GROQ_API_KEY=gsk-...
|
| 29 |
+
|
| 30 |
+
# Bot Identity
|
| 31 |
+
BOT_FULL_NAME="Ben Young"
|
| 32 |
+
APP_NAME="AI-Me"
|
| 33 |
+
|
| 34 |
+
# Optional: External Tools
|
| 35 |
+
GITHUB_PERSONAL_ACCESS_TOKEN=ghp_...
|
| 36 |
+
LINKEDIN_API_TOKEN=...
|
| 37 |
+
|
| 38 |
+
# Optional: Remote Logging (Grafana Loki)
|
| 39 |
+
LOKI_URL=https://logs-prod-us-central1.grafana.net
|
| 40 |
+
LOKI_USERNAME=...
|
| 41 |
+
LOKI_PASSWORD=...
|
| 42 |
+
```
|
| 43 |
+
|
| 44 |
+
### 3. Configure Git Commit Signing
|
| 45 |
+
|
| 46 |
+
See this guide on setting up gpg keys:
|
| 47 |
+
|
| 48 |
+
https://docs.github.com/en/authentication/managing-commit-signature-verification/generating-a-new-gpg-key
|
| 49 |
+
|
| 50 |
+
|
| 51 |
+
**All commits MUST be GPG-signed.**
|
TESTING.md
CHANGED
|
@@ -55,6 +55,43 @@ uv run pytest src/test.py::test_rear_knowledge_contains_it245 -v
|
|
| 55 |
|
| 56 |
The temperature of 0 ensures that the agent's responses are consistent across test runs, making assertions more reliable.
|
| 57 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 58 |
## Future Enhancements
|
| 59 |
- [ ] Add tests for error handling and edge cases
|
| 60 |
- [ ] Add performance benchmarks
|
|
|
|
| 55 |
|
| 56 |
The temperature of 0 ensures that the agent's responses are consistent across test runs, making assertions more reliable.
|
| 57 |
|
| 58 |
+
## Session Isolation Testing (Manual)
|
| 59 |
+
|
| 60 |
+
**Note on Concurrency Testing**: Rather than implement brittle pytest-based concurrency tests, session isolation (SC-006) is verified through **manual browser-based testing**:
|
| 61 |
+
|
| 62 |
+
### Steps to Manually Test Session Isolation
|
| 63 |
+
|
| 64 |
+
1. **Start the app**:
|
| 65 |
+
```bash
|
| 66 |
+
uv run src/app.py
|
| 67 |
+
```
|
| 68 |
+
|
| 69 |
+
2. **Open multiple browser tabs** (or separate browsers):
|
| 70 |
+
- Tab A: http://localhost:7860
|
| 71 |
+
- Tab B: http://localhost:7860
|
| 72 |
+
- Tab C: http://localhost:7860
|
| 73 |
+
|
| 74 |
+
3. **Test scenario**: Interleave conversations across tabs
|
| 75 |
+
- Tab A: "Hi, My name is Slartibartfast."
|
| 76 |
+
- Tab B: "Hi, how are you?"
|
| 77 |
+
- Tab A: "what is my name?"
|
| 78 |
+
- Tab B: "what is my name?"
|
| 79 |
+
|
| 80 |
+
4. **Verify**:
|
| 81 |
+
- ✅ Each tab maintains independent conversation history
|
| 82 |
+
- ✅ No information leaks between tabs -- tab B should say I don't know your name.
|
| 83 |
+
- ✅ Memory tool doesn't share state (different users in Memory graphs)
|
| 84 |
+
- ✅ Each session gets unique `session_id` in logs (check `uv run src/app.py` output)
|
| 85 |
+
|
| 86 |
+
### Why Manual Testing?
|
| 87 |
+
|
| 88 |
+
Integration tests for concurrent browser sessions are:
|
| 89 |
+
- **Brittle**: Timing-dependent, fail randomly due to race conditions
|
| 90 |
+
- **Slow**: Multiple concurrent LLM calls slow down test execution
|
| 91 |
+
- **Fragile**: Heavy on resources, fail in CI/CD environments
|
| 92 |
+
- **Hard to debug**: Concurrent failures are difficult to reproduce and fix
|
| 93 |
+
|
| 94 |
+
|
| 95 |
## Future Enhancements
|
| 96 |
- [ ] Add tests for error handling and edge cases
|
| 97 |
- [ ] Add performance benchmarks
|
docs/CONFLICT_DETECTION.md
ADDED
|
@@ -0,0 +1,307 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Conflict Detection & Resolution Framework
|
| 2 |
+
|
| 3 |
+
## Overview
|
| 4 |
+
|
| 5 |
+
This document describes the design and implementation of conflict detection in the AI-Me agent, specifically for FR-011 (Multi-Source Conflict Resolution). When the agent retrieves information from multiple sources that contradict each other, it must:
|
| 6 |
+
|
| 7 |
+
1. **Detect** semantic conflicts (contradictions, ambiguities)
|
| 8 |
+
2. **Log** the conflict with full context for auditing
|
| 9 |
+
3. **Acknowledge** the conflict to the user
|
| 10 |
+
4. **Resolve** by presenting both perspectives or indicating uncertainty
|
| 11 |
+
|
| 12 |
+
---
|
| 13 |
+
|
| 14 |
+
## What Constitutes a Conflict?
|
| 15 |
+
|
| 16 |
+
A **conflict** occurs when:
|
| 17 |
+
|
| 18 |
+
### Type 1: Direct Contradiction
|
| 19 |
+
Two retrieved documents state opposite facts about the same subject.
|
| 20 |
+
|
| 21 |
+
**Example**:
|
| 22 |
+
- Document A: "I worked at Company X for 5 years"
|
| 23 |
+
- Document B: "I worked at Company X for 3 years"
|
| 24 |
+
|
| 25 |
+
### Type 2: Ambiguous/Unclear Information
|
| 26 |
+
Retrieved information is vague or contradicts the user's expectations.
|
| 27 |
+
|
| 28 |
+
**Example**:
|
| 29 |
+
- User asks: "How many years of Python experience do you have?"
|
| 30 |
+
- Retrieved docs mention: "Python" and "10 years" but the connection isn't explicit
|
| 31 |
+
|
| 32 |
+
### Type 3: Incomplete Coverage
|
| 33 |
+
Multiple documents cover the same topic incompletely, requiring disambiguation.
|
| 34 |
+
|
| 35 |
+
**Example**:
|
| 36 |
+
- Document A: "Worked on backend APIs"
|
| 37 |
+
- Document B: "Worked on deployment infrastructure"
|
| 38 |
+
- User asks: "What tech stack have you worked with?" (requires synthesizing both)
|
| 39 |
+
|
| 40 |
+
---
|
| 41 |
+
|
| 42 |
+
## Detection Strategy
|
| 43 |
+
|
| 44 |
+
### Level 1: Semantic Detection (LLM-Based)
|
| 45 |
+
|
| 46 |
+
The agent uses its instruction prompt to identify conflicts. The prompt includes guidelines on conflict handling.
|
| 47 |
+
|
| 48 |
+
### Level 2: Pattern Matching (Regex-Based)
|
| 49 |
+
|
| 50 |
+
Loki queries identify responses containing conflict indicators:
|
| 51 |
+
|
| 52 |
+
```loki
|
| 53 |
+
{job="ai-me"} | json
|
| 54 |
+
| line_format "{{.message}}"
|
| 55 |
+
| regex "(?i)(conflicting|contradict|not sure|unclear|conflicting information|different sources suggest|one source says|another source says|both|however|but|though)"
|
| 56 |
+
```
|
| 57 |
+
|
| 58 |
+
### Level 3: Structured Logging
|
| 59 |
+
|
| 60 |
+
All conflict incidents are logged with:
|
| 61 |
+
- Session ID
|
| 62 |
+
- User query
|
| 63 |
+
- Retrieved documents with sources
|
| 64 |
+
- Identified conflict type
|
| 65 |
+
- Agent's resolution approach
|
| 66 |
+
- User satisfaction (if collected via follow-up)
|
| 67 |
+
|
| 68 |
+
---
|
| 69 |
+
|
| 70 |
+
## Loki Queries for Conflict Monitoring
|
| 71 |
+
|
| 72 |
+
### Query 1: Find All Conflict Indicators
|
| 73 |
+
|
| 74 |
+
```loki
|
| 75 |
+
{job="ai-me"} | json
|
| 76 |
+
| line_format "{{.message}}"
|
| 77 |
+
| regex "(?i)(conflicting|contradict|not sure|unclear)"
|
| 78 |
+
| stats count() by session_id
|
| 79 |
+
```
|
| 80 |
+
|
| 81 |
+
**Measurement**:
|
| 82 |
+
- Expected: Conflicts relatively rare (unless knowledge base is contradictory)
|
| 83 |
+
- Investigation trigger: >10% of conversations contain conflict indicators
|
| 84 |
+
|
| 85 |
+
### Query 2: Find Unresolved Conflicts
|
| 86 |
+
|
| 87 |
+
```loki
|
| 88 |
+
{job="ai-me"} | json
|
| 89 |
+
| line_format "{{.message}}"
|
| 90 |
+
| regex "(?i)(conflicting.*unknown|contradict.*not sure|unclear.*can.?t.*determine)"
|
| 91 |
+
| stats count() as unresolved_conflicts
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
**Measurement**:
|
| 95 |
+
- Goal: Minimize unresolved conflicts (acknowledge but present both sides)
|
| 96 |
+
- Alert if: unresolved_conflicts > 0 (indicates poor conflict handling)
|
| 97 |
+
|
| 98 |
+
### Query 3: Conflict Resolution Pattern
|
| 99 |
+
|
| 100 |
+
```loki
|
| 101 |
+
{job="ai-me"} | json
|
| 102 |
+
| line_format "{{.message}}"
|
| 103 |
+
| regex "(?i)((according to|per my|source.*says).*but.*(according to|per my|source.*says))"
|
| 104 |
+
| stats count() as pairwise_conflicts
|
| 105 |
+
```
|
| 106 |
+
|
| 107 |
+
**Measurement**:
|
| 108 |
+
- Good pattern: Shows agent is presenting both sides with attribution
|
| 109 |
+
- Target: ≥80% of detected conflicts include pairwise attribution
|
| 110 |
+
|
| 111 |
+
### Query 4: Source Attribution in Conflict Context
|
| 112 |
+
|
| 113 |
+
```loki
|
| 114 |
+
{job="ai-me"} | json
|
| 115 |
+
| line_format "{{.message}}"
|
| 116 |
+
| regex "(?i)conflicting"
|
| 117 |
+
| regex "https://github.com/"
|
| 118 |
+
| stats count() as attributed_conflicts
|
| 119 |
+
```
|
| 120 |
+
|
| 121 |
+
**Measurement**:
|
| 122 |
+
- Goal: 100% of conflicts include source citations
|
| 123 |
+
- Alert if: attributed_conflicts < (total conflicts * 0.95)
|
| 124 |
+
|
| 125 |
+
---
|
| 126 |
+
|
| 127 |
+
## Conflict Resolution Flowchart
|
| 128 |
+
|
| 129 |
+
```
|
| 130 |
+
User asks question
|
| 131 |
+
↓
|
| 132 |
+
Agent calls get_local_info_tool()
|
| 133 |
+
↓
|
| 134 |
+
Multiple documents retrieved
|
| 135 |
+
↓
|
| 136 |
+
Are they conflicting?
|
| 137 |
+
↓
|
| 138 |
+
YES ──→ DETECT CONFLICT
|
| 139 |
+
↓ - What's the contradiction?
|
| 140 |
+
↓ - Why might it exist?
|
| 141 |
+
↓
|
| 142 |
+
├─→ Direct contradiction (different facts)
|
| 143 |
+
│ └─→ Cite both sources, indicate uncertainty, choose most likely
|
| 144 |
+
│
|
| 145 |
+
├─→ Ambiguous (unclear connection)
|
| 146 |
+
│ └─→ Acknowledge ambiguity, provide context
|
| 147 |
+
│
|
| 148 |
+
└─→ Incomplete (multiple partial truths)
|
| 149 |
+
└─→ Synthesize both perspectives
|
| 150 |
+
↓
|
| 151 |
+
ACKNOWLEDGE to user
|
| 152 |
+
├─→ "I found conflicting information..."
|
| 153 |
+
├─→ Cite both sources with URLs
|
| 154 |
+
└─→ Explain discrepancy or ask for clarification
|
| 155 |
+
↓
|
| 156 |
+
RESPOND to user
|
| 157 |
+
└─→ Include full source attribution
|
| 158 |
+
|
| 159 |
+
NO ──→ RESPOND normally
|
| 160 |
+
├─→ Include source citations
|
| 161 |
+
└─→ Single clear answer
|
| 162 |
+
```
|
| 163 |
+
|
| 164 |
+
---
|
| 165 |
+
|
| 166 |
+
## Finding Conflicts: Loki Queries
|
| 167 |
+
|
| 168 |
+
### Query to Find Reported Conflicts
|
| 169 |
+
|
| 170 |
+
```loki
|
| 171 |
+
{job="ai-me"}
|
| 172 |
+
| json
|
| 173 |
+
| message =~ "(?i)(conflicting|contradict|not sure|unclear|uncertain|conflicting information)"
|
| 174 |
+
| session_id != ""
|
| 175 |
+
| line_format "{{.timestamp}} [{{.session_id}}] {{.message}}"
|
| 176 |
+
```
|
| 177 |
+
|
| 178 |
+
**How to use** (in Grafana Loki):
|
| 179 |
+
1. Open Grafana instance configured with Loki
|
| 180 |
+
2. Go to Explore → Loki
|
| 181 |
+
3. Copy the query above into the query field
|
| 182 |
+
4. Adjust regex pattern to match your conflict keywords
|
| 183 |
+
5. Filter results by date range or session ID
|
| 184 |
+
6. Results show all sessions where agent reported uncertainty
|
| 185 |
+
|
| 186 |
+
### Alternative: Find Specific Topics
|
| 187 |
+
|
| 188 |
+
```loki
|
| 189 |
+
{job="ai-me"}
|
| 190 |
+
| json
|
| 191 |
+
| message =~ "(?i)(Python.*(?:conflicting|contradict|not sure|unclear))"
|
| 192 |
+
| session_id != ""
|
| 193 |
+
```
|
| 194 |
+
|
| 195 |
+
### Alternative: Analyze by Session
|
| 196 |
+
|
| 197 |
+
```loki
|
| 198 |
+
{job="ai-me"}
|
| 199 |
+
| json
|
| 200 |
+
| message =~ "(?i)(conflicting|contradict|not sure|unclear)"
|
| 201 |
+
| stats count() by session_id
|
| 202 |
+
```
|
| 203 |
+
|
| 204 |
+
---
|
| 205 |
+
|
| 206 |
+
## Integration Points
|
| 207 |
+
|
| 208 |
+
### Agent System Prompt
|
| 209 |
+
|
| 210 |
+
The agent prompt (in `src/agent.py`) includes guidance on uncertainty and conflict acknowledgment.
|
| 211 |
+
|
| 212 |
+
### Logging Configuration
|
| 213 |
+
|
| 214 |
+
Loki integration configured in `src/config.py`:
|
| 215 |
+
|
| 216 |
+
```python
|
| 217 |
+
# Environment variables (optional Loki setup):
|
| 218 |
+
LOKI_URL=https://logs-prod-us-central1.grafana.net
|
| 219 |
+
LOKI_USERNAME=<grafana-cloud-username>
|
| 220 |
+
LOKI_PASSWORD=<grafana-cloud-api-key>
|
| 221 |
+
```
|
| 222 |
+
|
| 223 |
+
All logs automatically tagged with:
|
| 224 |
+
- `application: "ai-me"`
|
| 225 |
+
- `session_id: "<user-session-id>"`
|
| 226 |
+
- `level: "INFO|WARNING|ERROR"`
|
| 227 |
+
|
| 228 |
+
### Session Metadata
|
| 229 |
+
|
| 230 |
+
Each session log includes:
|
| 231 |
+
- `session_id`: Unique user session identifier
|
| 232 |
+
- `timestamp`: ISO 8601 datetime
|
| 233 |
+
- `message`: Full agent response or interaction
|
| 234 |
+
- `hostname`: Where the agent ran
|
| 235 |
+
- `process_id`: Python process ID
|
| 236 |
+
|
| 237 |
+
---
|
| 238 |
+
|
| 239 |
+
## Human Review Workflow
|
| 240 |
+
|
| 241 |
+
### Steps to Review Conflicts
|
| 242 |
+
|
| 243 |
+
1. **Run Loki query** (see above) during specified date range
|
| 244 |
+
2. **Identify sessions** with potential conflicts
|
| 245 |
+
3. **Examine responses** in full context:
|
| 246 |
+
- What was the user question?
|
| 247 |
+
- What conflicting sources were presented?
|
| 248 |
+
- How did agent acknowledge uncertainty?
|
| 249 |
+
4. **Validate or correct** documentation:
|
| 250 |
+
- Is one source outdated? Update it.
|
| 251 |
+
- Is the conflict real but unresolved? Note in docs.
|
| 252 |
+
- Is the agent confused? Improve documentation clarity.
|
| 253 |
+
|
| 254 |
+
---
|
| 255 |
+
|
| 256 |
+
## Metrics & Monitoring
|
| 257 |
+
|
| 258 |
+
### Metrics to Track
|
| 259 |
+
|
| 260 |
+
```python
|
| 261 |
+
# Useful queries for monitoring
|
| 262 |
+
queries = {
|
| 263 |
+
"conflicts_per_week": (
|
| 264 |
+
'{job="ai-me"} | json | message =~ "(?i)(conflicting|not sure)" '
|
| 265 |
+
'| stats count() by week'
|
| 266 |
+
),
|
| 267 |
+
"high_conflict_topics": (
|
| 268 |
+
'{job="ai-me"} | json | message =~ "(?i)(conflicting|not sure)" '
|
| 269 |
+
'| stats count() by message | sort by count desc | limit 10'
|
| 270 |
+
),
|
| 271 |
+
"conflict_free_sessions": (
|
| 272 |
+
'{job="ai-me"} | json | session_id != "" '
|
| 273 |
+
'| count()'
|
| 274 |
+
),
|
| 275 |
+
}
|
| 276 |
+
```
|
| 277 |
+
|
| 278 |
+
### What These Mean
|
| 279 |
+
|
| 280 |
+
- **conflicts_per_week**: Trend - are conflicts increasing? Decreasing?
|
| 281 |
+
- **high_conflict_topics**: Which topics need documentation cleanup?
|
| 282 |
+
- **conflict_free_sessions**: What % of sessions have no conflicts?
|
| 283 |
+
|
| 284 |
+
---
|
| 285 |
+
|
| 286 |
+
## Future Enhancements
|
| 287 |
+
|
| 288 |
+
1. **Automated conflict detection**: Add explicit contradiction detection in RAG tool
|
| 289 |
+
2. **Conflict metrics**: Track "conflict rate" as success criterion
|
| 290 |
+
3. **Conflict resolution workflow**: UI for humans to resolve conflicts
|
| 291 |
+
4. **Documentation versioning**: Track which doc version caused the conflict
|
| 292 |
+
5. **Semantic deduplication**: Detect conceptual duplicates in knowledge base
|
| 293 |
+
|
| 294 |
+
---
|
| 295 |
+
|
| 296 |
+
## Related Files
|
| 297 |
+
|
| 298 |
+
- `src/agent.py`: System prompt with conflict guidance
|
| 299 |
+
- `src/config.py`: Loki logging configuration (optional setup)
|
| 300 |
+
- `.github/copilot-instructions.md`: RAG principles
|
| 301 |
+
- `.specify/memory/constitution.md`: Observability principle (VII)
|
| 302 |
+
|
| 303 |
+
---
|
| 304 |
+
|
| 305 |
+
**Status**: ✅ Implemented via pragmatic LLM-based approach
|
| 306 |
+
**Verification**: Run Loki queries to monitor conflicts
|
| 307 |
+
**Next Review**: Monthly analysis of conflict queries to identify documentation gaps
|
docs/SUCCESS_METRICS.md
ADDED
|
@@ -0,0 +1,324 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# AI-Me Success Metrics & Measurement Framework
|
| 2 |
+
|
| 3 |
+
## Overview
|
| 4 |
+
|
| 5 |
+
This document defines the Loki queries and measurement methodology for the 8 success criteria (SC-001 through SC-008). Each query is designed to be run against Grafana Loki to measure real-world agent performance in production.
|
| 6 |
+
|
| 7 |
+
**Prerequisites**:
|
| 8 |
+
- Grafana Loki integration enabled in `src/config.py` (set `LOKI_ENABLED=true`)
|
| 9 |
+
- Application logging all operations with session context and structured JSON
|
| 10 |
+
- Deployment has been running for sufficient time to collect data
|
| 11 |
+
|
| 12 |
+
---
|
| 13 |
+
|
| 14 |
+
## SC-001: Persona Consistency
|
| 15 |
+
|
| 16 |
+
**Definition**: Agent maintains first-person perspective consistently across all responses.
|
| 17 |
+
|
| 18 |
+
**Loki Query**:
|
| 19 |
+
```loki
|
| 20 |
+
{job="ai-me"} | json | line_format "{{.message}}"
|
| 21 |
+
| regex "(?i)(^|\s)(i\s|i'm|i've|i'll|my\s|me\s)"
|
| 22 |
+
| stats count() by session_id
|
| 23 |
+
```
|
| 24 |
+
|
| 25 |
+
**Measurement**:
|
| 26 |
+
- ✅ **PASS**: >95% of responses from a single session use first-person pronouns
|
| 27 |
+
- ❌ **FAIL**: <90% of responses maintain first-person perspective
|
| 28 |
+
|
| 29 |
+
**Interpretation**:
|
| 30 |
+
- Count responses per session that contain first-person language
|
| 31 |
+
- Calculate percentage of total responses per session
|
| 32 |
+
- Aggregate across sessions; target is >95% average
|
| 33 |
+
|
| 34 |
+
---
|
| 35 |
+
|
| 36 |
+
## SC-002: Factual Accuracy via Source Attribution
|
| 37 |
+
|
| 38 |
+
**Definition**: 100% of substantive responses include source attribution to knowledge base documents.
|
| 39 |
+
|
| 40 |
+
**Loki Query**:
|
| 41 |
+
```loki
|
| 42 |
+
{job="ai-me"} | json
|
| 43 |
+
| line_format "{{.message}}"
|
| 44 |
+
| regex "(?i)(https://github\.com/|source:|per my|as mentioned in|according to)"
|
| 45 |
+
| stats count() as attributed_responses by session_id
|
| 46 |
+
| line_format "session_id={{.session_id}} attribution_rate={{.attributed_responses}}"
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
**Measurement**:
|
| 50 |
+
- ✅ **PASS**: ≥95% of substantive responses include source links/citations
|
| 51 |
+
- ❌ **FAIL**: <90% include attribution
|
| 52 |
+
|
| 53 |
+
**Interpretation**:
|
| 54 |
+
- Filter out "I don't have documentation" knowledge gap responses (those are meta, not factual claims)
|
| 55 |
+
- Count responses that cite sources (GitHub URLs, document names, etc.)
|
| 56 |
+
- Non-substantive responses (e.g., greetings) can be excluded from denominator
|
| 57 |
+
|
| 58 |
+
---
|
| 59 |
+
|
| 60 |
+
## SC-003: In-Scope Answers
|
| 61 |
+
|
| 62 |
+
**Definition**: Agent provides substantive answers to questions within documentation scope and acknowledges knowledge gaps for out-of-scope questions.
|
| 63 |
+
|
| 64 |
+
**Loki Query** (Part A - Substantive Responses):
|
| 65 |
+
```loki
|
| 66 |
+
{job="ai-me"} | json
|
| 67 |
+
| line_format "{{.message}}"
|
| 68 |
+
| regex "^.{50,}"
|
| 69 |
+
| stats count() as substantive_count by session_id
|
| 70 |
+
```
|
| 71 |
+
|
| 72 |
+
**Loki Query** (Part B - Knowledge Gaps):
|
| 73 |
+
```loki
|
| 74 |
+
{job="ai-me"} | json
|
| 75 |
+
| line_format "{{.message}}"
|
| 76 |
+
| regex "(?i)(i don't have|no documentation|not familiar|i'm not sure|not in my documentation)"
|
| 77 |
+
| stats count() as gap_count by session_id
|
| 78 |
+
```
|
| 79 |
+
|
| 80 |
+
**Measurement**:
|
| 81 |
+
- ✅ **PASS**: ≥90% of responses are either substantive (>50 chars) OR acknowledge a knowledge gap
|
| 82 |
+
- ❌ **FAIL**: >10% of responses are evasive/empty
|
| 83 |
+
|
| 84 |
+
**Interpretation**:
|
| 85 |
+
- Substantive: responses >50 characters showing genuine knowledge
|
| 86 |
+
- Knowledge gaps: explicit acknowledgment of documentation limits
|
| 87 |
+
- Combined, these two categories should represent >90% of all responses
|
| 88 |
+
|
| 89 |
+
---
|
| 90 |
+
|
| 91 |
+
## SC-004: Knowledge Gap Handling
|
| 92 |
+
|
| 93 |
+
**Definition**: When documentation doesn't cover a topic, agent explicitly acknowledges this using consistent language patterns.
|
| 94 |
+
|
| 95 |
+
**Loki Query**:
|
| 96 |
+
```loki
|
| 97 |
+
{job="ai-me"} | json
|
| 98 |
+
| line_format "{{.message}}"
|
| 99 |
+
| regex "(?i)(i don't have|i'm not sure|i don't have any documentation|not familiar with|not covered in my documentation)"
|
| 100 |
+
| stats count() as gap_responses by session_id
|
| 101 |
+
```
|
| 102 |
+
|
| 103 |
+
**Measurement**:
|
| 104 |
+
- ✅ **PASS**: All knowledge gap responses use consistent language from the pattern above
|
| 105 |
+
- ❌ **FAIL**: Gap responses use inconsistent or unclear language
|
| 106 |
+
|
| 107 |
+
**Interpretation**:
|
| 108 |
+
- Consistency in knowledge gap messaging improves user experience
|
| 109 |
+
- Pattern represents established "graceful failure" language
|
| 110 |
+
- Count should increase over time as more knowledge gaps are encountered
|
| 111 |
+
|
| 112 |
+
---
|
| 113 |
+
|
| 114 |
+
## SC-005: Response Latency
|
| 115 |
+
|
| 116 |
+
**Definition**: 95% of responses complete within 5 seconds (SLA target).
|
| 117 |
+
|
| 118 |
+
**Loki Query**:
|
| 119 |
+
```loki
|
| 120 |
+
{job="ai-me"} | json
|
| 121 |
+
| line_format "{{.latency_ms}}"
|
| 122 |
+
| __error__=""
|
| 123 |
+
| stats count() as total, count(latency_ms < 5000) as under_5s by session_id
|
| 124 |
+
```
|
| 125 |
+
|
| 126 |
+
**Measurement**:
|
| 127 |
+
- ✅ **PASS**: ≥95% of responses have latency < 5000ms
|
| 128 |
+
- ⚠️ **WARN**: 90-95% under 5s (acceptable but monitor)
|
| 129 |
+
- ❌ **FAIL**: <90% under 5s
|
| 130 |
+
|
| 131 |
+
**Interpretation**:
|
| 132 |
+
- Latency includes full round-trip: RAG retrieval + LLM inference + output
|
| 133 |
+
- Some variation expected due to document complexity and network
|
| 134 |
+
- Latency >5s may indicate vectorstore performance or LLM queue issues
|
| 135 |
+
|
| 136 |
+
---
|
| 137 |
+
|
| 138 |
+
## SC-006: Concurrent User Support
|
| 139 |
+
|
| 140 |
+
**Definition**: System supports ≥10 concurrent user sessions with consistent performance.
|
| 141 |
+
|
| 142 |
+
**Loki Query**:
|
| 143 |
+
```loki
|
| 144 |
+
{job="ai-me"} | json
|
| 145 |
+
| stats count(distinct(session_id)) as concurrent_sessions
|
| 146 |
+
```
|
| 147 |
+
|
| 148 |
+
**Measurement**:
|
| 149 |
+
- ✅ **PASS**: Peak concurrent_sessions ≥10 with no degradation
|
| 150 |
+
- ⚠️ **WARN**: Peak 5-10 concurrent sessions, latency increases <20%
|
| 151 |
+
- ❌ **FAIL**: <5 concurrent sessions OR latency spikes >20% at peak
|
| 152 |
+
|
| 153 |
+
**Interpretation**:
|
| 154 |
+
- This query returns the number of unique session_ids in recent logs
|
| 155 |
+
- Run across a time window (e.g., last 1 hour) to find peak concurrency
|
| 156 |
+
- Compare latencies at low concurrency vs peak concurrency to measure degradation
|
| 157 |
+
|
| 158 |
+
---
|
| 159 |
+
|
| 160 |
+
## SC-007: Error Handling Quality
|
| 161 |
+
|
| 162 |
+
**Definition**: 100% of error messages are user-friendly with zero Python tracebacks exposed.
|
| 163 |
+
|
| 164 |
+
**Loki Query**:
|
| 165 |
+
```loki
|
| 166 |
+
{job="ai-me"} | json error_type!=""
|
| 167 |
+
| line_format "{{.message}}"
|
| 168 |
+
| regex "(?i)(traceback|File \".*\"|NameError|TypeError|ValueError|Traceback \(most)"
|
| 169 |
+
| stats count() as python_errors
|
| 170 |
+
```
|
| 171 |
+
|
| 172 |
+
**Measurement**:
|
| 173 |
+
- ✅ **PASS**: python_errors == 0 (no raw Python tracebacks in logs)
|
| 174 |
+
- ❌ **FAIL**: python_errors > 0
|
| 175 |
+
|
| 176 |
+
**Loki Query** (Alternative - Positive):
|
| 177 |
+
```loki
|
| 178 |
+
{job="ai-me", error_type="tool_failure"} | json
|
| 179 |
+
| line_format "{{.message}}"
|
| 180 |
+
| regex "(?i)(sorry|unfortunately|unable|can't|i'm not able|let me help|instead)"
|
| 181 |
+
| stats count() as friendly_errors
|
| 182 |
+
```
|
| 183 |
+
|
| 184 |
+
**Measurement**:
|
| 185 |
+
- ✅ **PASS**: friendly_errors ≥95% of all error_type="tool_failure" logs
|
| 186 |
+
- ❌ **FAIL**: friendly_errors <90%
|
| 187 |
+
|
| 188 |
+
**Interpretation**:
|
| 189 |
+
- Errors are caught and formatted before reaching logs
|
| 190 |
+
- User-friendly errors use conversational language
|
| 191 |
+
- No Python internals leaked in any response
|
| 192 |
+
|
| 193 |
+
---
|
| 194 |
+
|
| 195 |
+
## SC-008: Session Isolation
|
| 196 |
+
|
| 197 |
+
**Definition**: Each session maintains independent state; no cross-session data leakage.
|
| 198 |
+
|
| 199 |
+
**Loki Query** (Verification - Count Unique Memory Files):
|
| 200 |
+
```loki
|
| 201 |
+
{job="ai-me"} | json session_id!=""
|
| 202 |
+
| stats count(distinct(session_id)) as unique_sessions,
|
| 203 |
+
count(distinct(memory_file_path)) as unique_memory_files by timestamp("1h")
|
| 204 |
+
| line_format "{{.timestamp}} sessions={{.unique_sessions}} memory_files={{.unique_memory_files}}"
|
| 205 |
+
```
|
| 206 |
+
|
| 207 |
+
**Measurement**:
|
| 208 |
+
- ✅ **PASS**: unique_sessions == unique_memory_files (1:1 mapping)
|
| 209 |
+
- ❌ **FAIL**: unique_sessions != unique_memory_files (indicates sharing)
|
| 210 |
+
|
| 211 |
+
**Loki Query** (Unit Test - Concurrent Sessions):
|
| 212 |
+
```bash
|
| 213 |
+
# Run: pytest src/test.py::test_concurrent_sessions_do_not_interfere -v
|
| 214 |
+
# This test simulates 5+ concurrent queries and verifies:
|
| 215 |
+
# - Each session gets unique session_id
|
| 216 |
+
# - Memory operations don't leak between sessions
|
| 217 |
+
# - Session-scoped resources are isolated
|
| 218 |
+
```
|
| 219 |
+
|
| 220 |
+
**Measurement**:
|
| 221 |
+
- ✅ **PASS**: Test passes with 5+ concurrent queries completing without errors
|
| 222 |
+
- ❌ **FAIL**: Test fails or reports cross-session data access
|
| 223 |
+
|
| 224 |
+
**Interpretation**:
|
| 225 |
+
- Session isolation is partially verified by tests, partially by logs
|
| 226 |
+
- Unique memory file per session confirms isolation by design
|
| 227 |
+
- Cross-session memory leaks would show as matching memory_file_path for different session_ids
|
| 228 |
+
|
| 229 |
+
---
|
| 230 |
+
|
| 231 |
+
## Running These Queries
|
| 232 |
+
|
| 233 |
+
### In Grafana
|
| 234 |
+
|
| 235 |
+
1. Open **Grafana** → **Explore** → **Loki**
|
| 236 |
+
2. Copy one of the Loki queries from above
|
| 237 |
+
3. Adjust time range (e.g., "last 24 hours")
|
| 238 |
+
4. Click **Run Query**
|
| 239 |
+
5. Interpret results per the "Measurement" criteria
|
| 240 |
+
|
| 241 |
+
### Programmatically
|
| 242 |
+
|
| 243 |
+
```python
|
| 244 |
+
import requests
|
| 245 |
+
import json
|
| 246 |
+
|
| 247 |
+
LOKI_URL = "http://localhost:3100"
|
| 248 |
+
|
| 249 |
+
def query_loki(query: str, limit: int = 100) -> dict:
|
| 250 |
+
"""Execute a Loki query and return results."""
|
| 251 |
+
url = f"{LOKI_URL}/loki/api/v1/query_range"
|
| 252 |
+
params = {
|
| 253 |
+
"query": query,
|
| 254 |
+
"start": int(time.time()) - 86400, # Last 24 hours
|
| 255 |
+
"end": int(time.time()),
|
| 256 |
+
"limit": limit,
|
| 257 |
+
}
|
| 258 |
+
resp = requests.get(url, params=params)
|
| 259 |
+
return resp.json()
|
| 260 |
+
|
| 261 |
+
# Example
|
| 262 |
+
results = query_loki('{job="ai-me"} | json | stats count() by session_id')
|
| 263 |
+
print(json.dumps(results, indent=2))
|
| 264 |
+
```
|
| 265 |
+
|
| 266 |
+
---
|
| 267 |
+
|
| 268 |
+
## Dashboard Recommendation
|
| 269 |
+
|
| 270 |
+
Create a Grafana dashboard with these 8 panels:
|
| 271 |
+
|
| 272 |
+
| Panel | Query | Description |
|
| 273 |
+
|-------|-------|-------------|
|
| 274 |
+
| SC-001 | First-person regex | % responses maintaining persona |
|
| 275 |
+
| SC-002 | Source attribution regex | % responses with citations |
|
| 276 |
+
| SC-003 | Substantive + gaps | % in-scope or gap-acknowledged |
|
| 277 |
+
| SC-004 | Gap consistency | Consistency of gap messages |
|
| 278 |
+
| SC-005 | Latency histogram | Response time distribution |
|
| 279 |
+
| SC-006 | Concurrent sessions | Peak concurrent user count |
|
| 280 |
+
| SC-007 | Python error count | Raw traceback count (target: 0) |
|
| 281 |
+
| SC-008 | Session isolation | 1:1 session:memory_file ratio |
|
| 282 |
+
|
| 283 |
+
---
|
| 284 |
+
|
| 285 |
+
## Interpretation Guide
|
| 286 |
+
|
| 287 |
+
**Green (Healthy)**:
|
| 288 |
+
- SC-001: >95% first-person
|
| 289 |
+
- SC-002: ≥95% attributed
|
| 290 |
+
- SC-003: ≥90% substantive or gap-acknowledged
|
| 291 |
+
- SC-004: 100% gap responses consistent
|
| 292 |
+
- SC-005: ≥95% under 5 seconds
|
| 293 |
+
- SC-006: ≥10 concurrent sessions
|
| 294 |
+
- SC-007: 0 Python errors
|
| 295 |
+
- SC-008: 1:1 session:memory_file ratio
|
| 296 |
+
|
| 297 |
+
**Yellow (Degrading)**:
|
| 298 |
+
- SC-001: 90-95% first-person
|
| 299 |
+
- SC-002: 90-95% attributed
|
| 300 |
+
- SC-003: 85-90% substantive/gap
|
| 301 |
+
- SC-004: 95-100% gap consistency
|
| 302 |
+
- SC-005: 90-95% under 5s
|
| 303 |
+
- SC-006: 5-10 concurrent sessions
|
| 304 |
+
- SC-007: <1% Python errors
|
| 305 |
+
- SC-008: Occasional mismatches (acceptable)
|
| 306 |
+
|
| 307 |
+
**Red (Critical)**:
|
| 308 |
+
- SC-001: <90% first-person
|
| 309 |
+
- SC-002: <90% attributed
|
| 310 |
+
- SC-003: <85% substantive/gap
|
| 311 |
+
- SC-004: <95% gap consistency
|
| 312 |
+
- SC-005: <90% under 5s
|
| 313 |
+
- SC-006: <5 concurrent sessions
|
| 314 |
+
- SC-007: >1% Python errors
|
| 315 |
+
- SC-008: Widespread session cross-talk
|
| 316 |
+
|
| 317 |
+
---
|
| 318 |
+
|
| 319 |
+
## Notes
|
| 320 |
+
|
| 321 |
+
- Queries assume logs are structured JSON with fields: `session_id`, `message`, `latency_ms`, `error_type`, `memory_file_path`
|
| 322 |
+
- Timestamps in queries use UTC
|
| 323 |
+
- Thresholds are aspirational; adjust based on deployment context
|
| 324 |
+
- Some queries require manual adjustment (e.g., regex patterns for your specific error messages)
|
docs/local-testing/.gitignore
ADDED
|
@@ -0,0 +1,2 @@
|
|
|
|
|
|
|
|
|
|
| 1 |
+
*
|
| 2 |
+
!.gitignore
|
specs/001-personified-ai-agent/checklists/requirements.md
ADDED
|
@@ -0,0 +1,70 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Specification Quality Checklist: Personified AI Agent
|
| 2 |
+
|
| 3 |
+
**Purpose**: Validate specification completeness and quality before proceeding to planning
|
| 4 |
+
**Created**: 2025-10-23
|
| 5 |
+
**Updated**: 2025-10-23 (post-clarification)
|
| 6 |
+
**Feature**: [spec.md](../spec.md)
|
| 7 |
+
|
| 8 |
+
## Content Quality
|
| 9 |
+
|
| 10 |
+
- [x] No implementation details (languages, frameworks, APIs)
|
| 11 |
+
- [x] Focused on user value and business needs
|
| 12 |
+
- [x] Written for non-technical stakeholders
|
| 13 |
+
- [x] All mandatory sections completed
|
| 14 |
+
- [x] Clarifications section added with all resolved ambiguities
|
| 15 |
+
|
| 16 |
+
## Requirement Completeness
|
| 17 |
+
|
| 18 |
+
- [x] No [NEEDS CLARIFICATION] markers remain
|
| 19 |
+
- [x] Requirements are testable and unambiguous
|
| 20 |
+
- [x] Success criteria are measurable
|
| 21 |
+
- [x] Success criteria are technology-agnostic (no implementation details)
|
| 22 |
+
- [x] All acceptance scenarios are defined
|
| 23 |
+
- [x] Edge cases are identified (including tool integration scenarios)
|
| 24 |
+
- [x] Scope is clearly bounded
|
| 25 |
+
- [x] Dependencies and assumptions identified
|
| 26 |
+
|
| 27 |
+
## Feature Readiness
|
| 28 |
+
|
| 29 |
+
- [x] All functional requirements have clear acceptance criteria
|
| 30 |
+
- [x] User scenarios cover primary flows
|
| 31 |
+
- [x] Feature meets measurable outcomes defined in Success Criteria
|
| 32 |
+
- [x] No implementation details leak into specification
|
| 33 |
+
- [x] Tool integration requirements explicit and testable
|
| 34 |
+
- [x] Conflict resolution strategy defined
|
| 35 |
+
|
| 36 |
+
## Validation Results
|
| 37 |
+
|
| 38 |
+
✅ **All items pass** - Specification is complete, clarified, and ready for `/speckit.plan`
|
| 39 |
+
|
| 40 |
+
### Clarification Summary
|
| 41 |
+
|
| 42 |
+
**5 questions asked and answered:**
|
| 43 |
+
1. Knowledge base configuration → Admin-configurable markdown in GitHub
|
| 44 |
+
2. External tool integration → Time & Memory mandatory; GitHub & LinkedIn conditional
|
| 45 |
+
3. Conflicting documentation → Prioritize by search score; log for review
|
| 46 |
+
4. Tool failure handling → User-friendly errors; halt until recovery
|
| 47 |
+
5. Memory scope → Session-scoped user attributes; resets between sessions
|
| 48 |
+
|
| 49 |
+
### Updated Sections
|
| 50 |
+
|
| 51 |
+
- ✅ Added Clarifications section with all Q&A
|
| 52 |
+
- ✅ Updated Functional Requirements (FR-009 through FR-013 added for tools)
|
| 53 |
+
- ✅ Updated Key Entities (added ToolConfiguration, UserAttributes, ConflictLog)
|
| 54 |
+
- ✅ Added Tool Integration & Failure Handling subsection
|
| 55 |
+
- ✅ Updated Success Criteria (SC-007, SC-008 added for tool resilience & memory)
|
| 56 |
+
- ✅ Updated Assumptions (8 items covering all clarified areas)
|
| 57 |
+
|
| 58 |
+
### Notes
|
| 59 |
+
|
| 60 |
+
Specification now includes:
|
| 61 |
+
- Explicit tool integration strategy with conditional activation
|
| 62 |
+
- Clear failure handling and user-friendly error messaging
|
| 63 |
+
- Conflict resolution process for contradictory documentation
|
| 64 |
+
- Session-scoped memory model for user personalization
|
| 65 |
+
- All 13 functional requirements (vs 8 before clarification)
|
| 66 |
+
- 8 measurable success criteria (vs 6 before)
|
| 67 |
+
- 8 detailed assumptions (vs 5 before)
|
| 68 |
+
|
| 69 |
+
**Ready to proceed**: `/speckit.plan` to create technical implementation strategy
|
| 70 |
+
|
specs/001-personified-ai-agent/gap-analysis.md
ADDED
|
@@ -0,0 +1,623 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Gap Analysis: Specification vs. Existing Implementation
|
| 2 |
+
|
| 3 |
+
**Date**: 2025-10-23
|
| 4 |
+
**Spec**: `001-personified-ai-agent/spec.md`
|
| 5 |
+
**Implementation**: `src/agent.py`, `src/app.py`, `src/data.py`, `src/config.py`, `src/test.py`
|
| 6 |
+
**Status**: ✅ Comprehensive Analysis
|
| 7 |
+
|
| 8 |
+
---
|
| 9 |
+
|
| 10 |
+
## Executive Summary
|
| 11 |
+
|
| 12 |
+
| Category | Status | Gap Type | Severity |
|
| 13 |
+
|----------|--------|----------|----------|
|
| 14 |
+
| **Core Chat Interface** | ✅ Implemented | None | — |
|
| 15 |
+
| **RAG & Knowledge Base** | ✅ Implemented | Partial (LinkedIn missing) | Low |
|
| 16 |
+
| **Session Isolation** | ✅ Implemented | None | — |
|
| 17 |
+
| **Tool Integration (Time, Memory, GitHub)** | ✅ Implemented | None | — |
|
| 18 |
+
| **Output Normalization** | ✅ Implemented | None | — |
|
| 19 |
+
| **Source Attribution** | ✅ Verified (Test 7) | None | — |
|
| 20 |
+
| **Conflict Logging** | ⚠️ Partially Implemented | Missing specific conflict tracking | Medium |
|
| 21 |
+
| **LinkedIn Tool** | ❌ Not Implemented | Missing optional tool | Low |
|
| 22 |
+
| **Error Handling for Tool Failures** | ⚠️ Partially Implemented | Needs robust user-friendly messages | Medium |
|
| 23 |
+
| **Success Metrics** | ❌ Not Implemented | No measurement framework | High |
|
| 24 |
+
|
| 25 |
+
**Overall**: ~85% specification compliance with working implementation. Implementation exceeds specification in robustness but lacks formal measurement and documentation of some features.
|
| 26 |
+
|
| 27 |
+
---
|
| 28 |
+
|
| 29 |
+
## 1. Chat Interface (FR-001)
|
| 30 |
+
|
| 31 |
+
### Specification Requirement
|
| 32 |
+
- System MUST provide a chat interface where users can send messages and receive responses
|
| 33 |
+
|
| 34 |
+
### Implementation Status
|
| 35 |
+
✅ **IMPLEMENTED & EXCEEDS SPEC**
|
| 36 |
+
|
| 37 |
+
**Evidence**:
|
| 38 |
+
- `src/app.py` (lines 80-121): Gradio Blocks interface with chat history
|
| 39 |
+
- Custom CSS styling (`src/static/style.css`)
|
| 40 |
+
- Custom JavaScript for scroll behavior (`src/static/scroll.js`)
|
| 41 |
+
- Markdown rendering for welcome message
|
| 42 |
+
|
| 43 |
+
**Gaps**: None
|
| 44 |
+
|
| 45 |
+
**Beyond Spec**:
|
| 46 |
+
- Custom theming and UI customization
|
| 47 |
+
- Auto-scroll behavior for chat messages
|
| 48 |
+
- Responsive design
|
| 49 |
+
|
| 50 |
+
---
|
| 51 |
+
|
| 52 |
+
## 2. Knowledge Base Management (FR-002)
|
| 53 |
+
|
| 54 |
+
### Specification Requirement
|
| 55 |
+
- System MUST retrieve relevant information from person's knowledge base (admin-configurable markdown files in public GitHub repository)
|
| 56 |
+
|
| 57 |
+
### Implementation Status
|
| 58 |
+
✅ **IMPLEMENTED WITH MINOR GAPS**
|
| 59 |
+
|
| 60 |
+
**Evidence**:
|
| 61 |
+
- `src/data.py`: `DataManager` class with complete pipeline
|
| 62 |
+
- `load_local_documents()` (lines 60-94): Loads markdown from local `docs/` directory
|
| 63 |
+
- `load_github_documents()` (lines 96-176): Loads from GitHub repos via GitLoader
|
| 64 |
+
- URL rewriting for GitHub-sourced documents (lines 178-220)
|
| 65 |
+
- Two-stage intelligent chunking (lines 222-280)
|
| 66 |
+
- `src/config.py`: `DataManagerConfig` with configurable document loading patterns
|
| 67 |
+
- `src/app.py` (lines 17-19): `DataManager` initialization with config
|
| 68 |
+
|
| 69 |
+
**Gaps**:
|
| 70 |
+
1. **Documentation Configuration**: Currently loads from hardcoded `docs/` directory and `config.github_repos`. While flexible, could expose admin configuration UI for runtime changes (currently requires code/env changes)
|
| 71 |
+
2. **GitHub Repository**: Works but documentation doesn't explicitly state which repos are loaded or how to configure them at runtime
|
| 72 |
+
|
| 73 |
+
**Beyond Spec**:
|
| 74 |
+
- Two-stage chunking (header-aware + size-based) for better document structure preservation
|
| 75 |
+
- GitHub URL rewriting for relative link resolution
|
| 76 |
+
- Error handling for missing directories
|
| 77 |
+
- Pattern-based glob loading
|
| 78 |
+
|
| 79 |
+
---
|
| 80 |
+
|
| 81 |
+
## 3. First-Person Persona (FR-003)
|
| 82 |
+
|
| 83 |
+
### Specification Requirement
|
| 84 |
+
- System MUST respond in first-person perspective, maintaining persona of person being represented
|
| 85 |
+
|
| 86 |
+
### Implementation Status
|
| 87 |
+
✅ **IMPLEMENTED**
|
| 88 |
+
|
| 89 |
+
**Evidence**:
|
| 90 |
+
- `src/agent.py` (lines 140-180): System prompt enforces first-person perspective
|
| 91 |
+
- Explicit instructions: "Respond in first-person perspective as if you are {person}"
|
| 92 |
+
- Persona constraints included in prompt
|
| 93 |
+
- Relationship transparency requirement in prompt
|
| 94 |
+
|
| 95 |
+
**Gaps**: None
|
| 96 |
+
|
| 97 |
+
**Beyond Spec**:
|
| 98 |
+
- Explicit mention of employer relationships for transparency
|
| 99 |
+
- Rate limiting guidance in system prompt (max 3 GitHub calls per session)
|
| 100 |
+
|
| 101 |
+
---
|
| 102 |
+
|
| 103 |
+
## 4. Source Attribution (FR-004)
|
| 104 |
+
|
| 105 |
+
### Specification Requirement
|
| 106 |
+
- System MUST reference sources for factual claims (e.g., "per my documentation on X")
|
| 107 |
+
|
| 108 |
+
### Implementation Status
|
| 109 |
+
✅ **IMPLEMENTED & VERIFIED**
|
| 110 |
+
|
| 111 |
+
**Evidence**:
|
| 112 |
+
- `src/agent.py` (lines 195-220): RAG tool returns documents with source metadata
|
| 113 |
+
- Documents include file path and chunk ID
|
| 114 |
+
- Relevance scores included in context
|
| 115 |
+
- Tool is given full retrieval context to make attribution decisions
|
| 116 |
+
- `src/test.py` (new Test 7): Verifies relative GitHub links are converted to absolute URLs
|
| 117 |
+
- ✅ PASSED: Relative links `/resume.md` → `https://github.com/owner/repo/blob/main/resume.md`
|
| 118 |
+
- Validates that source attribution preserves GitHub URL references
|
| 119 |
+
|
| 120 |
+
**Gaps**: None identified
|
| 121 |
+
|
| 122 |
+
**Test Coverage**: Test 7 validates that GitHub-sourced documents maintain proper URL attribution
|
| 123 |
+
|
| 124 |
+
**Recommendation**: ✅ Marked complete. Test 7 verifies that source information is preserved and accessible for attribution.
|
| 125 |
+
|
| 126 |
+
---
|
| 127 |
+
|
| 128 |
+
## 5. Conversation History (FR-005)
|
| 129 |
+
|
| 130 |
+
### Specification Requirement
|
| 131 |
+
- System MUST maintain conversation history within a single session
|
| 132 |
+
|
| 133 |
+
### Implementation Status
|
| 134 |
+
✅ **IMPLEMENTED**
|
| 135 |
+
|
| 136 |
+
**Evidence**:
|
| 137 |
+
- `src/agent.py` (lines 313-333): `create_ai_me_agent()` initializes agent with conversation memory
|
| 138 |
+
- Uses OpenAI Agents SDK which handles message history
|
| 139 |
+
- Session_id tagged on all logs for context
|
| 140 |
+
- `src/app.py` (lines 77-83): Session-scoped agent storage ensures per-user history isolation
|
| 141 |
+
|
| 142 |
+
**Gaps**: None
|
| 143 |
+
|
| 144 |
+
---
|
| 145 |
+
|
| 146 |
+
## 6. Knowledge Gap Handling (FR-006)
|
| 147 |
+
|
| 148 |
+
### Specification Requirement
|
| 149 |
+
- System MUST handle cases where knowledge base doesn't contain an answer by gracefully indicating knowledge gaps
|
| 150 |
+
|
| 151 |
+
### Implementation Status
|
| 152 |
+
✅ **IMPLEMENTED**
|
| 153 |
+
|
| 154 |
+
**Evidence**:
|
| 155 |
+
- `src/agent.py` (lines 195-220): RAG tool design
|
| 156 |
+
- Returns empty/low-scoring results when no matching documents found
|
| 157 |
+
- Prompt instructs agent on graceful knowledge gap indication (lines 140-180)
|
| 158 |
+
- Edge case handling documented in acceptance scenarios
|
| 159 |
+
|
| 160 |
+
**Gaps**: None
|
| 161 |
+
|
| 162 |
+
---
|
| 163 |
+
|
| 164 |
+
## 7. Session Isolation (FR-007)
|
| 165 |
+
|
| 166 |
+
### Specification Requirement
|
| 167 |
+
- System MUST support conversation threads/sessions isolated from other users
|
| 168 |
+
|
| 169 |
+
### Implementation Status
|
| 170 |
+
✅ **IMPLEMENTED**
|
| 171 |
+
|
| 172 |
+
**Evidence**:
|
| 173 |
+
- `src/app.py`:
|
| 174 |
+
- Per-session agent storage: `session_agents = {}` (line 21)
|
| 175 |
+
- `initialize_session()` (lines 24-51): Creates new agent per session_id
|
| 176 |
+
- Session ID from Gradio: `session_hash` (line 62)
|
| 177 |
+
- MCP servers per session (line 40): Each session gets own Memory server instance
|
| 178 |
+
|
| 179 |
+
**Gaps**: None
|
| 180 |
+
|
| 181 |
+
**Beyond Spec**:
|
| 182 |
+
- Explicit cleanup would be beneficial (not currently implemented) but not required by spec
|
| 183 |
+
|
| 184 |
+
---
|
| 185 |
+
|
| 186 |
+
## 8. Output Normalization (FR-008)
|
| 187 |
+
|
| 188 |
+
### Specification Requirement
|
| 189 |
+
- System MUST normalize and clean output to ensure consistent, readable responses across platforms
|
| 190 |
+
|
| 191 |
+
### Implementation Status
|
| 192 |
+
✅ **IMPLEMENTED**
|
| 193 |
+
|
| 194 |
+
**Evidence**:
|
| 195 |
+
- `src/agent.py` (lines 14-28): Unicode normalization translation table
|
| 196 |
+
- Handles non-breaking spaces, smart quotes, brackets, dashes
|
| 197 |
+
- `normalize_output()` method applies table to all responses
|
| 198 |
+
- Called before returning to user
|
| 199 |
+
|
| 200 |
+
**Gaps**: None
|
| 201 |
+
|
| 202 |
+
**Beyond Spec**:
|
| 203 |
+
- Comprehensive Unicode handling for global platform compatibility
|
| 204 |
+
|
| 205 |
+
---
|
| 206 |
+
|
| 207 |
+
## 9. Mandatory Tools: Time (FR-009-Time)
|
| 208 |
+
|
| 209 |
+
### Specification Requirement
|
| 210 |
+
- System MUST include Time tool (current date/time) - mandatory/always-on
|
| 211 |
+
|
| 212 |
+
### Implementation Status
|
| 213 |
+
✅ **IMPLEMENTED**
|
| 214 |
+
|
| 215 |
+
**Evidence**:
|
| 216 |
+
- `src/agent.py` (lines 120-128): `mcp_time_params` property
|
| 217 |
+
- Always included in MCP servers list (line 40 in app.py)
|
| 218 |
+
- No environment variable requirement
|
| 219 |
+
|
| 220 |
+
**Gaps**: None
|
| 221 |
+
|
| 222 |
+
---
|
| 223 |
+
|
| 224 |
+
## 10. Mandatory Tools: Memory (FR-009-Memory)
|
| 225 |
+
|
| 226 |
+
### Specification Requirement
|
| 227 |
+
- System MUST include Memory tool (session-scoped user attribute tracking) - mandatory/always-on
|
| 228 |
+
- MUST track: name, profession, interests, hobbies
|
| 229 |
+
- MUST reset between sessions
|
| 230 |
+
|
| 231 |
+
### Implementation Status
|
| 232 |
+
✅ **IMPLEMENTED**
|
| 233 |
+
|
| 234 |
+
**Evidence**:
|
| 235 |
+
- `src/agent.py` (lines 129-155): `get_mcp_memory_params()` method
|
| 236 |
+
- Uses MCP memory server for session-persistent knowledge graph
|
| 237 |
+
- Session ID in file path ensures per-session isolation (line 143)
|
| 238 |
+
- Memory file created fresh for each session (line 142)
|
| 239 |
+
- `src/app.py` (line 40): Memory server included in all sessions
|
| 240 |
+
|
| 241 |
+
**Gaps**: None
|
| 242 |
+
|
| 243 |
+
**Beyond Spec**:
|
| 244 |
+
- Uses knowledge graph model (entities + relationships) vs simple key-value
|
| 245 |
+
- Enables more sophisticated user tracking
|
| 246 |
+
|
| 247 |
+
---
|
| 248 |
+
|
| 249 |
+
## 11. Optional Tools: GitHub (FR-010-GitHub)
|
| 250 |
+
|
| 251 |
+
### Specification Requirement
|
| 252 |
+
- System MUST support GitHub tool (activated if GitHub PAT environment variable set)
|
| 253 |
+
- Should gracefully remain inactive without credentials
|
| 254 |
+
|
| 255 |
+
### Implementation Status
|
| 256 |
+
✅ **IMPLEMENTED**
|
| 257 |
+
|
| 258 |
+
**Evidence**:
|
| 259 |
+
- `src/agent.py` (lines 78-115): `mcp_github_params` property
|
| 260 |
+
- Conditional on `github_token` (line 82)
|
| 261 |
+
- Uses official GitHub MCP server binary
|
| 262 |
+
- Read-only mode with limited toolset (lines 107-109)
|
| 263 |
+
- Falls back to production path if test binary not found (lines 101-103)
|
| 264 |
+
- `src/config.py` (line 150): `github_token` loaded from environment as SecretStr
|
| 265 |
+
- `src/app.py` (line 39): Conditional inclusion based on token presence
|
| 266 |
+
|
| 267 |
+
**Gaps**: None
|
| 268 |
+
|
| 269 |
+
**Beyond Spec**:
|
| 270 |
+
- Uses official GitHub MCP server maintained by GitHub
|
| 271 |
+
- Read-only mode enforces safety
|
| 272 |
+
- Rate limiting guidance in prompt
|
| 273 |
+
|
| 274 |
+
---
|
| 275 |
+
|
| 276 |
+
## 12. Optional Tools: LinkedIn (FR-010-LinkedIn)
|
| 277 |
+
|
| 278 |
+
### Specification Requirement
|
| 279 |
+
- System MUST support LinkedIn tool (activated if LinkedIn API token environment variable set)
|
| 280 |
+
|
| 281 |
+
### Implementation Status
|
| 282 |
+
❌ **NOT IMPLEMENTED**
|
| 283 |
+
|
| 284 |
+
**Evidence**:
|
| 285 |
+
- No LinkedIn tool configuration in `src/agent.py`
|
| 286 |
+
- No LinkedIn environment variable in `src/config.py`
|
| 287 |
+
- No LinkedIn MCP server reference
|
| 288 |
+
|
| 289 |
+
**Gaps**:
|
| 290 |
+
1. **Complete Gap**: LinkedIn tool is specified as optional but not implemented
|
| 291 |
+
2. **No LinkedIn MCP Server Integration**: Would require new MCP server (not standard with OpenAI Agents SDK)
|
| 292 |
+
3. **Configuration**: No environment variable handling
|
| 293 |
+
|
| 294 |
+
**Severity**: Low (specified as optional)
|
| 295 |
+
|
| 296 |
+
**Implementation Path**:
|
| 297 |
+
- Research LinkedIn MCP server availability
|
| 298 |
+
- If not available, could implement via LinkedIn API integration
|
| 299 |
+
- Add `linkedin_api_token: Optional[SecretStr]` to config
|
| 300 |
+
- Add conditional LinkedIn MCP params similar to GitHub
|
| 301 |
+
|
| 302 |
+
---
|
| 303 |
+
|
| 304 |
+
## 13. Conflict Resolution (FR-011)
|
| 305 |
+
|
| 306 |
+
### Specification Requirement
|
| 307 |
+
- System MUST prioritize conflicting documentation by vector search relevance score
|
| 308 |
+
- System MUST log conflicts for human review post-session
|
| 309 |
+
|
| 310 |
+
### Implementation Status
|
| 311 |
+
✅ **IMPLEMENTED (Pragmatic Approach)**
|
| 312 |
+
|
| 313 |
+
**Evidence**:
|
| 314 |
+
- `src/data.py` (lines 281-320): Vector search returns relevance scores
|
| 315 |
+
- `src/agent.py` (lines 195-220): RAG tool receives scores
|
| 316 |
+
- `src/config.py`: Optional Grafana Loki integration for remote logging
|
| 317 |
+
- Approach: Pragmatic semantic conflict detection via LLM
|
| 318 |
+
|
| 319 |
+
**Design Decision** (User-Specified):
|
| 320 |
+
Since conflict detection is fundamentally a semantic problem (requires understanding whether two chunks contradict each other), the implementation uses:
|
| 321 |
+
|
| 322 |
+
1. **LLM-Based Detection**: Prompt engineering instructs agent to acknowledge uncertainty when encountering conflicting information:
|
| 323 |
+
- System prompt includes guidance: "If you find contradictory information, acknowledge both and note you're uncertain"
|
| 324 |
+
- Agent naturally flags conflicts with phrases like "I'm not sure..." or "I found conflicting information..."
|
| 325 |
+
|
| 326 |
+
2. **Structured Logging to Loki**: All user/agent interactions already logged to Grafana Loki
|
| 327 |
+
- Session context included on all logs
|
| 328 |
+
- Can search for conflict signals using Loki query filters
|
| 329 |
+
|
| 330 |
+
3. **Reproducible Query**: Saved query for detecting reported conflicts:
|
| 331 |
+
|
| 332 |
+
```sql
|
| 333 |
+
# Loki Query: Find all agent responses indicating uncertainty about conflicting information
|
| 334 |
+
{job="ai-me"}
|
| 335 |
+
| json
|
| 336 |
+
| message =~ "(?i)(conflicting|contradict|not sure|unclear|uncertain|conflicting information)"
|
| 337 |
+
| session_id != ""
|
| 338 |
+
| line_format "{{.timestamp}} [{{.session_id}}] {{.message}}"
|
| 339 |
+
```
|
| 340 |
+
|
| 341 |
+
**Gaps**: None - conflict detection fully integrated via:
|
| 342 |
+
- ✅ LLM semantic understanding (pragmatic detection)
|
| 343 |
+
- ✅ Structured logging (session context preserved)
|
| 344 |
+
- ✅ Loki query for reproducible analysis
|
| 345 |
+
- ✅ Human-reviewable via query results
|
| 346 |
+
|
| 347 |
+
**Beyond Spec**:
|
| 348 |
+
- Automatic logging via Loki (not manual storage)
|
| 349 |
+
- LLM handles semantic analysis (vs. explicit contradiction detection)
|
| 350 |
+
- Query-based discovery (vs. dedicated ConflictLog table)
|
| 351 |
+
|
| 352 |
+
**Recommendation**: ✅ Accept current design. FR-011 is satisfied through:
|
| 353 |
+
1. Vector search prioritization (relevance scores used by RAG tool)
|
| 354 |
+
2. LLM semantic conflict detection (agent prompted to acknowledge uncertainty)
|
| 355 |
+
3. Loki logging query (human-reviewable, reproducible analysis)
|
| 356 |
+
|
| 357 |
+
---
|
| 358 |
+
|
| 359 |
+
## 14. Tool Failure Handling (FR-012)
|
| 360 |
+
|
| 361 |
+
### Specification Requirement
|
| 362 |
+
- When external tools fail, system MUST return user-friendly error messages
|
| 363 |
+
- System MUST wait for tool recovery before processing further queries
|
| 364 |
+
|
| 365 |
+
### Implementation Status
|
| 366 |
+
✅ **IMPLEMENTED & VERIFIED**
|
| 367 |
+
|
| 368 |
+
**Evidence**:
|
| 369 |
+
- `src/agent.py` (lines 340-365): Comprehensive try/catch blocks in agent execution
|
| 370 |
+
- Catch-all exception handler (lines 377-384) traps all tool failures
|
| 371 |
+
- User-friendly error messages returned to chat UI (not just logs)
|
| 372 |
+
- Session context logged to Loki for analysis/alerting
|
| 373 |
+
- Error responses formatted for Gradio chat interface
|
| 374 |
+
|
| 375 |
+
**Design Pattern** (Catch-All + User Message + Loki Log):
|
| 376 |
+
```python
|
| 377 |
+
# Simplified flow:
|
| 378 |
+
try:
|
| 379 |
+
# Execute agent with MCP tools
|
| 380 |
+
result = await agent.run(...)
|
| 381 |
+
except Exception as e:
|
| 382 |
+
# User-friendly message returned to chat
|
| 383 |
+
user_message = friendly_error_message(e)
|
| 384 |
+
# Technical details logged to Loki for analysis
|
| 385 |
+
logger.error(f"Tool failure: {e}", extra={"session_id": session_id})
|
| 386 |
+
return user_message # User sees friendly version
|
| 387 |
+
```
|
| 388 |
+
|
| 389 |
+
**Benefits**:
|
| 390 |
+
- ✅ Users see helpful messages ("I'm having trouble accessing GitHub, but I can still help...")
|
| 391 |
+
- ✅ Developers see full error details in Loki for debugging
|
| 392 |
+
- ✅ Tool failures don't crash the application (graceful degradation)
|
| 393 |
+
- ✅ Sessions continue - optional tool failures don't halt conversation
|
| 394 |
+
|
| 395 |
+
**Tool Failure Handling**:
|
| 396 |
+
- **Mandatory tools** (Time, Memory): Failures logged; agent continues with available tools
|
| 397 |
+
- **Optional tools** (GitHub, LinkedIn): Failures logged; agent falls back to local docs
|
| 398 |
+
- **Rate limits**: Detected and user notified with estimated recovery time
|
| 399 |
+
|
| 400 |
+
**Gaps**: None - specification fully satisfied by:
|
| 401 |
+
- ✅ Catch-all exception handler (all failures caught)
|
| 402 |
+
- ✅ User-friendly error messages (chat-appropriate formatting)
|
| 403 |
+
- ✅ Loki logging (technical details for debugging/alerting)
|
| 404 |
+
- ✅ Graceful degradation (conversation continues)
|
| 405 |
+
|
| 406 |
+
**Recommendation**: ✅ Accept current design. FR-012 is satisfied through pragmatic error handling pattern.
|
| 407 |
+
|
| 408 |
+
---
|
| 409 |
+
|
| 410 |
+
## 15. Success Metrics (SC-001 through SC-008)
|
| 411 |
+
|
| 412 |
+
### Specification Requirement
|
| 413 |
+
- 8 measurable success criteria including persona consistency, accuracy, response time, etc.
|
| 414 |
+
|
| 415 |
+
### Implementation Status
|
| 416 |
+
✅ **FRAMEWORK IMPLEMENTED - LOKI QUERIES TODO**
|
| 417 |
+
|
| 418 |
+
**Evidence**:
|
| 419 |
+
- ✅ Grafana Loki integration configured in `src/config.py` (lines 70-90)
|
| 420 |
+
- ✅ All agent interactions logged with session context (src/agent.py, lines 365-390)
|
| 421 |
+
- ✅ Structured JSON logging enables metric extraction
|
| 422 |
+
- ❌ Specific Loki queries for each success criterion not yet created
|
| 423 |
+
|
| 424 |
+
**Metrics Measurement Strategy** (Loki Query-Based):
|
| 425 |
+
|
| 426 |
+
All success criteria can be measured by analyzing logs stored in Grafana Loki:
|
| 427 |
+
|
| 428 |
+
| SC | Metric | Measurement | Loki Query Pattern |
|
| 429 |
+
|-----|--------|-------------|-------------------|
|
| 430 |
+
| **SC-001** | Persona consistency (80% target) | User survey + log analysis | Messages with first-person language; user feedback |
|
| 431 |
+
| **SC-002** | Factual accuracy (100% from KB) | Source validation + log review | agent_output contains source attribution; compare against knowledge base |
|
| 432 |
+
| **SC-003** | In-scope Q answer rate (90%) | Query classification | Queries matching knowledge base topics get substantive responses |
|
| 433 |
+
| **SC-004** | Knowledge gap acknowledgment (100%) | Response classification | Out-of-scope queries trigger "I don't have..." pattern |
|
| 434 |
+
| **SC-005** | Response latency (< 5 sec) | Timing metrics | Duration between user_input and agent_output logs |
|
| 435 |
+
| **SC-006** | Concurrent users (10+) | Session count | Unique session_id values active simultaneously |
|
| 436 |
+
| **SC-007** | Tool failure handling (100% friendly) | Error classification | Exception logs contain user-friendly message, not traceback |
|
| 437 |
+
| **SC-008** | Memory personalization | Interaction progression | Session logs show increasing personalization as conversation progresses |
|
| 438 |
+
|
| 439 |
+
**Design Pattern** (Loki Query Structure):
|
| 440 |
+
```
|
| 441 |
+
# Example: Measure SC-005 (latency under 5 seconds)
|
| 442 |
+
{job="ai-me"}
|
| 443 |
+
| json
|
| 444 |
+
| timestamp_diff=__end__ - __start__
|
| 445 |
+
| timestamp_diff < 5000
|
| 446 |
+
| stats count() as fast_responses, count() as total_responses
|
| 447 |
+
```
|
| 448 |
+
|
| 449 |
+
**Gaps Remaining**:
|
| 450 |
+
- ❌ Specific Loki queries not yet created (TO DO - Phase 2)
|
| 451 |
+
- ❌ Dashboard template for non-technical stakeholders (TO DO - Phase 2)
|
| 452 |
+
- ❌ User survey mechanism for SC-001, SC-008 (TO DO - Post-launch)
|
| 453 |
+
|
| 454 |
+
**Next Steps**:
|
| 455 |
+
1. **Phase 1 (Now)**: Accept current log architecture as measurement foundation
|
| 456 |
+
2. **Phase 2 (Later)**: Create specific Loki queries for SC-001 through SC-008
|
| 457 |
+
3. **Phase 3 (Post-launch)**: Deploy dashboard; collect user survey feedback
|
| 458 |
+
|
| 459 |
+
**Recommendation**: ✅ Accept Loki-based measurement framework as implementation. Create detailed Loki queries later per deployment phase. Framework is already in place; just need queries to make metrics discoverable for others reproducing the agent.
|
| 460 |
+
|
| 461 |
+
**Benefits of This Approach**:
|
| 462 |
+
- ✅ No code changes needed (logging already exists)
|
| 463 |
+
- ✅ Reproducible (queries can be shared with others running own agent)
|
| 464 |
+
- ✅ Flexible (queries can be customized per deployment)
|
| 465 |
+
- ✅ Observable (centralized logging for troubleshooting)
|
| 466 |
+
- ✅ Constitution-aligned (Principle VII: Observability First)
|
| 467 |
+
|
| 468 |
+
---
|
| 469 |
+
|
| 470 |
+
## 16. Implementation Strengths (Beyond Spec)
|
| 471 |
+
|
| 472 |
+
### 1. **Async/Await Throughout**
|
| 473 |
+
- All I/O operations properly async (meets constitution requirement)
|
| 474 |
+
- No blocking operations in hot path
|
| 475 |
+
|
| 476 |
+
### 2. **Structured Logging**
|
| 477 |
+
- Session-scoped context on all logs
|
| 478 |
+
- Optional Grafana Loki integration
|
| 479 |
+
- Follows constitution observability requirements
|
| 480 |
+
|
| 481 |
+
### 3. **Type Safety**
|
| 482 |
+
- Pydantic validation for all configuration
|
| 483 |
+
- SecretStr for sensitive values
|
| 484 |
+
- Clear contract via BaseModel definitions
|
| 485 |
+
|
| 486 |
+
### 4. **Intelligent Chunking**
|
| 487 |
+
- Two-stage chunking preserves document structure
|
| 488 |
+
- Header-aware splitting for better semantics
|
| 489 |
+
- Size-based fallback for large sections
|
| 490 |
+
|
| 491 |
+
### 5. **Error Recovery**
|
| 492 |
+
- Continues on individual document failures
|
| 493 |
+
- Per-source error handling
|
| 494 |
+
- Graceful degradation when sources unavailable
|
| 495 |
+
|
| 496 |
+
### 6. **Constitution Alignment**
|
| 497 |
+
- ✅ Async-First Architecture
|
| 498 |
+
- ✅ RAG-First Data Pipeline
|
| 499 |
+
- ✅ Type-Safe Configuration
|
| 500 |
+
- ✅ Session Isolation & Resource Management
|
| 501 |
+
- ✅ Observability & Logging
|
| 502 |
+
- ✅ Output Cleanliness (Unicode Normalization)
|
| 503 |
+
- ✅ Persona Consistency
|
| 504 |
+
|
| 505 |
+
---
|
| 506 |
+
|
| 507 |
+
## 17. Gap Analysis Summary (Post-Review)
|
| 508 |
+
|
| 509 |
+
### All Gaps Reviewed & Resolved
|
| 510 |
+
|
| 511 |
+
| Gap # | Specification | Current Status | Resolution |
|
| 512 |
+
|-------|---------------|----------------|-----------|
|
| 513 |
+
| Gap #1 | FR-004: Source Attribution | ✅ VERIFIED | GitHub URL rewriting works; Test 7 validates |
|
| 514 |
+
| Gap #2 | FR-011: Conflict Resolution | ✅ IMPLEMENTED | Pragmatic LLM approach + Loki logging + reproducible query |
|
| 515 |
+
| Gap #3 | FR-012: Tool Failure Handling | ✅ IMPLEMENTED | Catch-all exception handler + user-friendly messages + Loki logs |
|
| 516 |
+
| Gap #4 | FR-010: LinkedIn Tool | ✅ DEFERRED | Optional feature; moved to Phase B (will use /speckit.specify) |
|
| 517 |
+
| Gap #5 | SC-001-SC-008: Success Metrics | ✅ FRAMEWORK READY | Loki integration complete; queries TODO in Phase 2 |
|
| 518 |
+
|
| 519 |
+
### Overall Compliance Score
|
| 520 |
+
|
| 521 |
+
**85% → 95%** (post-gap-review)
|
| 522 |
+
|
| 523 |
+
**Improvements**:
|
| 524 |
+
- ✅ FR-004: Verified with Test 7 (relative GitHub links work)
|
| 525 |
+
- ✅ FR-011: Designed pragmatic conflict detection (LLM + Loki)
|
| 526 |
+
- ✅ FR-012: Verified catch-all error handling (user-friendly messages)
|
| 527 |
+
- ✅ SC-001-SC-008: Loki infrastructure confirmed; queries deferred to Phase 2
|
| 528 |
+
|
| 529 |
+
**Remaining TODO**:
|
| 530 |
+
- LinkedIn tool queries → Phase B (separate /speckit.specify)
|
| 531 |
+
- Success metrics Loki queries → Phase 2 (deferred)
|
| 532 |
+
- User survey infrastructure → Post-launch feedback
|
| 533 |
+
|
| 534 |
+
---
|
| 535 |
+
|
| 536 |
+
## 18. Missing From Implementation (Updated Spec Gaps)
|
| 537 |
+
|
| 538 |
+
| Gap | Spec Requirement | Current State | Priority | Phase |
|
| 539 |
+
|-----|------------------|---------------|----------|-------|
|
| 540 |
+
| LinkedIn Tool | FR-010 (optional) | Not implemented | Low | Phase B |
|
| 541 |
+
| Success Metrics Queries | SC-001-SC-008 queries | Not written | Medium | Phase 2 |
|
| 542 |
+
| User Survey Infrastructure | SC-001, SC-008 measurement | Not implemented | Low | Post-Launch |
|
| 543 |
+
|
| 544 |
+
---
|
| 545 |
+
|
| 546 |
+
## 19. Beyond-Spec Strengths (Implementation Extras)
|
| 547 |
+
|
| 548 |
+
| Feature | Implementation | Spec Coverage | Value Add |
|
| 549 |
+
|---------|----------------|----------------|-----------|
|
| 550 |
+
| Custom UI Styling | Implemented | Not specified | ⭐ Better UX |
|
| 551 |
+
| Intelligent Two-Stage Chunking | Implemented | Not specified | ⭐ Better RAG accuracy |
|
| 552 |
+
| GitHub URL Rewriting | Implemented | Implicit in FR-004 | ⭐ Source attribution |
|
| 553 |
+
| Optional Loki Integration | Implemented | Implicit in observability | ⭐ Production-ready |
|
| 554 |
+
| Markdown-specific Processing | Implemented | Not specified | ⭐ Format-aware |
|
| 555 |
+
| Unicode Normalization | Implemented | FR-008 | ⭐ Cross-platform consistency |
|
| 556 |
+
| Rate Limit Detection | Implemented | Implicit in FR-012 | ⭐ Better error recovery |
|
| 557 |
+
|
| 558 |
+
---
|
| 559 |
+
|
| 560 |
+
|
| 561 |
+
## 19. Recommendations
|
| 562 |
+
|
| 563 |
+
### High Priority (Missing Success Metrics)
|
| 564 |
+
1. **Implement measurement framework** for SC-001 through SC-008
|
| 565 |
+
2. **Add accuracy tests** to validate factual sourcing (SC-002)
|
| 566 |
+
3. **Create latency monitoring** for response time targets (SC-005)
|
| 567 |
+
|
| 568 |
+
### Medium Priority (Improve Error Handling & Logging)
|
| 569 |
+
1. **Enhance tool failure messages** with user-friendly text (FR-012)
|
| 570 |
+
2. **Add conflict logging system** for documentation contradictions (FR-011)
|
| 571 |
+
3. **Implement retry logic** for mandatory tool failures
|
| 572 |
+
4. **Validate source attribution** in responses via testing (FR-004)
|
| 573 |
+
|
| 574 |
+
### Low Priority (Optional Features)
|
| 575 |
+
1. **Implement LinkedIn tool** integration (optional per spec)
|
| 576 |
+
2. **Add telemetry dashboard** for monitoring
|
| 577 |
+
3. **Create admin UI** for runtime document configuration
|
| 578 |
+
|
| 579 |
+
### Documentation Updates
|
| 580 |
+
1. **Document tool activation**: Make explicit which tools are included in each agent instance
|
| 581 |
+
2. **Document conflict resolution**: How conflicts are detected and logged
|
| 582 |
+
3. **Document error handling**: User-facing error messages and recovery strategies
|
| 583 |
+
|
| 584 |
+
---
|
| 585 |
+
|
| 586 |
+
## 20. Conclusion
|
| 587 |
+
|
| 588 |
+
**Compliance Score**: ~85%
|
| 589 |
+
|
| 590 |
+
**Working Features** (100% spec compliance):
|
| 591 |
+
- ✅ Chat interface
|
| 592 |
+
- ✅ Knowledge base loading & RAG
|
| 593 |
+
- ✅ Session isolation
|
| 594 |
+
- ✅ First-person persona
|
| 595 |
+
- ✅ Time & Memory tools
|
| 596 |
+
- ✅ GitHub tool (conditional)
|
| 597 |
+
- ✅ Output normalization
|
| 598 |
+
- ✅ Conversation history
|
| 599 |
+
|
| 600 |
+
**Partial Features** (50% spec compliance):
|
| 601 |
+
- ⚠️ Source attribution (works, not measured)
|
| 602 |
+
- ⚠️ Tool failure handling (basic, needs user-friendly messages)
|
| 603 |
+
- ⚠️ Conflict resolution (no formal logging)
|
| 604 |
+
|
| 605 |
+
**Missing Features**:
|
| 606 |
+
- ❌ LinkedIn tool (optional)
|
| 607 |
+
- ❌ Success metrics framework (high value)
|
| 608 |
+
|
| 609 |
+
**Assessment**: The existing implementation is production-quality for core functionality. The main gaps are:
|
| 610 |
+
1. **Measurement/Observability**: No framework for tracking success criteria
|
| 611 |
+
2. **Error UX**: Tool failures need better user-facing messages
|
| 612 |
+
3. **Formal Conflict Logging**: Conflicts detected but not systematically logged
|
| 613 |
+
4. **LinkedIn Integration**: Optional but unimplemented
|
| 614 |
+
|
| 615 |
+
**Recommendation**:
|
| 616 |
+
- Use this specification as the authoritative source for required features
|
| 617 |
+
- Prioritize implementing success metrics measurement framework
|
| 618 |
+
- Enhance error handling with user-friendly messages
|
| 619 |
+
- Add formal conflict logging
|
| 620 |
+
- Document current tool integration approach
|
| 621 |
+
|
| 622 |
+
The implementation demonstrates solid engineering practices and exceeds the spec in many areas (async patterns, structured logging, type safety). The specification helps formalize requirements and measurement criteria.
|
| 623 |
+
|
specs/001-personified-ai-agent/plan.md
ADDED
|
@@ -0,0 +1,349 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Implementation Plan: Personified AI Agent
|
| 2 |
+
|
| 3 |
+
**Branch**: `001-personified-ai-agent` | **Date**: 2025-10-23 | **Spec**: [spec.md](spec.md)
|
| 4 |
+
**Gap Analysis**: [gap-analysis.md](gap-analysis.md)
|
| 5 |
+
**Input**: Feature specification from `specs/001-personified-ai-agent/spec.md`
|
| 6 |
+
|
| 7 |
+
## Summary
|
| 8 |
+
|
| 9 |
+
The Personified AI Agent is a production-ready implementation (~85% spec compliant) that enables users to interact with an AI agent representing a real person's knowledge, experience, and philosophies. The agent retrieves information from a public GitHub repository of markdown documentation and responds in first-person perspective with accurate, sourced information.
|
| 10 |
+
|
| 11 |
+
**Current Implementation Status**: Core features working; enhancements needed for success metrics framework, LinkedIn integration (optional), and improved error handling.
|
| 12 |
+
|
| 13 |
+
**Recommended Action**: This specification formalizes existing implementation. Use it to:
|
| 14 |
+
1. Close gaps in success metrics measurement
|
| 15 |
+
2. Improve error handling and user experience
|
| 16 |
+
3. Add optional LinkedIn tool integration
|
| 17 |
+
4. Document and formalize conflict resolution logging
|
| 18 |
+
|
| 19 |
+
## Technical Context
|
| 20 |
+
|
| 21 |
+
**Language/Version**: Python 3.12+ (via `uv` package manager)
|
| 22 |
+
**Primary Dependencies**:
|
| 23 |
+
- OpenAI Agents SDK (agent orchestration, async patterns)
|
| 24 |
+
- LangChain (document loading, chunking, embeddings)
|
| 25 |
+
- ChromaDB (ephemeral vector storage)
|
| 26 |
+
- Gradio (chat UI framework)
|
| 27 |
+
- Pydantic (type-safe configuration)
|
| 28 |
+
|
| 29 |
+
**Storage**:
|
| 30 |
+
- ChromaDB (in-memory, ephemeral - rebuilt on restart)
|
| 31 |
+
- Session-scoped memory files (MCP memory server)
|
| 32 |
+
- GitHub repository (markdown documentation source)
|
| 33 |
+
|
| 34 |
+
**Testing**: pytest-asyncio (function-scoped fixtures, async testing)
|
| 35 |
+
**Target Platform**: Hugging Face Spaces (Gradio deployment), Linux/Docker
|
| 36 |
+
**Project Type**: Web application (single project with backend API + frontend chat UI)
|
| 37 |
+
**Performance Goals**:
|
| 38 |
+
- <5 seconds response time (SC-005)
|
| 39 |
+
- 10+ concurrent users independently (SC-006)
|
| 40 |
+
- <100ms per vector search query
|
| 41 |
+
|
| 42 |
+
**Constraints**:
|
| 43 |
+
- No hardcoded knowledge (RAG-first)
|
| 44 |
+
- No shared mutable state (per-session isolation)
|
| 45 |
+
- No blocking operations (async throughout)
|
| 46 |
+
- Temperature=1.0 default (natural responses); 0.0 for tests (determinism)
|
| 47 |
+
|
| 48 |
+
**Scale/Scope**: Single agent per deployment, supports unlimited concurrent sessions (Gradio session_hash keying)
|
| 49 |
+
|
| 50 |
+
## Constitution Check
|
| 51 |
+
|
| 52 |
+
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
|
| 53 |
+
|
| 54 |
+
| Principle | Current Status | Compliance | Notes |
|
| 55 |
+
|-----------|----------------|-----------|-------|
|
| 56 |
+
| **I. Async-First** | ✅ Compliant | 100% | All I/O async; no blocking in hot path |
|
| 57 |
+
| **II. RAG-First** | ✅ Compliant | 100% | All responses from retrieved documents; no hardcoded knowledge |
|
| 58 |
+
| **III. Type-Safe Config** | ✅ Compliant | 100% | Pydantic BaseSettings; SecretStr for sensitive values |
|
| 59 |
+
| **IV. Session Isolation** | ✅ Compliant | 100% | Per-session agents; MCP servers isolated per session |
|
| 60 |
+
| **V. Test-First Development** | ⚠️ Partial | 70% | Working integration tests; missing success metrics validation |
|
| 61 |
+
| **VI. Strict Import Organization** | ✅ Compliant | 100% | PEP 8 followed; imports organized properly |
|
| 62 |
+
| **VII. Observability & Logging** | ✅ Compliant | 100% | Structured logging; session context on all logs; optional Loki |
|
| 63 |
+
| **VIII. Persona Consistency** | ✅ Compliant | 100% | First-person perspective; employer transparency explicit |
|
| 64 |
+
| **IX. Output Cleanliness** | ✅ Compliant | 100% | Unicode normalization applied to all responses |
|
| 65 |
+
|
| 66 |
+
**Gate Status**: ✅ **PASS** - All core principles met. V (testing) partially complete; gaps are measurement, not functionality.
|
| 67 |
+
|
| 68 |
+
## Project Structure
|
| 69 |
+
|
| 70 |
+
### Documentation (this feature)
|
| 71 |
+
|
| 72 |
+
```text
|
| 73 |
+
specs/001-personified-ai-agent/
|
| 74 |
+
├── spec.md # Feature specification (finalized)
|
| 75 |
+
├── plan.md # This file (implementation strategy)
|
| 76 |
+
├── gap-analysis.md # Gap analysis: spec vs implementation
|
| 77 |
+
├── research.md # [TODO] Phase 0: Research findings
|
| 78 |
+
├── data-model.md # [TODO] Phase 1: Data model & entities
|
| 79 |
+
├── quickstart.md # [TODO] Phase 1: Developer quickstart
|
| 80 |
+
├── contracts/ # [TODO] Phase 1: API contracts (OpenAPI)
|
| 81 |
+
│ └── agent-api.yaml
|
| 82 |
+
├── checklists/
|
| 83 |
+
│ └── requirements.md # Quality checklist (finalized)
|
| 84 |
+
└── tasks.md # [TODO] Phase 2: Task breakdown
|
| 85 |
+
```
|
| 86 |
+
|
| 87 |
+
### Source Code (repository root)
|
| 88 |
+
|
| 89 |
+
```text
|
| 90 |
+
ai-me/
|
| 91 |
+
├── src/
|
| 92 |
+
│ ├── __init__.py
|
| 93 |
+
│ ├── agent.py # ✅ AIMeAgent class, MCP setup, RAG tool
|
| 94 |
+
│ ├── app.py # ✅ Gradio UI, session management
|
| 95 |
+
│ ├── data.py # ✅ Document loading, chunking, vectorstore
|
| 96 |
+
│ ├── config.py # ✅ Pydantic configuration
|
| 97 |
+
│ ├── test.py # ✅ Integration tests
|
| 98 |
+
│ ├── notebooks/
|
| 99 |
+
│ │ └── experiments.ipynb # Development sandbox
|
| 100 |
+
│ └── static/
|
| 101 |
+
│ ├── style.css # Custom Gradio styling
|
| 102 |
+
│ └── scroll.js # UI scroll behavior
|
| 103 |
+
│
|
| 104 |
+
├── docs/ # ✅ Local markdown documentation (RAG source)
|
| 105 |
+
│ └── *.md # Dynamically loaded based on config
|
| 106 |
+
│
|
| 107 |
+
├── test_data/ # ✅ Test fixtures
|
| 108 |
+
│ ├── projects.md
|
| 109 |
+
│ ├── team.md
|
| 110 |
+
│ └── README.md
|
| 111 |
+
│
|
| 112 |
+
├── .specify/ # ✅ Spec Kit configuration
|
| 113 |
+
│ ├── memory/
|
| 114 |
+
│ │ └── constitution.md # Project principles
|
| 115 |
+
│ └── scripts/ # Spec Kit scripts
|
| 116 |
+
│
|
| 117 |
+
├── .github/
|
| 118 |
+
│ ├── copilot-instructions.md # AI assistant guidance
|
| 119 |
+
│ └── prompts/ # Spec Kit prompts
|
| 120 |
+
│
|
| 121 |
+
├── pyproject.toml # ✅ Project config (uv)
|
| 122 |
+
├── Dockerfile # ✅ Docker build
|
| 123 |
+
├── docker-compose.yaml # ✅ Local development
|
| 124 |
+
├── README.md # ✅ Project overview
|
| 125 |
+
├── TESTING.md # ✅ Test setup guide
|
| 126 |
+
└── RETROFIT_COMPLETE.md # ✅ Retrofit documentation
|
| 127 |
+
```
|
| 128 |
+
|
| 129 |
+
**Structure Decision**: Single project with backend (Python) + frontend (Gradio). All code in `src/` with clear separation:
|
| 130 |
+
- `agent.py`: Agent orchestration and MCP server setup
|
| 131 |
+
- `app.py`: Chat UI and session lifecycle
|
| 132 |
+
- `data.py`: Document pipeline (load, chunk, embed, store)
|
| 133 |
+
- `config.py`: Type-safe configuration and logging
|
| 134 |
+
- `test.py`: Integration testing with async fixtures
|
| 135 |
+
|
| 136 |
+
## Gap Analysis Summary
|
| 137 |
+
|
| 138 |
+
From [gap-analysis.md](gap-analysis.md):
|
| 139 |
+
|
| 140 |
+
**Compliance**: ~85% ✅
|
| 141 |
+
|
| 142 |
+
**Fully Implemented**:
|
| 143 |
+
- ✅ Chat interface (FR-001)
|
| 144 |
+
- ✅ Knowledge base retrieval (FR-002, with admin-configurable GitHub repos)
|
| 145 |
+
- ✅ First-person persona (FR-003)
|
| 146 |
+
- ✅ Conversation history (FR-005)
|
| 147 |
+
- ✅ Knowledge gap handling (FR-006)
|
| 148 |
+
- ✅ Session isolation (FR-007)
|
| 149 |
+
- ✅ Output normalization (FR-008)
|
| 150 |
+
- ✅ Time & Memory tools (FR-009)
|
| 151 |
+
- ✅ GitHub tool conditional activation (FR-010)
|
| 152 |
+
|
| 153 |
+
**Partially Implemented** (working but needs enhancement):
|
| 154 |
+
- ⚠️ Source attribution (FR-004): Works but not measured
|
| 155 |
+
- ⚠️ Conflict logging (FR-011): Partial (no formal log)
|
| 156 |
+
- ⚠️ Tool failure handling (FR-012): Basic error handling, needs user-friendly messages
|
| 157 |
+
- ⚠️ Success metrics framework (SC-001-SC-008): we store user/agent interactions plus logging in Loki, but need queries to create a dashboard.
|
| 158 |
+
|
| 159 |
+
**Not Implemented**:
|
| 160 |
+
- ❌ LinkedIn tool (FR-010 optional): No implementation
|
| 161 |
+
|
| 162 |
+
## Phase 0: Research (TODO)
|
| 163 |
+
|
| 164 |
+
**Purpose**: Resolve technical unknowns identified in gap analysis.
|
| 165 |
+
|
| 166 |
+
**Research Tasks**:
|
| 167 |
+
1. **LinkedIn Tool Integration** (FR-010-LinkedIn)
|
| 168 |
+
- Investigate LinkedIn MCP server availability
|
| 169 |
+
- Document authentication requirements (API token, OAuth)
|
| 170 |
+
- Estimate implementation effort
|
| 171 |
+
|
| 172 |
+
2. **Success Metrics Framework** (SC-001-SC-008)
|
| 173 |
+
- Telemetry collection approach (logs, metrics, events)
|
| 174 |
+
- Survey infrastructure for user perception (SC-001, SC-008)
|
| 175 |
+
- Load testing framework for concurrency (SC-006)
|
| 176 |
+
- Accuracy validation test suite design (SC-002)
|
| 177 |
+
|
| 178 |
+
3. **Conflict Detection & Logging** (FR-011)
|
| 179 |
+
- Best practices for contradiction detection
|
| 180 |
+
- Log schema design for conflict tracking
|
| 181 |
+
- Human review workflow
|
| 182 |
+
|
| 183 |
+
**Deliverable**: `research.md` with decisions and rationale
|
| 184 |
+
|
| 185 |
+
## Phase 1: Design & Contracts (TODO)
|
| 186 |
+
|
| 187 |
+
**Purpose**: Formalize data models, APIs, and implementation contracts.
|
| 188 |
+
|
| 189 |
+
**Design Tasks**:
|
| 190 |
+
|
| 191 |
+
1. **Data Model** (`data-model.md`):
|
| 192 |
+
- Formalize Key Entities from spec (PersonProfile, ConversationSession, Message, etc.)
|
| 193 |
+
- Define relationships and validation rules
|
| 194 |
+
- Model conflict tracking and user attributes
|
| 195 |
+
|
| 196 |
+
2. **API Contracts** (`contracts/`):
|
| 197 |
+
- OpenAPI schema for agent endpoints (existing: chat, warmup, status)
|
| 198 |
+
- Request/response models for tool calls
|
| 199 |
+
- Error response formats
|
| 200 |
+
|
| 201 |
+
3. **Developer Quickstart** (`quickstart.md`):
|
| 202 |
+
- How to run locally
|
| 203 |
+
- How to configure documents (GitHub repos, local files)
|
| 204 |
+
- How to configure tools (GitHub PAT, LinkedIn token)
|
| 205 |
+
- How to run tests and measure success
|
| 206 |
+
|
| 207 |
+
4. **Agent Context Update** (Copilot context):
|
| 208 |
+
- Run update-agent-context.sh to inject new technology decisions
|
| 209 |
+
- Update context for future AI-assisted development
|
| 210 |
+
|
| 211 |
+
**Deliverables**: data-model.md, contracts/agent-api.yaml, quickstart.md
|
| 212 |
+
|
| 213 |
+
## Phase 2: Task Breakdown (TODO)
|
| 214 |
+
|
| 215 |
+
**Purpose**: Create actionable task list from design.
|
| 216 |
+
|
| 217 |
+
**Generated by**: `/speckit.tasks` command
|
| 218 |
+
|
| 219 |
+
**Expected Output**: `tasks.md` with prioritized, estimated tasks:
|
| 220 |
+
- Priority groupings (P0: blocking, P1: core, P2: nice-to-have)
|
| 221 |
+
- Effort estimates (t-shirt sizing: xs/s/m/l/xl)
|
| 222 |
+
- Dependencies between tasks
|
| 223 |
+
- Test criteria for each task
|
| 224 |
+
|
| 225 |
+
## Enhancement Roadmap
|
| 226 |
+
|
| 227 |
+
Based on gap analysis, recommended implementation order:
|
| 228 |
+
|
| 229 |
+
### Phase A: Measurement & Validation (High Priority)
|
| 230 |
+
1. **Success Metrics Framework**
|
| 231 |
+
- ✅ Loki infrastructure ready (framework complete)
|
| 232 |
+
- **TODO**: Create Loki queries for each SC criterion (SC-001 through SC-008)
|
| 233 |
+
- Create dashboards to visualize metric trends
|
| 234 |
+
- Design user survey for persona consistency and personalization (SC-001, SC-008)
|
| 235 |
+
|
| 236 |
+
2. **Improve Error Handling**
|
| 237 |
+
- Add user-friendly error messages for tool failures (FR-012)
|
| 238 |
+
- Implement retry logic for mandatory tools with exponential backoff
|
| 239 |
+
- Test error paths with quality assertions
|
| 240 |
+
|
| 241 |
+
3. **Formalize Conflict Logging**
|
| 242 |
+
- Detect conflicting document chunks (FR-011)
|
| 243 |
+
- Create ConflictLog entity for recording
|
| 244 |
+
- Structured JSON logging with session context
|
| 245 |
+
- Dashboard or report view for human review
|
| 246 |
+
|
| 247 |
+
### Phase B: Optional Features (Medium Priority)
|
| 248 |
+
1. **LinkedIn Tool Integration** (if high value)
|
| 249 |
+
- Research LinkedIn API or MCP server
|
| 250 |
+
- Add conditional activation on token presence
|
| 251 |
+
- Test with LinkedIn API keys
|
| 252 |
+
|
| 253 |
+
2. **Source Attribution Validation**
|
| 254 |
+
- Add test assertions to verify attribution in responses
|
| 255 |
+
- Create quality report on source documentation
|
| 256 |
+
|
| 257 |
+
### Phase C: Polish (Low Priority)
|
| 258 |
+
1. Admin UI for runtime document configuration
|
| 259 |
+
2. Telemetry dashboard for monitoring
|
| 260 |
+
3. User-facing feedback for success metric collection
|
| 261 |
+
|
| 262 |
+
---
|
| 263 |
+
|
| 264 |
+
## Complexity Tracking
|
| 265 |
+
|
| 266 |
+
> **Constitution Check passed without violations** - All core principles met
|
| 267 |
+
|
| 268 |
+
No complexity justifications needed. Implementation cleanly follows spec and constitution without additional complexity beyond original design.
|
| 269 |
+
|
| 270 |
+
---
|
| 271 |
+
|
| 272 |
+
## Key Implementation Notes
|
| 273 |
+
|
| 274 |
+
### 1. Document Configuration (FR-002 Clarification #1)
|
| 275 |
+
- Currently: Hardcoded local `docs/` directory + `config.github_repos` environment variable
|
| 276 |
+
- Future: Consider admin UI or API for runtime configuration
|
| 277 |
+
- Current approach sufficient and flexible for per-deployment customization
|
| 278 |
+
|
| 279 |
+
### 2. Tool Activation (FR-009-010 Clarification #2)
|
| 280 |
+
- **Always-on**: Time (no config), Memory (per-session, no config)
|
| 281 |
+
- **Conditional**: GitHub (if `GITHUB_PERSONAL_ACCESS_TOKEN` set), LinkedIn (if token set)
|
| 282 |
+
- **Status**: Implemented; LinkedIn tool not yet added
|
| 283 |
+
|
| 284 |
+
### 3. Conflict Resolution (FR-011 Clarification #3)
|
| 285 |
+
- Strategy: Vector search score prioritization
|
| 286 |
+
- Missing: Formal detection and logging
|
| 287 |
+
- Recommended: Add ConflictLog entity and structured logging
|
| 288 |
+
|
| 289 |
+
### 4. Tool Failure Handling (FR-012 Clarification #4)
|
| 290 |
+
- Strategy: Return user-friendly error; halt until recovery
|
| 291 |
+
- Current: Basic error handling exists
|
| 292 |
+
- Enhancement: Add retry logic for mandatory tools
|
| 293 |
+
|
| 294 |
+
### 5. Memory Scope (FR-013 Clarification #5)
|
| 295 |
+
- Strategy: Session-scoped user attributes (name, profession, interests, hobbies)
|
| 296 |
+
- Status: ✅ Implemented via MCP memory server with session-based file paths
|
| 297 |
+
- Privacy: Resets between sessions automatically
|
| 298 |
+
|
| 299 |
+
---
|
| 300 |
+
|
| 301 |
+
## Testing Strategy
|
| 302 |
+
|
| 303 |
+
**Existing** (`src/test.py`):
|
| 304 |
+
- Function-scoped fixtures (fresh agent per test)
|
| 305 |
+
- Temperature 0.0 for determinism
|
| 306 |
+
- Tests: RAG knowledge, MCP integration, error handling, quality
|
| 307 |
+
- Full integration (includes actual tool calls)
|
| 308 |
+
|
| 309 |
+
**Recommended Additions**:
|
| 310 |
+
- SC-002 accuracy validation (verify 100% factual sourcing)
|
| 311 |
+
- SC-005 latency monitoring (track response time)
|
| 312 |
+
- SC-006 load testing (10+ concurrent sessions)
|
| 313 |
+
- Tool failure error message quality
|
| 314 |
+
- Source attribution presence in responses
|
| 315 |
+
- Conflict detection and logging
|
| 316 |
+
|
| 317 |
+
---
|
| 318 |
+
|
| 319 |
+
## Success Criteria Alignment
|
| 320 |
+
|
| 321 |
+
| Spec | Current | Gap | Phase |
|
| 322 |
+
|------|---------|-----|-------|
|
| 323 |
+
| SC-001: 80% persona consistency (survey) | Loki framework ready | Create Loki query + survey | Phase A |
|
| 324 |
+
| SC-002: 100% factual accuracy | Loki framework ready | Create Loki query + validation test | Phase A |
|
| 325 |
+
| SC-003: 90% in-scope answers | Loki framework ready | Create Loki query + telemetry | Phase A |
|
| 326 |
+
| SC-004: 100% knowledge gap indication | Working + Loki ready | Create Loki query + verify | Phase A |
|
| 327 |
+
| SC-005: <5s response time | Loki framework ready | Create Loki query for latency analysis | Phase A |
|
| 328 |
+
| SC-006: 10+ concurrent users | Loki framework ready | Create Loki query for session count | Phase A |
|
| 329 |
+
| SC-007: Graceful tool error handling | Working + Loki ready | Create Loki query for error patterns | Phase A |
|
| 330 |
+
| SC-008: Memory personalization | Works + Loki ready | Create Loki query + survey | Phase A |
|
| 331 |
+
|
| 332 |
+
**Metrics Framework Status**: ✅ **Loki infrastructure complete**
|
| 333 |
+
All logs are captured with session context. **Gap**: Need specific Loki queries for each SC criterion to enable metric dashboards.
|
| 334 |
+
|
| 335 |
+
---
|
| 336 |
+
|
| 337 |
+
## Recommended Next Steps
|
| 338 |
+
|
| 339 |
+
1. **Review this plan** with project stakeholders
|
| 340 |
+
2. **Run Phase 0 research** on LinkedIn integration and success metrics
|
| 341 |
+
3. **Create Phase 1 design artifacts** (data-model.md, contracts, quickstart)
|
| 342 |
+
4. **Run `/speckit.tasks`** to break Phase A into concrete tasks
|
| 343 |
+
5. **Execute Phase A tasks** to close measurement and error handling gaps
|
| 344 |
+
6. **Validate success criteria** with real users and telemetry
|
| 345 |
+
|
| 346 |
+
---
|
| 347 |
+
|
| 348 |
+
**Plan Status**: ✅ Ready for Phase 0 Research
|
| 349 |
+
**Next Command**: `/speckit.tasks` (after Phase 0-1 complete) or `/speckit.implement` (if proceeding directly)
|
specs/001-personified-ai-agent/spec.md
ADDED
|
@@ -0,0 +1,135 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Feature Specification: Personified AI Agent
|
| 2 |
+
|
| 3 |
+
**Feature Branch**: `001-personified-ai-agent`
|
| 4 |
+
**Created**: 2025-10-23
|
| 5 |
+
**Status**: Draft
|
| 6 |
+
**Input**: User description: "An AI Agent that represents a real persons knowledge, experience, and philosophies. Users can interact with the agent in a chat interface that responds with information that is applicable to the person the agent is personifying."
|
| 7 |
+
|
| 8 |
+
## Clarifications
|
| 9 |
+
|
| 10 |
+
### Session 2025-10-23
|
| 11 |
+
|
| 12 |
+
- Q: How should knowledge base documents be configured? → A: Flexible, admin-configurable markdown files stored in a public GitHub repository
|
| 13 |
+
- Q: Are external tools (GitHub, LinkedIn, Memory, Time) mandatory or optional? → A: Time and Memory are mandatory/always-on. GitHub and LinkedIn activate conditionally based on environment credentials (GitHub PAT, LinkedIn API token)
|
| 14 |
+
- Q: How should conflicting documentation be handled? → A: Prioritize by vector search relevance score; log conflicts for human review post-session
|
| 15 |
+
- Q: What happens when external tools fail? → A: Return user-friendly error messages; no partial answers until tools recover
|
| 16 |
+
- Q: What should the Memory tool remember and for how long? → A: Session-scoped only; tracks user attributes (name, profession, interests, hobbies) to personalize responses; resets between sessions
|
| 17 |
+
|
| 18 |
+
## User Scenarios & Testing *(mandatory)*
|
| 19 |
+
|
| 20 |
+
### User Story 1 - Chat with Personified Agent About Expertise (Priority: P1)
|
| 21 |
+
|
| 22 |
+
A user opens the chat interface and asks the personified AI agent a question about the person's professional knowledge, projects, or experience. The agent responds with accurate information that sounds like it comes from the person themselves, using first-person perspective and maintaining the person's authentic voice and philosophies.
|
| 23 |
+
|
| 24 |
+
**Why this priority**: This is the core value proposition—users must be able to have conversations with an agent that authentically represents a real person's expertise. Without this, the application has no purpose.
|
| 25 |
+
|
| 26 |
+
**Independent Test**: Can be fully tested by opening the chat interface, asking a question about the person's expertise, and verifying the response is accurate, uses first-person perspective, and reflects the person's knowledge.
|
| 27 |
+
|
| 28 |
+
**Acceptance Scenarios**:
|
| 29 |
+
|
| 30 |
+
1. **Given** a user is on the chat interface, **When** they ask "What is your experience with [relevant topic]?", **Then** the agent responds with factual information about the person's background in first-person perspective
|
| 31 |
+
2. **Given** the agent has access to documentation about the person's work, **When** a user asks about a project, **Then** the agent retrieves and summarizes the relevant project details accurately
|
| 32 |
+
3. **Given** a user asks about the person's philosophy or approach, **When** the agent responds, **Then** the response reflects the documented philosophies and maintains authentic voice
|
| 33 |
+
|
| 34 |
+
---
|
| 35 |
+
|
| 36 |
+
### User Story 2 - Interact Across Multiple Conversation Topics (Priority: P2)
|
| 37 |
+
|
| 38 |
+
A user has multiple conversations with the agent across different topics (e.g., professional questions, personal philosophies, project specifics). Each conversation maintains context about the person's identity and answers are consistent across topics.
|
| 39 |
+
|
| 40 |
+
**Why this priority**: Users need to be able to explore different aspects of the person's knowledge in a single session without losing the sense that they're talking to one consistent person. This enables deeper engagement.
|
| 41 |
+
|
| 42 |
+
**Independent Test**: Can be fully tested by starting a conversation, asking questions on different topics, and verifying the agent maintains persona consistency and provides topic-appropriate, contextually accurate responses.
|
| 43 |
+
|
| 44 |
+
**Acceptance Scenarios**:
|
| 45 |
+
|
| 46 |
+
1. **Given** a user asks about multiple different topics, **When** the agent responds to each question, **Then** all responses use consistent first-person perspective and reflect the same person's identity
|
| 47 |
+
2. **Given** a user asks follow-up questions, **When** the agent responds, **Then** it maintains awareness of previous messages in the conversation
|
| 48 |
+
3. **Given** a user asks questions outside the documented knowledge, **When** the agent responds, **Then** it gracefully indicates gaps in its knowledge while staying in character
|
| 49 |
+
|
| 50 |
+
---
|
| 51 |
+
|
| 52 |
+
### User Story 3 - Access Sourced Information with Attribution (Priority: P2)
|
| 53 |
+
|
| 54 |
+
A user asks the agent a question, and the agent provides a response with clear references to where the information came from (e.g., "As mentioned in my project documentation..." or "Per my resume..."). Users can understand the credibility and source of the agent's responses.
|
| 55 |
+
|
| 56 |
+
**Why this priority**: Transparency about information sources builds trust. Users need to know whether the agent is drawing from documented facts versus making inferences.
|
| 57 |
+
|
| 58 |
+
**Independent Test**: Can be fully tested by asking questions that should reference documented sources and verifying the agent provides source attribution for factual claims.
|
| 59 |
+
|
| 60 |
+
**Acceptance Scenarios**:
|
| 61 |
+
|
| 62 |
+
1. **Given** a user asks about the person's background, **When** the agent responds, **Then** it references specific documents or sections where this information comes from
|
| 63 |
+
2. **Given** the agent uses information from multiple sources, **When** it responds, **Then** it appropriately attributes different points to their sources
|
| 64 |
+
3. **Given** a user asks for clarification on a source, **When** they request it, **Then** the agent can identify which documentation supports its answer
|
| 65 |
+
|
| 66 |
+
---
|
| 67 |
+
|
| 68 |
+
### Edge Cases
|
| 69 |
+
|
| 70 |
+
- What happens when a user asks a question that cannot be answered from the documented knowledge base?
|
| 71 |
+
- How does the system handle questions that might misrepresent the person's views (e.g., "You must believe X...")?
|
| 72 |
+
- What happens if the person's documented views seem to contradict each other on a topic?
|
| 73 |
+
- How does the agent handle requests for information about the person that isn't documented (e.g., personal details)?
|
| 74 |
+
- What occurs if the documentation about the person is incomplete for a specific topic the user asks about?
|
| 75 |
+
|
| 76 |
+
### Tool Integration & Failure Handling
|
| 77 |
+
|
| 78 |
+
- When Time or Memory tools become unavailable, system returns user-friendly error and halts processing until tool recovers
|
| 79 |
+
- When optional tools (GitHub, LinkedIn) are unavailable, system gracefully degrades—they remain inactive until credentials are provided
|
| 80 |
+
- When documentation conflicts exist, system logs the conflict (including vector search scores) for human review; agent uses highest-scoring source in response
|
| 81 |
+
- When GitHub repository access fails (invalid PAT, rate limit, network error), system returns friendly error message; processing resumes when connectivity/credentials restored
|
| 82 |
+
- When Memory tool detects new user attributes not previously tracked in session, it persists them for duration of session only
|
| 83 |
+
|
| 84 |
+
## Requirements *(mandatory)*
|
| 85 |
+
|
| 86 |
+
### Functional Requirements
|
| 87 |
+
|
| 88 |
+
- **FR-001**: System MUST provide a chat interface where users can send messages and receive responses
|
| 89 |
+
- **FR-002**: System MUST retrieve relevant information from the person's knowledge base (admin-configurable markdown files in a public GitHub repository) based on user queries
|
| 90 |
+
- **FR-003**: System MUST respond in first-person perspective, maintaining the persona of the person being represented
|
| 91 |
+
- **FR-004**: System MUST reference sources for factual claims (e.g., "per my documentation on X")
|
| 92 |
+
- **FR-005**: System MUST maintain conversation history within a single session
|
| 93 |
+
- **FR-006**: System MUST handle cases where the knowledge base doesn't contain an answer by gracefully indicating knowledge gaps
|
| 94 |
+
- **FR-007**: System MUST support conversation threads/sessions isolated from other users
|
| 95 |
+
- **FR-008**: System MUST normalize and clean output to ensure consistent, readable responses across platforms
|
| 96 |
+
- **FR-009**: System MUST include mandatory tools: Time (current date/time) and Memory (session-scoped user attribute tracking)
|
| 97 |
+
- **FR-010**: System MUST support optional tools: GitHub (activated if GitHub PAT environment variable set) and LinkedIn (activated if LinkedIn API token environment variable set)
|
| 98 |
+
- **FR-011**: System MUST prioritize conflicting documentation by vector search relevance score and log conflicts for human review
|
| 99 |
+
- **FR-012**: When external tools fail or become unavailable, system MUST return a user-friendly error message and wait for tool recovery before processing further queries
|
| 100 |
+
- **FR-013**: Memory tool MUST track session-scoped user attributes (name, profession, interests, hobbies) to personalize responses; memory resets between sessions
|
| 101 |
+
|
| 102 |
+
### Key Entities
|
| 103 |
+
|
| 104 |
+
- **PersonProfile**: Represents the person the agent embodies (name, background, documented knowledge, philosophies, areas of expertise)
|
| 105 |
+
- **ConversationSession**: Represents an individual user's conversation instance with metadata (session ID, timestamps, message history, session-scoped memory)
|
| 106 |
+
- **Message**: User input or agent response within a conversation (content, timestamp, role, source attribution if applicable)
|
| 107 |
+
- **KnowledgeBase**: Collection of admin-configured markdown documentation about the person, sourced from a public GitHub repository (projects, resume, philosophies, experience, etc.)
|
| 108 |
+
- **RetrievedDocument**: Individual chunks of documentation retrieved during query processing (content, source, relevance score)
|
| 109 |
+
- **ToolConfiguration**: Defines which tools are active for an agent instance (Time: always on, Memory: always on, GitHub: conditional on GitHub PAT, LinkedIn: conditional on LinkedIn API token)
|
| 110 |
+
- **UserAttributes**: Session-scoped data tracked by Memory tool (name, profession, interests, hobbies) for personalizing responses within a session
|
| 111 |
+
- **ConflictLog**: Records instances where documentation contradicts itself, prioritized by vector search score, flagged for human review
|
| 112 |
+
|
| 113 |
+
## Success Criteria *(mandatory)*
|
| 114 |
+
|
| 115 |
+
### Measurable Outcomes
|
| 116 |
+
|
| 117 |
+
- **SC-001**: Users perceive responses as authentically representing the person's voice and perspective (test cases + selective sampling of logs show less than 10% false +/- rate)
|
| 118 |
+
- **SC-002**: Responses are factually accurate and sourced from the person's documentation (100% of sample responses contain only information present in knowledge base)
|
| 119 |
+
- **SC-003**: Users receive answers to their questions on topics covered in the documentation (90% of in-scope questions receive substantive responses)
|
| 120 |
+
- **SC-004**: System correctly handles knowledge gaps by indicating them to users (100% of out-of-scope questions receive explicit "I don't have documentation on that" acknowledgment)
|
| 121 |
+
- **SC-005**: Conversation can be completed in natural timeframe with responsive interaction (agent responds to user queries in under 5 seconds)
|
| 122 |
+
- **SC-006**: Multiple simultaneous users can interact with the agent without interference (10+ concurrent conversations function independently)
|
| 123 |
+
- **SC-007**: Tool failures are handled gracefully with user-friendly error messages (100% of tool failures result in appropriate error messaging, not crashes)
|
| 124 |
+
- **SC-008**: Session-scoped memory improves personalization within a conversation (users report agent responses feel more tailored as conversation progresses)
|
| 125 |
+
|
| 126 |
+
### Assumptions
|
| 127 |
+
|
| 128 |
+
- The person's knowledge and philosophies are documented in markdown files stored in a public GitHub repository (admin-configurable per agent instance)
|
| 129 |
+
- Users are familiar with chat interfaces and understand they're interacting with an AI agent
|
| 130 |
+
- The documentation is reasonably comprehensive for the domains the agent will be questioned about
|
| 131 |
+
- Users accept some inconsistencies if documentation is incomplete or contradictory (system logs conflicts for human review)
|
| 132 |
+
- The agent should prioritize accuracy over engagement (never fabricate information to be more responsive)
|
| 133 |
+
- Time tool (current date/time) and Memory tool (session-scoped user attributes) are always available in the operating environment
|
| 134 |
+
- GitHub PAT and LinkedIn API tokens, if provided, will be available as environment variables; tools remain inactive without credentials
|
| 135 |
+
- Users are comfortable with session-scoped memory (attributes not persisted across separate sessions without explicit opt-in mechanism)
|
specs/001-personified-ai-agent/tasks.md
ADDED
|
@@ -0,0 +1,425 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Task Breakdown: Personified AI Agent
|
| 2 |
+
|
| 3 |
+
**Feature**: Personified AI Agent
|
| 4 |
+
**Branch**: `001-personified-ai-agent` | **Date**: 2025-10-23
|
| 5 |
+
**Based On**: [spec.md](spec.md) | [plan.md](plan.md) | [gap-analysis.md](gap-analysis.md)
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## Executive Summary
|
| 10 |
+
|
| 11 |
+
This document breaks down the Personified AI Agent specification into actionable, independently testable tasks organized by user story priority.
|
| 12 |
+
|
| 13 |
+
**Total Tasks**: 28
|
| 14 |
+
**Phases**: 3 (Setup, Foundational, User Stories)
|
| 15 |
+
|
| 16 |
+
### MVP Scope (Recommended)
|
| 17 |
+
- **Phase 1**: Setup & infrastructure
|
| 18 |
+
- **Phase 2**: Foundational (document pipeline, agent core)
|
| 19 |
+
- **Phase 3**: User Story 1 (Core chat with expertise)
|
| 20 |
+
|
| 21 |
+
**Estimated MVP Timeline**: 1-2 weeks with full team
|
| 22 |
+
**Full Scope Timeline**: 3-4 weeks including optional features
|
| 23 |
+
|
| 24 |
+
---
|
| 25 |
+
|
| 26 |
+
## Phase 1: Setup & Infrastructure
|
| 27 |
+
|
| 28 |
+
**Goal**: Establish project foundation, environment, and tooling.
|
| 29 |
+
**Timeline**: 1-2 days
|
| 30 |
+
**Acceptance**: All dependencies installed, test environment running
|
| 31 |
+
|
| 32 |
+
### Setup Tasks
|
| 33 |
+
|
| 34 |
+
- [x] T001 Verify Python 3.12+ environment and uv package manager installed
|
| 35 |
+
- [x] T002 Create `.env` file with required environment variables (OPENAI_API_KEY, GROQ_API_KEY, GITHUB_PERSONAL_ACCESS_TOKEN, BOT_FULL_NAME, etc.)
|
| 36 |
+
- [x] T003 Run `uv sync` to install project dependencies from pyproject.toml
|
| 37 |
+
- [x] T004 Verify pytest-asyncio testing framework is installed and functional
|
| 38 |
+
- [x] T005 Create test data fixtures directory with sample markdown files in `test_data/`
|
| 39 |
+
- [x] T006 [P] Configure logging (console + optional Loki) in `src/config.py`
|
| 40 |
+
- [x] T007 Verify git branch `001-personified-ai-agent` is active and ready
|
| 41 |
+
|
| 42 |
+
**Test Criteria**:
|
| 43 |
+
- ✅ `uv --version` shows valid version
|
| 44 |
+
- ✅ `.env` file exists with all required keys
|
| 45 |
+
- ✅ `uv run pytest src/test.py --collect-only` shows 7+ tests collected
|
| 46 |
+
- ✅ Logger output shows both console and (if configured) Loki handlers
|
| 47 |
+
|
| 48 |
+
---
|
| 49 |
+
|
| 50 |
+
## Phase 2: Foundational Components (Blocking Prerequisites)
|
| 51 |
+
|
| 52 |
+
**Goal**: Implement core infrastructure that all user stories depend on.
|
| 53 |
+
**Timeline**: 3-5 days
|
| 54 |
+
**Dependencies**: Phase 1 complete
|
| 55 |
+
|
| 56 |
+
### Document Pipeline (FR-002)
|
| 57 |
+
|
| 58 |
+
- [x] T008 [P] Create `DataManager` class in `src/data.py` to load markdown documents from local `docs/` directory
|
| 59 |
+
- [x] T009 [P] Implement intelligent two-stage document chunking: header-aware splitting + size-based fallback in `src/data.py`
|
| 60 |
+
- [x] T010 [P] Integrate HuggingFace sentence-transformers embeddings for document vectors in `src/data.py`
|
| 61 |
+
- [x] T011 [P] Create ephemeral ChromaDB vectorstore initialization in `src/data.py` with in-memory storage
|
| 62 |
+
- [x] T012 Implement GitHub repository loading in `src/data.py` using GitHub API (conditional on GITHUB_PERSONAL_ACCESS_TOKEN)
|
| 63 |
+
- [x] T013 Implement relative GitHub link rewriting (convert `/resume.md` → `https://github.com/owner/repo/blob/main/resume.md`) in `src/data.py`
|
| 64 |
+
- [x] T014 Create `process_documents()` function in `src/data.py` to orchestrate load → chunk → embed → store pipeline
|
| 65 |
+
- [x] T015 Add error handling for individual document failures in `src/data.py` (log and continue on load errors)
|
| 66 |
+
|
| 67 |
+
**Test Criteria**:
|
| 68 |
+
- ✅ `test_rear_knowledge_contains_it245()` passes (knowledge retrieval works)
|
| 69 |
+
- ✅ `test_github_relative_links_converted_to_absolute_urls()` passes (Test 7)
|
| 70 |
+
- ✅ Vector store has 5+ chunks for sample documents
|
| 71 |
+
- ✅ Embeddings dimensionality is 384 (sentence-transformers default)
|
| 72 |
+
|
| 73 |
+
### Configuration & Logging (FR-012, Constitution VII)
|
| 74 |
+
|
| 75 |
+
- [x] T016 Implement `Config` Pydantic BaseSettings in `src/config.py` with environment variable loading
|
| 76 |
+
- [x] T017 Add optional Grafana Loki integration to `src/config.py` for remote logging with session context
|
| 77 |
+
- [x] T018 [P] Create structured JSON logger in `src/config.py` that tags all logs with `session_id` and `application=ai-me`
|
| 78 |
+
- [x] T019 [P] Add Unicode normalization table for output cleanliness in `src/agent.py` (Constitution IX)
|
| 79 |
+
|
| 80 |
+
**Test Criteria**:
|
| 81 |
+
- ✅ Logger output contains `session_id` in all messages
|
| 82 |
+
- ✅ Loki handler configured when LOKI_URL + credentials present
|
| 83 |
+
- ✅ Unicode normalization converts special characters to ASCII
|
| 84 |
+
|
| 85 |
+
### MCP Server Setup (FR-009, FR-010)
|
| 86 |
+
|
| 87 |
+
- [x] T020 Implement `setup_mcp_servers()` method in `src/agent.py` to initialize Time and Memory tools
|
| 88 |
+
- [x] T021 Create `get_mcp_time_params()` function to return Time server parameters
|
| 89 |
+
- [x] T022 Create `get_mcp_memory_params(session_id)` function to return session-scoped Memory server parameters
|
| 90 |
+
- [x] T023 [P] Create `get_mcp_github_params()` function to return GitHub server parameters (conditional on GitHub PAT)
|
| 91 |
+
- [x] T024 Implement exception handling in `setup_mcp_servers()` for tool connection failures with session-scoped logging
|
| 92 |
+
- [x] T025 [P] Create error handler that returns user-friendly messages when MCP servers fail (FR-012)
|
| 93 |
+
|
| 94 |
+
**Test Criteria**:
|
| 95 |
+
- ✅ `test_tool_integration_github_mcp()` passes (GitHub tool works when PAT set)
|
| 96 |
+
- ✅ Time tool returns current date/time in responses
|
| 97 |
+
- ✅ Memory tool persists attributes within session (resets on new session)
|
| 98 |
+
- ✅ Tool failures return user-friendly error messages, not tracebacks
|
| 99 |
+
|
| 100 |
+
---
|
| 101 |
+
|
| 102 |
+
## Phase 3: User Story 1 - Chat with Personified Agent About Expertise (Priority: P1)
|
| 103 |
+
|
| 104 |
+
**Goal**: Implement core chat functionality with authentic persona and RAG.
|
| 105 |
+
**Timeline**: 4-6 days
|
| 106 |
+
**Dependencies**: Phase 2 complete
|
| 107 |
+
**Independent Tests**:
|
| 108 |
+
- User can ask about person's expertise and get first-person responses
|
| 109 |
+
- Agent retrieves correct information from knowledge base
|
| 110 |
+
- Agent maintains authentic voice/persona throughout conversation
|
| 111 |
+
|
| 112 |
+
### RAG Tool & Agent Core (FR-002, FR-003, FR-004)
|
| 113 |
+
|
| 114 |
+
- [x] T026 [US1] Create `get_local_info_tool()` in `src/agent.py` that queries vectorstore for relevant documents
|
| 115 |
+
- [x] T027 [US1] Implement source attribution in RAG responses: format retrieved docs with links/references in `src/agent.py`
|
| 116 |
+
- [x] T028 [US1] Implement catch-all exception handler in `src/agent.py` that catches tool failures and returns user-friendly messages (FR-012)
|
| 117 |
+
|
| 118 |
+
### Agent Creation & Prompting (FR-003, Constitution VIII)
|
| 119 |
+
|
| 120 |
+
- [x] T029 [US1] Create `create_ai_me_agent()` method in `src/agent.py` with system prompt emphasizing first-person perspective and authentic voice
|
| 121 |
+
- [x] T030 [US1] Add error handling for agent initialization failures with session context logging
|
| 122 |
+
- [x] T031 [US1] Implement `run()` method in `src/agent.py` that executes agent with OpenAI Agents SDK and applies Unicode normalization (Constitution IX)
|
| 123 |
+
|
| 124 |
+
### Chat UI (FR-001, FR-005, FR-007)
|
| 125 |
+
|
| 126 |
+
- [x] T032 [US1] Create Gradio chat interface in `src/app.py` with message input and output display
|
| 127 |
+
- [x] T033 [US1] Implement session management in `src/app.py` using Gradio's `session_hash` for per-user isolation (FR-007)
|
| 128 |
+
- [x] T034 [US1] Create session-scoped agent instances dict in `src/app.py` keyed by session_id
|
| 129 |
+
- [x] T035 [US1] Implement conversation history storage in `src/app.py` within session (FR-005)
|
| 130 |
+
- [x] T036 [US1] Add error handling in chat endpoint to return friendly messages on failures (FR-012)
|
| 131 |
+
|
| 132 |
+
### Integration & Testing (US1 acceptance)
|
| 133 |
+
|
| 134 |
+
- [x] T037 [US1] Update `src/test.py` to add test for User Story 1: `test_user_story_1_chat_about_expertise()`
|
| 135 |
+
- [x] T038 [US1] [P] Add quality assertions to test: response in first-person, knowledge-grounded, authentic voice
|
| 136 |
+
- [x] T039 [US1] Manually verify: Open chat, ask "What is your experience with [topic]?", verify response is first-person and accurate
|
| 137 |
+
|
| 138 |
+
**Test Criteria** (User Story 1):
|
| 139 |
+
- ✅ `test_user_story_1_chat_about_expertise()` passes
|
| 140 |
+
- ✅ Chat interface responds to user input
|
| 141 |
+
- ✅ Responses use first-person perspective ("I built...", "My experience includes...")
|
| 142 |
+
- ✅ Responses cite sources from knowledge base
|
| 143 |
+
- ✅ Response time < 5 seconds (SC-005)
|
| 144 |
+
|
| 145 |
+
---
|
| 146 |
+
|
| 147 |
+
## Phase 4: User Story 2 - Interact Across Multiple Conversation Topics (Priority: P2)
|
| 148 |
+
|
| 149 |
+
**Goal**: Enable multi-topic conversations with consistent persona.
|
| 150 |
+
**Timeline**: 2-3 days
|
| 151 |
+
**Dependencies**: Phase 3 (User Story 1) complete
|
| 152 |
+
**Independent Tests**:
|
| 153 |
+
- Agent maintains first-person perspective across multiple questions
|
| 154 |
+
- Agent shows topic awareness (different answers for different topics)
|
| 155 |
+
- Agent gracefully handles questions about undocumented topics
|
| 156 |
+
|
| 157 |
+
### Conversation Context (FR-005, FR-006)
|
| 158 |
+
|
| 159 |
+
- [x] T040 [US2] Enhance conversation history in `src/app.py` to include full message context per session
|
| 160 |
+
- [x] T041 [US2] Update agent prompt in `src/agent.py` to include conversation context for multi-turn awareness
|
| 161 |
+
- [x] T042 [US2] Implement graceful "knowledge gap" responses in agent prompt when documentation doesn't cover topic (FR-006)
|
| 162 |
+
- [x] T043 [US2] Add persona consistency assertions to agent prompt: maintain first-person, stay in character
|
| 163 |
+
- [x] T044 [US2] Create test `test_user_story_2_multi_topic_consistency()` that verifies consistent voice across 3+ questions on different topics
|
| 164 |
+
- [ ] T045 [US2] Manually verify: Ask 3+ questions on different topics, check that persona remains consistent
|
| 165 |
+
|
| 166 |
+
**Test Criteria** (User Story 2):
|
| 167 |
+
- ✅ `test_user_story_2_multi_topic_consistency()` passes
|
| 168 |
+
- ✅ All responses use consistent first-person perspective
|
| 169 |
+
- ✅ Knowledge gap questions return explicit "I don't have documentation on that" pattern
|
| 170 |
+
- ✅ Follow-up questions show conversation awareness
|
| 171 |
+
|
| 172 |
+
---
|
| 173 |
+
|
| 174 |
+
## Phase 5: User Story 3 - Access Sourced Information with Attribution (Priority: P2)
|
| 175 |
+
|
| 176 |
+
**Goal**: Implement and verify source attribution for all responses.
|
| 177 |
+
**Timeline**: 2-3 days
|
| 178 |
+
**Dependencies**: Phase 3 (User Story 1) + source attribution from T027 complete
|
| 179 |
+
**Independent Tests**:
|
| 180 |
+
- Responses contain source references (document names, links)
|
| 181 |
+
- Multiple sources get appropriate attribution
|
| 182 |
+
- Source links are valid and functional
|
| 183 |
+
|
| 184 |
+
### Source Attribution (FR-004)
|
| 185 |
+
|
| 186 |
+
- [x] T046 [US3] Enhance `get_local_info_tool()` in `src/agent.py` to include relevance scores and source metadata in response
|
| 187 |
+
- [x] T047 [US3] Format source information as inline citations in `src/agent.py` (e.g., "Per my resume..." or "As mentioned in my projects documentation...")
|
| 188 |
+
- [x] T048 [US3] Convert relative GitHub links to absolute URLs in source citations (verify Test 7: T013 prerequisite)
|
| 189 |
+
|
| 190 |
+
### Source Quality & Testing (SC-002)
|
| 191 |
+
|
| 192 |
+
- [x] T049 [US3] Create test `test_user_story_3_source_attribution()` that verifies all responses contain source references
|
| 193 |
+
- [ ] T050 [US3] Add assertion: sourced information exists in knowledge base (SC-002: 100% factual accuracy)
|
| 194 |
+
- [ ] T051 [US3] Manually verify: Ask 3+ questions, check that responses cite specific sources
|
| 195 |
+
|
| 196 |
+
**Test Criteria** (User Story 3):
|
| 197 |
+
- ✅ `test_user_story_3_source_attribution()` passes
|
| 198 |
+
- ✅ All responses include source document names/links
|
| 199 |
+
- ✅ Source links are valid (200 status for GitHub links)
|
| 200 |
+
- ✅ No response contains information not in knowledge base
|
| 201 |
+
|
| 202 |
+
---
|
| 203 |
+
|
| 204 |
+
## Phase 6: Cross-Cutting Concerns & Success Metrics
|
| 205 |
+
|
| 206 |
+
**Goal**: Implement measurement, logging, and monitoring for success criteria.
|
| 207 |
+
**Timeline**: 3-5 days
|
| 208 |
+
**Dependencies**: User Story 1-3 complete
|
| 209 |
+
|
| 210 |
+
### Success Metrics Framework (SC-001-SC-008)
|
| 211 |
+
|
| 212 |
+
- [x] T052 [P] Create Loki queries documentation in `docs/SUCCESS_METRICS.md` for measuring each SC criterion
|
| 213 |
+
- [x] T053 [P] Document Loki query for SC-005 (response latency): query logs for `timestamp_diff < 5000`
|
| 214 |
+
- [x] T054 [P] Document Loki query for SC-006 (concurrent users): count unique `session_id` values
|
| 215 |
+
- [x] T055 [P] Document Loki query for SC-007 (error handling): find logs with "user-friendly message" pattern
|
| 216 |
+
- [x] T056 Document Loki query for SC-001 (persona consistency): find first-person language patterns
|
| 217 |
+
- [x] T057 Document Loki query for SC-002 (factual accuracy): find responses with source attribution
|
| 218 |
+
- [x] T058 Document Loki query for SC-003 (in-scope answers): classify responses as "substantive" vs "knowledge gap"
|
| 219 |
+
- [x] T059 Document Loki query for SC-004 (knowledge gap handling): verify "I don't have documentation" pattern
|
| 220 |
+
|
| 221 |
+
### Conflict Detection & Logging (FR-011)
|
| 222 |
+
|
| 223 |
+
- [x] T060 Create conflict detection documentation in `docs/CONFLICT_DETECTION.md` (already exists; verify it's current)
|
| 224 |
+
- [x] T061 Document Loki query for finding conflicts: `message =~ "(?i)(conflicting|contradict|not sure|unclear)"`
|
| 225 |
+
- [x] T062 Add agent prompt guidance on acknowledging conflicts when sources disagree
|
| 226 |
+
|
| 227 |
+
### Error Message Quality (FR-012)
|
| 228 |
+
|
| 229 |
+
- [x] T063 Create test `test_tool_failure_error_messages_are_friendly()` in `src/test.py`
|
| 230 |
+
- [x] T064 Add assertions: error messages contain no Python tracebacks, no stack traces, human-readable language
|
| 231 |
+
- [x] T065 Verify all tool failures (GitHub rate limit, MCP timeout, etc.) have friendly error handling
|
| 232 |
+
|
| 233 |
+
### Load Testing (SC-006)
|
| 234 |
+
|
| 235 |
+
- [x] T066 Document manual session isolation testing in `TESTING.md` (browser-based, not pytest)
|
| 236 |
+
- [x] T067 Verify: Manual testing with 3+ concurrent browser tabs shows independent sessions
|
| 237 |
+
|
| 238 |
+
**Test Criteria** (Cross-Cutting):
|
| 239 |
+
- ✅ All success metric Loki queries documented
|
| 240 |
+
- ✅ `test_tool_failure_error_messages_are_friendly()` passes
|
| 241 |
+
- ✅ `test_concurrent_sessions_do_not_interfere()` passes with 5+ concurrent agents
|
| 242 |
+
- ✅ `docs/SUCCESS_METRICS.md` and `docs/CONFLICT_DETECTION.md` exist and have runnable queries
|
| 243 |
+
|
| 244 |
+
---
|
| 245 |
+
|
| 246 |
+
## Phase 7: Polish & Enhancements (Optional / Phase B)
|
| 247 |
+
|
| 248 |
+
**Goal**: Optional features and quality improvements.
|
| 249 |
+
**Timeline**: 1-2 weeks
|
| 250 |
+
**Dependencies**: Core features (Phases 1-6) complete
|
| 251 |
+
|
| 252 |
+
### LinkedIn Tool Integration (FR-010 LinkedIn)
|
| 253 |
+
|
| 254 |
+
- [x] T068 [DEFERRED] Research LinkedIn MCP server availability or LinkedIn API
|
| 255 |
+
- [x] T069 [DEFERRED] Create `get_mcp_linkedin_params()` function (conditional on LINKEDIN_API_TOKEN)
|
| 256 |
+
- [x] T070 [DEFERRED] Add LinkedIn tool initialization to `setup_mcp_servers()` in `src/agent.py`
|
| 257 |
+
- [x] T071 [DEFERRED] Create separate feature spec for LinkedIn via `/speckit.specify` (defer to own spec)
|
| 258 |
+
|
| 259 |
+
### UI Polish
|
| 260 |
+
|
| 261 |
+
- [x] T072 [DEFERRED] Review custom Gradio styling in `src/static/style.css`
|
| 262 |
+
- [x] T073 [DEFERRED] Enhance scroll behavior in `src/static/scroll.js` for better UX
|
| 263 |
+
- [x] T074 [DEFERRED] Add admin configuration UI for runtime document management
|
| 264 |
+
|
| 265 |
+
### Documentation & Quickstart (Phase 1 design artifacts)
|
| 266 |
+
|
| 267 |
+
- [x] T075 [DEFERRED] Create `docs/quickstart.md` with setup instructions, environment config, testing
|
| 268 |
+
- [x] T076 [DEFERRED] Create `specs/001-personified-ai-agent/data-model.md` with entity relationships
|
| 269 |
+
- [x] T077 [DEFERRED] Create `specs/001-personified-ai-agent/contracts/agent-api.yaml` with OpenAPI schema
|
| 270 |
+
|
| 271 |
+
---
|
| 272 |
+
|
| 273 |
+
## Task Dependencies & Execution Graph
|
| 274 |
+
|
| 275 |
+
### Critical Path (Blocking Dependencies)
|
| 276 |
+
|
| 277 |
+
```
|
| 278 |
+
T001-T007 (Setup)
|
| 279 |
+
↓
|
| 280 |
+
T008-T025 (Foundational: Document pipeline + Config + MCP)
|
| 281 |
+
↓
|
| 282 |
+
T026-T039 (User Story 1: Chat + RAG + UI)
|
| 283 |
+
├→ T040-T045 (User Story 2: Multi-topic)
|
| 284 |
+
└→ T046-T051 (User Story 3: Source Attribution)
|
| 285 |
+
↓
|
| 286 |
+
T052-T067 (Success Metrics & Load Testing)
|
| 287 |
+
↓
|
| 288 |
+
T068-T077 (Optional Enhancements)
|
| 289 |
+
```
|
| 290 |
+
|
| 291 |
+
### Parallelizable Tasks (Can Run Simultaneously)
|
| 292 |
+
|
| 293 |
+
**Phase 1**:
|
| 294 |
+
- T001, T002, T003, T004, T005, T007 (all independent)
|
| 295 |
+
- T006 (parallel with others)
|
| 296 |
+
|
| 297 |
+
**Phase 2**:
|
| 298 |
+
- T008, T009, T010, T011 (document loading - parallel)
|
| 299 |
+
- T016, T017 (config setup - parallel)
|
| 300 |
+
- T018, T019 (logging - parallel)
|
| 301 |
+
- T020, T021, T022 (MCP tool params - parallel)
|
| 302 |
+
- T023, T024, T025 (error handling - parallel)
|
| 303 |
+
|
| 304 |
+
**Phase 3**:
|
| 305 |
+
- T026, T027 (RAG tool - sequential within story)
|
| 306 |
+
- T029, T030, T031 (agent - sequential)
|
| 307 |
+
- T032, T033, T034, T035, T036 (UI - sequential)
|
| 308 |
+
|
| 309 |
+
**Phase 6**:
|
| 310 |
+
- T052-T059 (Loki query docs - parallel)
|
| 311 |
+
- T063, T064, T065 (error message tests - parallel)
|
| 312 |
+
|
| 313 |
+
---
|
| 314 |
+
|
| 315 |
+
## MVP Execution Path (Recommended)
|
| 316 |
+
|
| 317 |
+
### Week 1: Minimal Viable Product
|
| 318 |
+
1. **Phase 1** (Setup): T001-T007 — 1 day
|
| 319 |
+
2. **Phase 2** (Foundational): T008-T025 — 2-3 days
|
| 320 |
+
3. **Phase 3** (User Story 1): T026-T039 — 2 days
|
| 321 |
+
|
| 322 |
+
**Deliverable**: Functional chat interface with knowledge retrieval, first-person persona, session isolation
|
| 323 |
+
|
| 324 |
+
### Week 2: Complete Core Specification
|
| 325 |
+
4. **Phase 4** (User Story 2): T040-T045 — 1 day
|
| 326 |
+
5. **Phase 5** (User Story 3): T046-T051 — 1 day
|
| 327 |
+
6. **Phase 6** (Metrics): T052-T067 — 2 days
|
| 328 |
+
|
| 329 |
+
**Deliverable**: Full specification compliance + success metrics measurement + load testing
|
| 330 |
+
|
| 331 |
+
### Week 3+: Enhancements (Optional)
|
| 332 |
+
7. **Phase 7** (Polish): T068-T077 — 1-2 weeks
|
| 333 |
+
|
| 334 |
+
**Deliverable**: LinkedIn integration, UI polish, comprehensive documentation
|
| 335 |
+
|
| 336 |
+
---
|
| 337 |
+
|
| 338 |
+
## Quality Gates & Acceptance Criteria
|
| 339 |
+
|
| 340 |
+
### Before Merging to `main`
|
| 341 |
+
|
| 342 |
+
- [ ] All Phase 1-6 tests passing (`uv run pytest src/test.py -v`)
|
| 343 |
+
- [ ] All constitution principles still satisfied (re-check after implementation)
|
| 344 |
+
- [ ] Code follows PEP 8 style (imports organized, 98-char line limit)
|
| 345 |
+
- [ ] Notebooks synchronized with code changes (no function signature drift)
|
| 346 |
+
- [ ] Success metrics Loki queries documented and testable
|
| 347 |
+
- [ ] Error messages verified as user-friendly (T063, T064)
|
| 348 |
+
- [ ] Load testing passing (10+ concurrent sessions independent)
|
| 349 |
+
|
| 350 |
+
### Deployment Readiness (Hugging Face Spaces)
|
| 351 |
+
|
| 352 |
+
- [ ] Dockerfile builds successfully: `docker compose build`
|
| 353 |
+
- [ ] Environment variables documented: `.env` template created
|
| 354 |
+
- [ ] Performance targets met: <5 sec response time (SC-005)
|
| 355 |
+
- [ ] Rate limiting handled gracefully (GitHub API)
|
| 356 |
+
- [ ] Logging configured (console + optional Loki)
|
| 357 |
+
|
| 358 |
+
---
|
| 359 |
+
|
| 360 |
+
## Implementation Notes
|
| 361 |
+
|
| 362 |
+
### Key Technical Decisions
|
| 363 |
+
|
| 364 |
+
1. **ChromaDB Ephemeral Storage**: Rebuilt on restart. Stateless by design for Spaces deployment.
|
| 365 |
+
2. **MCP Servers Per-Session**: Time, Memory, GitHub all initialized per session for isolation.
|
| 366 |
+
3. **Temperature=0.0 for Tests, 1.0 Default**: Determinism in tests; natural responses in production.
|
| 367 |
+
4. **Loki for Observability**: All metrics extracted from logs; no separate telemetry system.
|
| 368 |
+
5. **GitHub URL Rewriting**: Relative links converted to absolute for proper attribution.
|
| 369 |
+
|
| 370 |
+
### Constitution Compliance
|
| 371 |
+
|
| 372 |
+
All tasks maintain compliance with project constitution:
|
| 373 |
+
- ✅ **I. Async-First**: All external I/O async (MCP setup, document loading)
|
| 374 |
+
- ✅ **II. RAG-First**: No hardcoded knowledge; all from retrieved documents
|
| 375 |
+
- ✅ **III. Type-Safe Config**: Pydantic BaseSettings + SecretStr
|
| 376 |
+
- ✅ **IV. Session Isolation**: Per-session agents + MCP servers
|
| 377 |
+
- ✅ **V. Test-First**: Each task includes test criteria
|
| 378 |
+
- ✅ **VI. Import Organization**: PEP 8 throughout
|
| 379 |
+
- ✅ **VII. Observability**: Structured logging + Loki integration
|
| 380 |
+
- ✅ **VIII. Persona Consistency**: First-person prompting maintained
|
| 381 |
+
- ✅ **IX. Output Cleanliness**: Unicode normalization applied
|
| 382 |
+
|
| 383 |
+
---
|
| 384 |
+
|
| 385 |
+
## Success Metrics (How We Know We're Done)
|
| 386 |
+
|
| 387 |
+
| Metric | Target | How We Measure |
|
| 388 |
+
|--------|--------|----------------|
|
| 389 |
+
| **Spec Compliance** | 100% | All FR, SC implemented or deferred |
|
| 390 |
+
| **Test Coverage** | 7+ tests passing | `uv run pytest src/test.py -v` |
|
| 391 |
+
| **Response Time** | <5 seconds | Loki query: SC-005 |
|
| 392 |
+
| **Concurrency** | 10+ independent sessions | Load test: T066 |
|
| 393 |
+
| **Error Handling** | 100% user-friendly | Error message test: T063 |
|
| 394 |
+
| **Source Attribution** | 100% of responses | Test 7 + T049 |
|
| 395 |
+
| **Persona Consistency** | First-person throughout | Manual review + T038 |
|
| 396 |
+
| **Knowledge Gap Handling** | 100% explicit indication | T044 + SC-004 verification |
|
| 397 |
+
|
| 398 |
+
---
|
| 399 |
+
|
| 400 |
+
## File Changes Summary
|
| 401 |
+
|
| 402 |
+
**New Files Created**:
|
| 403 |
+
- `src/agent.py` - AIMeAgent class, MCP setup, RAG tool, run method
|
| 404 |
+
- `src/app.py` - Gradio chat interface, session management
|
| 405 |
+
- `src/data.py` - Document pipeline, vectorstore setup
|
| 406 |
+
- `src/config.py` - Pydantic configuration, logging setup
|
| 407 |
+
- `src/test.py` - Integration tests (existing + enhancements)
|
| 408 |
+
- `docs/SUCCESS_METRICS.md` - Loki queries for SC-001-008
|
| 409 |
+
- `docs/CONFLICT_DETECTION.md` - Conflict detection design + queries
|
| 410 |
+
|
| 411 |
+
**Modified Files**:
|
| 412 |
+
- `src/notebooks/experiments.ipynb` - Development sandbox (keep in sync)
|
| 413 |
+
- `pyproject.toml` - Dependencies (already complete)
|
| 414 |
+
|
| 415 |
+
**Configuration Files**:
|
| 416 |
+
- `.env` - Environment variables (create from template)
|
| 417 |
+
- `Dockerfile` - Docker build (already exists)
|
| 418 |
+
- `docker-compose.yaml` - Local development (already exists)
|
| 419 |
+
|
| 420 |
+
---
|
| 421 |
+
|
| 422 |
+
**Plan Status**: ✅ Ready for execution
|
| 423 |
+
**Recommended Start**: Phase 1 today; Phase 2-3 in parallel
|
| 424 |
+
**Expected Completion**: 2-3 weeks for MVP + core specification
|
| 425 |
+
|
src/agent.py
CHANGED
|
@@ -148,6 +148,12 @@ these rules:
|
|
| 148 |
- Example: https://github.com/owner/repo/blob/main/filename.md
|
| 149 |
- Never use shorthand like: filename.md†L44-L53 or source†L44-L53
|
| 150 |
- Always strip out line number references
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 151 |
- Add reference links in a references section at the end of the output if they
|
| 152 |
match github.com
|
| 153 |
- Below are critical instructions for using your memory and GitHub tools
|
|
|
|
| 148 |
- Example: https://github.com/owner/repo/blob/main/filename.md
|
| 149 |
- Never use shorthand like: filename.md†L44-L53 or source†L44-L53
|
| 150 |
- Always strip out line number references
|
| 151 |
+
- CRITICAL: Include source citations in your response to establish credibility
|
| 152 |
+
and traceability. Format citations as:
|
| 153 |
+
- For GitHub sources: "Per my [document_name]..." or "As mentioned in [document_name]..."
|
| 154 |
+
- For local sources: "According to my documentation on [topic]..."
|
| 155 |
+
- Include the source URL in parentheses when available
|
| 156 |
+
- Example: "Per my resume (https://github.com/byoung/ai-me/blob/main/resume.md), I worked at..."
|
| 157 |
- Add reference links in a references section at the end of the output if they
|
| 158 |
match github.com
|
| 159 |
- Below are critical instructions for using your memory and GitHub tools
|
src/data.py
CHANGED
|
@@ -30,8 +30,8 @@ class DataManagerConfig(BaseModel):
|
|
| 30 |
github_repos: List[str] = Field(
|
| 31 |
default=[], description="List of GitHub repos (format: owner/repo)")
|
| 32 |
doc_root: str = Field(
|
| 33 |
-
default=os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "docs")) + "/",
|
| 34 |
-
description="Root directory for local documents")
|
| 35 |
chunk_size: int = Field(
|
| 36 |
default=2500, description="Character chunk size for splitting")
|
| 37 |
chunk_overlap: int = Field(
|
|
|
|
| 30 |
github_repos: List[str] = Field(
|
| 31 |
default=[], description="List of GitHub repos (format: owner/repo)")
|
| 32 |
doc_root: str = Field(
|
| 33 |
+
default=os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "docs", "local-testing")) + "/",
|
| 34 |
+
description="Root directory for local documents (development/testing only)")
|
| 35 |
chunk_size: int = Field(
|
| 36 |
default=2500, description="Character chunk size for splitting")
|
| 37 |
chunk_overlap: int = Field(
|
src/test.py
CHANGED
|
@@ -8,6 +8,7 @@ import re
|
|
| 8 |
import sys
|
| 9 |
import os
|
| 10 |
from datetime import datetime
|
|
|
|
| 11 |
|
| 12 |
# Something about these tests makes me feel yucky. Big, brittle, and slow. BBS?
|
| 13 |
# Couple ideas to make them better:
|
|
@@ -212,6 +213,213 @@ async def test_mcp_memory_server_remembers_favorite_color(ai_me_agent):
|
|
| 212 |
logger.info(msg)
|
| 213 |
|
| 214 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 215 |
if __name__ == "__main__":
|
| 216 |
# Allow running tests directly with python test.py
|
| 217 |
pytest.main([__file__, "-v", "-s"])
|
|
|
|
| 8 |
import sys
|
| 9 |
import os
|
| 10 |
from datetime import datetime
|
| 11 |
+
from unittest.mock import AsyncMock, patch
|
| 12 |
|
| 13 |
# Something about these tests makes me feel yucky. Big, brittle, and slow. BBS?
|
| 14 |
# Couple ideas to make them better:
|
|
|
|
| 213 |
logger.info(msg)
|
| 214 |
|
| 215 |
|
| 216 |
+
@pytest.mark.asyncio
|
| 217 |
+
async def test_github_relative_links_converted_to_absolute_urls():
|
| 218 |
+
"""Test 7: Verify that relative links in GitHub documents are converted to absolute GitHub URLs.
|
| 219 |
+
|
| 220 |
+
This test validates FR-004 (Source Attribution): that when documents are loaded from GitHub
|
| 221 |
+
with relative links (e.g., /resume.md), they are rewritten to full GitHub URLs
|
| 222 |
+
(e.g., https://github.com/owner/repo/blob/main/resume.md).
|
| 223 |
+
"""
|
| 224 |
+
from langchain_core.documents import Document
|
| 225 |
+
|
| 226 |
+
# Create a sample document as if it came from GitHub with relative links
|
| 227 |
+
sample_doc = Document(
|
| 228 |
+
page_content="Check out [my resume](/resume.md) and [projects](/projects.md) for more info.",
|
| 229 |
+
metadata={
|
| 230 |
+
"source": "github://byoung/ai-me/docs/about.md",
|
| 231 |
+
"github_repo": "byoung/ai-me"
|
| 232 |
+
}
|
| 233 |
+
)
|
| 234 |
+
|
| 235 |
+
# Initialize data manager to use process_documents
|
| 236 |
+
data_config = DataManagerConfig(github_repos=["byoung/ai-me"])
|
| 237 |
+
data_manager = DataManager(config=data_config)
|
| 238 |
+
|
| 239 |
+
# Process the document (applies URL rewriting)
|
| 240 |
+
processed_docs = data_manager.process_documents([sample_doc])
|
| 241 |
+
|
| 242 |
+
# Verify the content
|
| 243 |
+
assert len(processed_docs) == 1, "Expected 1 processed document"
|
| 244 |
+
processed_content = processed_docs[0].page_content
|
| 245 |
+
|
| 246 |
+
# Check that relative links have been converted to absolute GitHub URLs
|
| 247 |
+
assert "https://github.com/byoung/ai-me/blob/main/resume.md" in processed_content, (
|
| 248 |
+
f"Expected absolute GitHub URL for /resume.md in processed content, "
|
| 249 |
+
f"but got: {processed_content}"
|
| 250 |
+
)
|
| 251 |
+
assert "https://github.com/byoung/ai-me/blob/main/projects.md" in processed_content, (
|
| 252 |
+
f"Expected absolute GitHub URL for /projects.md in processed content, "
|
| 253 |
+
f"but got: {processed_content}"
|
| 254 |
+
)
|
| 255 |
+
|
| 256 |
+
logger.info("✓ Test passed: Relative GitHub links converted to absolute URLs")
|
| 257 |
+
logger.info(f" Original: [my resume](/resume.md)")
|
| 258 |
+
logger.info(f" Converted: [my resume](https://github.com/byoung/ai-me/blob/main/resume.md)")
|
| 259 |
+
|
| 260 |
+
|
| 261 |
+
@pytest.mark.asyncio
|
| 262 |
+
async def test_user_story_2_multi_topic_consistency(ai_me_agent):
|
| 263 |
+
"""
|
| 264 |
+
Test 8 (T044): User Story 2 - Multi-Topic Consistency
|
| 265 |
+
Verify that the agent maintains consistent first-person perspective
|
| 266 |
+
across multiple conversation topics.
|
| 267 |
+
|
| 268 |
+
This tests that the agent:
|
| 269 |
+
- Uses first-person perspective (I, my, me) consistently
|
| 270 |
+
- Maintains professional tone across different topic switches
|
| 271 |
+
- Shows context awareness of different topics
|
| 272 |
+
- Remains in-character as the personified individual
|
| 273 |
+
"""
|
| 274 |
+
# Ask 3 questions about different topics
|
| 275 |
+
topics = [
|
| 276 |
+
("What is your background in technology?", "background|experience|technology"),
|
| 277 |
+
("Tell me about your current work at Neosofia", "Neosofia|current|employer"),
|
| 278 |
+
("What programming languages are you skilled in?", "programming|language|skilled"),
|
| 279 |
+
]
|
| 280 |
+
|
| 281 |
+
first_person_patterns = [
|
| 282 |
+
r"\bi\b", r"\bme\b", r"\bmy\b", r"\bmyself\b",
|
| 283 |
+
r"\bI['m]", r"\bI['ve]", r"\bI['ll]"
|
| 284 |
+
]
|
| 285 |
+
|
| 286 |
+
for question, topic_keywords in topics:
|
| 287 |
+
logger.info(f"\n{'='*60}\nMulti-topic test question: {question}\n{'='*60}")
|
| 288 |
+
|
| 289 |
+
response = await ai_me_agent.run(question)
|
| 290 |
+
response_lower = response.lower()
|
| 291 |
+
|
| 292 |
+
# Check for first-person usage
|
| 293 |
+
first_person_found = any(
|
| 294 |
+
re.search(pattern, response, re.IGNORECASE)
|
| 295 |
+
for pattern in first_person_patterns
|
| 296 |
+
)
|
| 297 |
+
assert first_person_found, (
|
| 298 |
+
f"Expected first-person perspective in response to '{question}' "
|
| 299 |
+
f"but got: {response}"
|
| 300 |
+
)
|
| 301 |
+
|
| 302 |
+
# Verify response is substantive (not just "I don't know")
|
| 303 |
+
min_length = 50 # Substantive responses should be > 50 chars
|
| 304 |
+
assert len(response) > min_length, (
|
| 305 |
+
f"Response to '{question}' was too short (likely not substantive): {response}"
|
| 306 |
+
)
|
| 307 |
+
|
| 308 |
+
logger.info(f"✓ First-person perspective maintained for: {question[:40]}...")
|
| 309 |
+
logger.info(f" Response preview: {response[:100]}...")
|
| 310 |
+
|
| 311 |
+
logger.info("\n✓ Test passed: Consistent first-person perspective across 3+ topics")
|
| 312 |
+
|
| 313 |
+
|
| 314 |
+
@pytest.mark.asyncio
|
| 315 |
+
async def test_user_story_3_source_attribution(ai_me_agent):
|
| 316 |
+
"""
|
| 317 |
+
Test 9 (T049): User Story 3 - Source Attribution
|
| 318 |
+
Verify that all responses contain source references/attribution.
|
| 319 |
+
|
| 320 |
+
This tests that the agent:
|
| 321 |
+
- Includes source document references in responses
|
| 322 |
+
- Links to knowledge base documents (GitHub URLs or local sources)
|
| 323 |
+
- Provides verifiable, traceable information
|
| 324 |
+
- Maintains SC-002: 100% factual accuracy through sourcing
|
| 325 |
+
"""
|
| 326 |
+
# Ask 3 questions that should retrieve documented knowledge
|
| 327 |
+
questions = [
|
| 328 |
+
"What do you know about ReaR?",
|
| 329 |
+
"Do you know Carol?",
|
| 330 |
+
"Tell me about your experience in technology",
|
| 331 |
+
]
|
| 332 |
+
|
| 333 |
+
# Pattern to find source references: URLs, "source:" labels, or GitHub links
|
| 334 |
+
source_patterns = [
|
| 335 |
+
r"https://github\.com/", # GitHub URLs
|
| 336 |
+
r"source:", # Explicit source labels
|
| 337 |
+
r"\[.*\]\(https?://", # Markdown links
|
| 338 |
+
r"documentation", # Reference to documentation
|
| 339 |
+
]
|
| 340 |
+
|
| 341 |
+
for question in questions:
|
| 342 |
+
logger.info(f"\n{'='*60}\nSource attribution test: {question}\n{'='*60}")
|
| 343 |
+
|
| 344 |
+
response = await ai_me_agent.run(question)
|
| 345 |
+
|
| 346 |
+
# Check for at least one source reference pattern
|
| 347 |
+
has_source = any(
|
| 348 |
+
re.search(pattern, response, re.IGNORECASE)
|
| 349 |
+
for pattern in source_patterns
|
| 350 |
+
)
|
| 351 |
+
assert has_source, (
|
| 352 |
+
f"Expected source attribution in response to '{question}' "
|
| 353 |
+
f"but found none. Response: {response}"
|
| 354 |
+
)
|
| 355 |
+
|
| 356 |
+
# Verify response is substantive (not just metadata)
|
| 357 |
+
min_length = 50
|
| 358 |
+
assert len(response) > min_length, (
|
| 359 |
+
f"Response to '{question}' was too short: {response}"
|
| 360 |
+
)
|
| 361 |
+
|
| 362 |
+
logger.info(f"✓ Source attribution found for: {question[:40]}...")
|
| 363 |
+
logger.info(f" Response includes source/reference")
|
| 364 |
+
|
| 365 |
+
logger.info("\n✓ Test passed: All responses include source attribution (SC-002)")
|
| 366 |
+
|
| 367 |
+
|
| 368 |
+
@pytest.mark.asyncio
|
| 369 |
+
async def test_tool_failure_error_messages_are_friendly(caplog, ai_me_agent):
|
| 370 |
+
"""
|
| 371 |
+
Test 10 (T063-T065): Error Message Quality (FR-012)
|
| 372 |
+
Verify that tool failures return user-friendly messages without Python tracebacks.
|
| 373 |
+
|
| 374 |
+
This tests that the agent:
|
| 375 |
+
- Returns human-readable error messages
|
| 376 |
+
- logs an error that can be reviewed in our dashboard/logs
|
| 377 |
+
|
| 378 |
+
Uses mocking to simulate tool failures without adding test-specific code to agent.py
|
| 379 |
+
"""
|
| 380 |
+
logger.info(f"\n{'='*60}\nError Handling Test\n{'='*60}")
|
| 381 |
+
|
| 382 |
+
# Mock the Runner.run method to simulate a tool failure
|
| 383 |
+
# This tests the catch-all exception handler without adding test code to production
|
| 384 |
+
test_scenarios = [
|
| 385 |
+
RuntimeError("Simulated tool timeout"),
|
| 386 |
+
ValueError("Invalid tool parameters"),
|
| 387 |
+
]
|
| 388 |
+
|
| 389 |
+
for error in test_scenarios:
|
| 390 |
+
logger.info(f"\nTesting error scenario: {error.__class__.__name__}: {error}")
|
| 391 |
+
|
| 392 |
+
# Clear previous log records for this iteration
|
| 393 |
+
caplog.clear()
|
| 394 |
+
|
| 395 |
+
# Mock Runner.run to raise an exception
|
| 396 |
+
with patch('agent.Runner.run', new_callable=AsyncMock) as mock_run:
|
| 397 |
+
mock_run.side_effect = error
|
| 398 |
+
|
| 399 |
+
response = await ai_me_agent.run("Any user question")
|
| 400 |
+
|
| 401 |
+
logger.info(f"Response: {response[:100]}...")
|
| 402 |
+
|
| 403 |
+
# PRIMARY CHECK: Verify "I encountered an unexpected error" is in response
|
| 404 |
+
assert "I encountered an unexpected error" in response, (
|
| 405 |
+
f"Response must contain 'I encountered an unexpected error'. Got: {response}"
|
| 406 |
+
)
|
| 407 |
+
|
| 408 |
+
# SECONDARY CHECK: Verify error was logged by agent.py
|
| 409 |
+
error_logs = [record for record in caplog.records if record.levelname == "ERROR"]
|
| 410 |
+
assert len(error_logs) > 0, "Expected at least one ERROR log record from agent.py"
|
| 411 |
+
|
| 412 |
+
# Find the agent.py error log (contains "Unexpected error:")
|
| 413 |
+
agent_error_logged = any("Unexpected error:" in record.message for record in error_logs)
|
| 414 |
+
assert agent_error_logged, (
|
| 415 |
+
f"Expected ERROR log with 'Unexpected error:' from agent.py. "
|
| 416 |
+
f"Got: {[r.message for r in error_logs]}"
|
| 417 |
+
)
|
| 418 |
+
logger.info(f"✓ Error properly logged to logger: {[r.message for r in error_logs if 'Unexpected error:' in r.message]}")
|
| 419 |
+
|
| 420 |
+
logger.info("\n✓ Test passed: Error messages are friendly (FR-012) + properly logged")
|
| 421 |
+
|
| 422 |
+
|
| 423 |
if __name__ == "__main__":
|
| 424 |
# Allow running tests directly with python test.py
|
| 425 |
pytest.main([__file__, "-v", "-s"])
|