Commit 7c8fa1c · Parent(s): 737f100

Add reward scoring and context-aware code review flow

Files changed:

- DEMO_SCRIPT.md (+8 -8)
- README.md (+41 -13)
- server/demo.py (+44 -23)
- tests/test_triage_pipeline.py (+4 -2)
- triage.py (+90 -24)
- triage_catalog.py (+17 -0)
- triage_models.py (+6 -0)
DEMO_SCRIPT.md CHANGED

```diff
@@ -2,11 +2,11 @@
 
 ## 60-90 Second Walkthrough
 
-1. Open the Hugging Face Space and introduce TorchReview Copilot as an AI-powered …
-2. Point to the …
-3. Select the `Fix the invoice total syntax regression` example to show the app loading a …
-4. Highlight the **Live Triage Radar** …
-5. Explain that the PyTorch layer uses CodeBERTa embeddings to compare the input against known …
-6. Scroll to the …
-7. Switch to the performance example to show the confidence profile …
-8. Close by noting that OpenEnv still powers deterministic validation under the hood, so the demo …
+1. Open the Hugging Face Space and introduce TorchReview Copilot as an AI-powered code review and improvement system built with PyTorch.
+2. Point to the problem statement: manual code review is slow, inconsistent, and hard to scale.
+3. Select the `Fix the invoice total syntax regression` example to show the app loading a broken code sample together with the context window.
+4. Highlight the **Live Triage Radar**, the ML quality score, and the RL-ready reward score.
+5. Explain that the PyTorch layer uses CodeBERTa embeddings to compare the input against known code-quality patterns from the OpenEnv task catalog.
+6. Scroll to the three-step improvement plan and call out the progression: syntax and bug fixes, edge cases, then scalability.
+7. Switch to the performance example to show the confidence profile and reward changing for a different class of issue.
+8. Close by noting that OpenEnv still powers deterministic validation under the hood, so the demo remains grounded in measurable task outcomes.
```
README.md CHANGED

```diff
@@ -1,6 +1,6 @@
 ---
 title: TorchReview Copilot
-emoji: …
+emoji: 🧠
 colorFrom: orange
 colorTo: red
 sdk: docker
@@ -16,7 +16,7 @@ tags:
 
 # TorchReview Copilot
 
-TorchReview Copilot is an **AI-powered …
+TorchReview Copilot is an **AI-powered code review and improvement system using PyTorch** to analyze Python code, predict quality, generate structured improvement suggestions, and compute an RL-ready reward score.
 
 It upgrades the original OpenEnv hackathon environment into a judge-friendly product demo: a polished Hugging Face Space on top, with the deterministic OpenEnv validation engine still preserved underneath.
 
@@ -35,13 +35,14 @@ That triage step is repetitive, error-prone, and often slows down the actual fix…
 
 ## Solution
 
-TorchReview Copilot turns code …
+TorchReview Copilot turns code, traceback text, and a short context window into a practical code-review report:
 
 - **Issue classification:** syntax, logic, or performance
-- **…
+- **ML quality score:** predicted code quality from PyTorch embeddings
+- **Reward score:** RL-ready score from model quality, lint quality, and complexity penalty
 - **Live Triage Radar:** confidence visualization for all issue classes
 - **Nearest known pattern:** the closest OpenEnv task match
-- **…
+- **Improvement plan:** step 1 syntax/bug fixes, step 2 edge cases, step 3 scalability
 
 The result is a demo that feels like a real AI debugging assistant rather than a backend-only environment.
 
@@ -54,13 +55,13 @@ This project uses **PyTorch for real inference**, not placeholder branching:
 - embeddings are compared against curated OpenEnv issue prototypes
 - the final decision blends model similarity with lightweight static analysis signals
 
-That gives the demo an actual model-backed …
+That gives the demo an actual model-backed quality and issue scoring path while keeping it CPU-friendly for Hugging Face Spaces.
 
 ## How It Works
 
 ### Pipeline
 
-`Input code + traceback -> static checks -> PyTorch embeddings -> …
+`Input code + context window + traceback -> static checks -> PyTorch embeddings -> quality + issue prediction -> suggestion engine -> reward computation -> UI/API output`
 
 ### Detailed Flow
 
@@ -68,16 +69,28 @@ That gives the demo an actual model-backed classification path while keeping it…
 2. TorchReview extracts lightweight static signals:
    - parser success/failure
    - assertion-style test language
-   - …
-   - nested-loop depth
+   - lint/style issues
+   - nested-loop depth and complexity pressure
 3. CodeBERTa runs through PyTorch to embed the combined input.
-4. The embedding is compared against built-in issue prototypes derived from the OpenEnv task catalog.
+4. The embedding is compared against built-in issue prototypes derived from the OpenEnv task catalog and reference implementations.
 5. The UI returns:
    - top issue label
    - confidence radar
    - repair risk
+   - ML quality score
+   - RL-ready reward score
    - nearest known bug pattern
-   - …
+   - three-step improvement plan
+
+### Reward Formula
+
+The current reward computation is:
+
+```text
+reward = (0.5 x ML_quality_score) + (0.3 x lint_score) - (0.2 x complexity_penalty)
+```
+
+This keeps the project compatible with OpenEnv-style reinforcement learning workflows.
 
 ## Built-In Demo Scenarios
 
@@ -98,6 +111,18 @@ These examples make the classification differences obvious during judging and vi…
 - **OpenEnv** for deterministic validation endpoints and environment compatibility
 - **Pydantic** for typed schemas
 
+## Features
+
+- PyTorch-powered code quality inference
+- Static analysis for syntax, lint, and complexity
+- Context-window-aware review flow
+- RL-ready reward shaping
+- Live Triage Radar visualization
+- Three-step improvement plan:
+  1. syntax checking and bug fixes
+  2. edge-case handling
+  3. scalability improvements
+
 ## Hugging Face Space UX
 
 The root app now presents a production-style triage experience:
@@ -105,8 +130,10 @@ The root app now presents a production-style triage experience:
 - a clear problem/solution hero section
 - example scenario selector
 - code and traceback inputs
+- context window input
 - **Live Triage Radar**
-- structured …
+- structured improvement plan
+- reward and quality score display
 - visible model/backend notes
 
 The underlying OpenEnv endpoints remain available for compatibility and evaluation.
@@ -209,7 +236,8 @@ Short version:
 3. Show the Live Triage Radar and issue label.
 4. Explain the PyTorch embedding step.
 5. Show the matched pattern and fix plan.
-6. …
+6. Show the reward score and explain how it can be used inside an RL environment.
+7. Switch to the performance example to prove the model distinguishes issue classes.
 
 ## Limitations
```
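Taken on its own, the README's reward formula is easy to sanity-check. A minimal sketch follows; the function name `compute_reward` is illustrative rather than from the repo, and inputs are assumed to already sit in [0, 1] as the diff's `_clamp_unit` helper produces:

```python
def compute_reward(ml_quality_score: float, lint_score: float, complexity_penalty: float) -> float:
    """README formula: reward = 0.5*quality + 0.3*lint - 0.2*complexity, clamped to [0, 1]."""
    raw = 0.5 * ml_quality_score + 0.3 * lint_score - 0.2 * complexity_penalty
    # Clamping keeps the value usable as an RL reward even for very complex, low-quality code.
    return max(0.0, min(1.0, raw))

# Example: decent quality (0.8), clean lint (0.9), moderate complexity (0.3)
print(round(compute_reward(0.8, 0.9, 0.3), 2))  # 0.61
```

Note that the raw weighted sum can go slightly negative (quality 0, lint 0, penalty 1 gives -0.2), so some clamp is needed for the `0.0 <= reward_score <= 1.0` bound that this commit's tests assert.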
server/demo.py CHANGED

```diff
@@ -189,7 +189,7 @@ def _default_outputs() -> tuple[str, str, str, str, str]:
     return (
         "<div class='metric-card'><div class='eyebrow'>Awaiting Analysis</div><p class='hero-copy'>Paste Python code, add an optional traceback, or load one of the built-in examples.</p></div>",
         "<div class='metric-card'><div class='eyebrow'>Live Triage Radar</div><p class='hero-copy'>Confidence bars will appear after the first analysis run.</p></div>",
-        "### …
+        "### Improvement Plan\nAnalyze a sample to generate syntax, edge-case, and scalability recommendations.",
         "### Known Pattern Match\nThe nearest OpenEnv task will be highlighted here after inference runs.",
         "### Model Notes\nBackend and extracted signal details will appear here.",
     )
@@ -209,19 +209,31 @@ def _summary_html(result) -> str:
         <span class="pill {escape(result.repair_risk)}">{escape(result.repair_risk)} repair risk</span>
       </div>
       <p class="hero-copy">{summary}</p>
-      …
+      <div class="summary-grid">
       <div class="summary-stat">
-        <strong>…
-        {…
+        <strong>Reward Score</strong>
+        {result.reward_score:.0%}
+      </div>
+      <div class="summary-stat">
+        <strong>ML Quality</strong>
+        {result.ml_quality_score:.0%}
       </div>
       <div class="summary-stat">
-        <strong>…
-        {result.matched_pattern.…
+        <strong>Matched Pattern</strong>
+        {escape(result.matched_pattern.title)}
       </div>
       <div class="summary-stat">
         <strong>Inference Backend</strong>
         {escape(result.model_backend)}
       </div>
+      <div class="summary-stat">
+        <strong>Lint Score</strong>
+        {result.lint_score:.0%}
+      </div>
+      <div class="summary-stat">
+        <strong>Complexity Penalty</strong>
+        {result.complexity_penalty:.0%}
+      </div>
       <div class="summary-stat">
         <strong>Next Action</strong>
         {next_action}
@@ -264,7 +276,7 @@ def _radar_html(result) -> str:
 def _plan_markdown(result) -> str:
     plan_lines = "\n".join(f"{index + 1}. {step}" for index, step in enumerate(result.repair_plan))
     return (
-        "### …
+        "### Improvement Plan\n"
         f"**Primary issue:** `{result.issue_label}`\n\n"
         f"{plan_lines}\n\n"
         f"**Suggested next action:** {result.suggested_next_action}"
@@ -292,6 +304,9 @@ def _model_markdown(result) -> str:
         f"- **Model backend:** `{result.model_backend}`\n"
         f"- **Model id:** `{result.model_id}`\n"
         f"- **Analysis time:** `{result.analysis_time_ms:.2f} ms`\n\n"
+        "### Reward Formula\n"
+        f"- `reward = (0.5 x {result.ml_quality_score:.2f}) + (0.3 x {result.lint_score:.2f}) - (0.2 x {result.complexity_penalty:.2f})`\n"
+        f"- **Final reward:** `{result.reward_score:.2f}`\n\n"
         "### Extracted Signals\n"
         f"{signal_lines}\n\n"
         "### Backend Notes\n"
@@ -299,10 +314,10 @@
     )
 
 
-def analyze_inputs(code: str, traceback_text: str) -> tuple[str, str, str, str, str]:
+def analyze_inputs(code: str, traceback_text: str, context_window: str) -> tuple[str, str, str, str, str]:
     """Run the triage engine and format outputs for the Gradio UI."""
 
-    result = get_default_engine().triage(code or "", traceback_text or "")
+    result = get_default_engine().triage(code or "", traceback_text or "", context_window or "")
     return (
         _summary_html(result),
         _radar_html(result),
@@ -312,18 +327,18 @@ def analyze_inputs(code: str, traceback_text: str) -> tuple[str, str, str, str,…
     )
 
 
-def load_example(example_key: str) -> tuple[str, str, str, str, str, str, str, str]:
+def load_example(example_key: str) -> tuple[str, str, str, str, str, str, str, str, str]:
     """Populate the UI from a built-in example and immediately analyze it."""
 
     example = get_default_engine().example_map()[example_key]
-    outputs = analyze_inputs(example.code, example.traceback_text)
+    outputs = analyze_inputs(example.code, example.traceback_text, example.context_window)
     header = (
         f"### Example Scenario\n"
        f"**{example.title}** \n"
        f"{example.summary} \n"
        f"Label target: `{example.label}`"
     )
-    return (example.code, example.traceback_text, header, *outputs)
+    return (example.code, example.traceback_text, example.context_window, header, *outputs)
 
 
 def build_demo() -> gr.Blocks:
@@ -339,8 +354,8 @@ def build_demo() -> gr.Blocks:
             <div class="eyebrow">Meta PyTorch OpenEnv Hackathon Demo</div>
             <h1 class="hero-title">TorchReview Copilot</h1>
             <p class="hero-copy">
-              AI-powered …
-              and …
+              AI-powered code review and improvement system using PyTorch to score code quality, surface bugs,
+              and generate a three-step improvement plan. OpenEnv stays underneath as the deterministic validation engine.
             </p>
           </div>
         """
@@ -367,8 +382,14 @@ def build_demo() -> gr.Blocks:
                     label="Optional traceback / failing test output",
                     placeholder="Paste stack traces, assertion failures, or benchmark notes here.",
                 )
+                context_input = gr.Textbox(
+                    value=first_example.context_window,
+                    lines=4,
+                    label="Context window",
+                    placeholder="Describe expected behavior, constraints, or repository context.",
+                )
                 with gr.Row():
-                    analyze_button = gr.Button("Analyze …
+                    analyze_button = gr.Button("Analyze & Score Code", variant="primary")
                     clear_button = gr.Button("Clear Inputs", variant="secondary")
 
             with gr.Column(scale=5):
@@ -384,9 +405,9 @@
             <div class="eyebrow">How It Works</div>
             <div class="how-grid">
               <div class="how-step"><strong>Input</strong><br>Code plus optional traceback or benchmark signal.</div>
-              <div class="how-step"><strong>Processing</strong><br>Static checks extract parser, …
-              <div class="how-step"><strong>Model</strong><br>CodeBERTa embeddings run through PyTorch and …
-              <div class="how-step"><strong>Output</strong><br>Confidence radar, …
+              <div class="how-step"><strong>Processing</strong><br>Static checks extract parser, lint, complexity, and runtime clues.</div>
+              <div class="how-step"><strong>Model</strong><br>CodeBERTa embeddings run through PyTorch and score code quality against known OpenEnv patterns.</div>
+              <div class="how-step"><strong>Output</strong><br>Confidence radar, reward score, and a three-step improvement plan.</div>
             </div>
           </div>
         """
@@ -395,25 +416,25 @@
         example_choice.change(
             fn=load_example,
             inputs=example_choice,
-            outputs=[code_input, traceback_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+            outputs=[code_input, traceback_input, context_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
             show_progress="hidden",
         )
         analyze_button.click(
             fn=analyze_inputs,
-            inputs=[code_input, traceback_input],
+            inputs=[code_input, traceback_input, context_input],
             outputs=[summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
             show_progress="minimal",
        )
         clear_button.click(
-            fn=lambda: ("", "", "### Example Scenario\nChoose a built-in example or paste custom code.", *_default_outputs()),
+            fn=lambda: ("", "", "", "### Example Scenario\nChoose a built-in example or paste custom code.", *_default_outputs()),
             inputs=None,
-            outputs=[code_input, traceback_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+            outputs=[code_input, traceback_input, context_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
             show_progress="hidden",
         )
         demo.load(
             fn=load_example,
             inputs=example_choice,
-            outputs=[code_input, traceback_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+            outputs=[code_input, traceback_input, context_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
             show_progress="hidden",
         )
```
tests/test_triage_pipeline.py CHANGED

```diff
@@ -20,18 +20,20 @@ def test_examples_map_to_expected_labels_with_fallback_backend() -> None:
     engine = CodeTriageEngine(backend=HashingEmbeddingBackend())
 
     for example in examples:
-        result = engine.triage(example.code, example.traceback_text)
+        result = engine.triage(example.code, example.traceback_text, example.context_window)
         assert result.issue_label == example.label
+        assert 0.0 <= result.reward_score <= 1.0
 
 
 def test_syntax_example_exposes_parser_signal() -> None:
     example = next(item for item in build_examples() if item.label == "syntax")
     engine = CodeTriageEngine(backend=HashingEmbeddingBackend())
 
-    result = engine.triage(example.code, example.traceback_text)
+    result = engine.triage(example.code, example.traceback_text, example.context_window)
 
     assert any(signal.name == "syntax_parse" and signal.value == "fails" for signal in result.extracted_signals)
     assert result.matched_pattern.task_id == example.task_id
+    assert result.repair_plan[0].startswith("Step 1 - Syntax checking and bug fixes")
 
 
 def test_composed_app_preserves_health_route() -> None:
```
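The matching step these tests exercise reduces to cosine similarity plus the `(sim + 1.0) / 2.0` rescaling visible in triage.py, which maps the cosine range [-1, 1] onto a [0, 1] score. A pure-Python sketch (the engine itself uses torch tensors from `embed_texts`; the prototype vectors here are hypothetical):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(candidate: list[float], prototypes: dict[str, list[float]]) -> tuple[str, float]:
    """Return the closest prototype label and a [0, 1] similarity score."""
    label, sim = max(
        ((name, cosine(candidate, vec)) for name, vec in prototypes.items()),
        key=lambda item: item[1],
    )
    # Same rescaling as the engine: cosine in [-1, 1] -> score in [0, 1].
    return label, (sim + 1.0) / 2.0

# Toy 2-d prototypes standing in for CodeBERTa embeddings.
protos = {"syntax": [1.0, 0.0], "performance": [0.0, 1.0]}
label, score = best_match([0.9, 0.1], protos)
```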
triage.py
CHANGED
|
@@ -181,6 +181,43 @@ def _repair_risk(label: IssueLabel, confidence: float, signal_count: int) -> str
|
|
| 181 |
return "high"
|
| 182 |
|
| 183 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 184 |
class CodeTriageEngine:
|
| 185 |
"""Combine static signals with PyTorch embeddings to classify code issues."""
|
| 186 |
|
|
@@ -195,6 +232,7 @@ class CodeTriageEngine:
|
|
| 195 |
self.prototypes = list(prototypes or build_prototypes())
|
| 196 |
self.examples = list(examples or build_examples())
|
| 197 |
self._prototype_matrix: torch.Tensor | None = None
|
|
|
|
| 198 |
|
| 199 |
def example_map(self) -> dict[str, TriageExample]:
|
| 200 |
"""Return UI examples keyed by task id."""
|
|
@@ -206,12 +244,25 @@ class CodeTriageEngine:
|
|
| 206 |
snippet = _sanitize_text(code) or "# No code supplied."
|
| 207 |
return f"Candidate code:\n{snippet}\n\nObserved failure:\n{trace}\n"
|
| 208 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 209 |
def _prototype_embeddings(self) -> torch.Tensor:
|
| 210 |
if self._prototype_matrix is None:
|
| 211 |
reference_texts = [prototype.reference_text for prototype in self.prototypes]
|
| 212 |
self._prototype_matrix = self.backend.embed_texts(reference_texts)
|
| 213 |
return self._prototype_matrix
|
| 214 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 215 |
def _extract_signals(self, code: str, traceback_text: str) -> tuple[list[TriageSignal], dict[IssueLabel, float], list[str]]:
|
| 216 |
trace = (traceback_text or "").lower()
|
| 217 |
heuristic_scores: dict[IssueLabel, float] = {label: 0.15 for label in LABELS}
|
|
@@ -321,31 +372,37 @@ class CodeTriageEngine:
|
|
| 321 |
best_similarity = float((similarities[best_index] + 1.0) / 2.0)
|
| 322 |
return best_prototype, best_similarity, indexed_scores
|
| 323 |
|
| 324 |
-
def _repair_plan(self, label: IssueLabel, matched: TriagePrototype) -> list[str]:
|
| 325 |
-
|
| 326 |
-
|
| 327 |
-
|
| 328 |
-
|
| 329 |
-
|
| 330 |
-
|
| 331 |
-
|
| 332 |
-
|
| 333 |
-
|
| 334 |
-
|
| 335 |
-
|
| 336 |
-
"
|
| 337 |
-
|
| 338 |
-
|
| 339 |
-
|
| 340 |
-
|
| 341 |
-
|
| 342 |
-
|
| 343 |
-
|
| 344 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 345 |
"""Run the full triage pipeline on code plus optional failure context."""
|
| 346 |
|
| 347 |
started = time.perf_counter()
|
| 348 |
-
document = self.
|
| 349 |
signals, heuristic_scores, notes = self._extract_signals(code, traceback_text)
|
| 350 |
|
| 351 |
candidate_embedding = self.backend.embed_texts([document])
|
|
@@ -367,9 +424,14 @@ class CodeTriageEngine:
|
|
| 367 |
top_confidence = confidence_scores[issue_label]
|
| 368 |
|
| 369 |
top_signal = signals[0].evidence if signals else "Model similarity dominated the decision."
|
|
|
|
|
|
|
|
|
|
|
|
|
| 370 |
summary = (
|
| 371 |
f"Detected a {issue_label} issue with {top_confidence:.0%} confidence. "
|
| 372 |
-
f"The closest known failure pattern is `{matched.title}`, which indicates {matched.summary.lower()}"
|
|
|
|
| 373 |
)
|
| 374 |
suggested_next_action = {
|
| 375 |
"syntax": "Fix the parser error first, then rerun validation before changing behavior.",
|
|
@@ -381,6 +443,10 @@ class CodeTriageEngine:
|
|
| 381 |
issue_label=issue_label,
|
| 382 |
confidence_scores=confidence_scores,
|
| 383 |
repair_risk=_repair_risk(issue_label, top_confidence, len(signals)),
|
|
|
|
|
|
|
|
|
|
|
|
|
| 384 |
summary=summary,
|
| 385 |
matched_pattern=PrototypeMatch(
|
| 386 |
task_id=matched.task_id,
|
|
@@ -390,7 +456,7 @@ class CodeTriageEngine:
|
|
| 390 |
summary=matched.summary,
|
| 391 |
rationale=top_signal,
|
| 392 |
),
|
| 393 |
-
repair_plan=self._repair_plan(issue_label, matched),
|
| 394 |
suggested_next_action=suggested_next_action,
|
| 395 |
extracted_signals=signals,
|
| 396 |
model_backend=self.backend.backend_name,
|
|
|
|
| 181 |
return "high"
|
| 182 |
|
| 183 |
|
| 184 |
+
def _clamp_unit(value: float) -> float:
|
| 185 |
+
return round(max(0.0, min(1.0, float(value))), 4)
|
| 186 |
+
|
| 187 |
+
|
| 188 |
+
def _lint_score(code: str) -> float:
|
| 189 |
+
stripped_lines = [line.rstrip("\n") for line in code.splitlines()]
|
| 190 |
+
if not stripped_lines:
|
| 191 |
+
return 0.2
|
| 192 |
+
|
| 193 |
+
score = 1.0
|
| 194 |
+
if any(len(line) > 88 for line in stripped_lines):
|
| 195 |
+
score -= 0.15
|
| 196 |
+
if any(line.rstrip() != line for line in stripped_lines):
|
| 197 |
+
score -= 0.1
|
| 198 |
+
if any("\t" in line for line in stripped_lines):
|
| 199 |
+
score -= 0.1
|
| 200 |
+
try:
|
| 201 |
+
tree = ast.parse(code)
|
| 202 |
+
functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
|
| 203 |
+
if functions and not ast.get_docstring(functions[0]):
|
| 204 |
+
score -= 0.08
|
| 205 |
+
except SyntaxError:
|
| 206 |
+
score -= 0.45
|
| 207 |
+
return _clamp_unit(score)
|
| 208 |
+
|
| 209 |
+
|
| 210 |
+
def _complexity_penalty(code: str) -> float:
|
| 211 |
+
try:
|
| 212 |
+
tree = ast.parse(code)
|
| 213 |
+
except SyntaxError:
|
| 214 |
+
return 0.95
|
| 215 |
+
branch_nodes = sum(isinstance(node, (ast.If, ast.For, ast.While, ast.Try, ast.Match)) for node in ast.walk(tree))
|
| 216 |
+
loop_depth = _loop_depth(code)
|
| 217 |
+
penalty = 0.1 + min(branch_nodes, 8) * 0.07 + min(loop_depth, 4) * 0.12
|
| 218 |
+
return _clamp_unit(penalty)
|
| 219 |
+
|
| 220 |
+
|
 class CodeTriageEngine:
     """Combine static signals with PyTorch embeddings to classify code issues."""
...
         self.prototypes = list(prototypes or build_prototypes())
         self.examples = list(examples or build_examples())
         self._prototype_matrix: torch.Tensor | None = None
+        self._reference_code_matrix: torch.Tensor | None = None

     def example_map(self) -> dict[str, TriageExample]:
         """Return UI examples keyed by task id."""
...
         snippet = _sanitize_text(code) or "# No code supplied."
         return f"Candidate code:\n{snippet}\n\nObserved failure:\n{trace}\n"

+    def _build_review_document(self, code: str, traceback_text: str, context_window: str) -> str:
+        context = _sanitize_text(context_window) or "No additional context window supplied."
+        return (
+            f"{self._build_document(code, traceback_text)}\n"
+            f"Context window:\n{context}\n"
+        )
+
     def _prototype_embeddings(self) -> torch.Tensor:
         if self._prototype_matrix is None:
             reference_texts = [prototype.reference_text for prototype in self.prototypes]
             self._prototype_matrix = self.backend.embed_texts(reference_texts)
         return self._prototype_matrix

+    def _reference_code_embeddings(self) -> torch.Tensor:
+        if self._reference_code_matrix is None:
+            reference_codes = [prototype.reference_code for prototype in self.prototypes]
+            self._reference_code_matrix = self.backend.embed_texts(reference_codes)
+        return self._reference_code_matrix
+
     def _extract_signals(self, code: str, traceback_text: str) -> tuple[list[TriageSignal], dict[IssueLabel, float], list[str]]:
         trace = (traceback_text or "").lower()
         heuristic_scores: dict[IssueLabel, float] = {label: 0.15 for label in LABELS}
...
         best_similarity = float((similarities[best_index] + 1.0) / 2.0)
         return best_prototype, best_similarity, indexed_scores

+    def _repair_plan(self, label: IssueLabel, matched: TriagePrototype, context_window: str) -> list[str]:
+        context = _sanitize_text(context_window)
+        step_one = {
+            "syntax": "Step 1 - Syntax checking and bug fixes: resolve the parser break before touching behavior, then align the function with the expected contract.",
+            "logic": "Step 1 - Syntax checking and bug fixes: confirm the code parses cleanly, then patch the failing branch or state update causing the incorrect result.",
+            "performance": "Step 1 - Syntax checking and bug fixes: keep the implementation correct first, then isolate the slow section without changing external behavior.",
+        }[label]
+        step_two = (
+            "Step 2 - Edge case handling: verify empty input, boundary values, missing fields, and final-state flush behavior "
+            f"against the known pattern `{matched.title}`."
+        )
+        step_three = (
+            "Step 3 - Scalability of code: remove repeated full scans, prefer linear-time data structures, "
+            "and benchmark the path on a production-like fixture."
+        )
+        if context:
+            step_two = f"{step_two} Context window to preserve: {context}"
+        return [step_one, step_two, step_three]
+
+    def _reference_quality_score(self, code: str, matched: TriagePrototype) -> float:
+        candidate = self.backend.embed_texts([_sanitize_text(code) or "# empty"])
+        match_index = next(index for index, prototype in enumerate(self.prototypes) if prototype.task_id == matched.task_id)
+        reference = self._reference_code_embeddings()[match_index : match_index + 1]
+        score = float(torch.matmul(candidate, reference.T)[0][0].item())
+        return _clamp_unit((score + 1.0) / 2.0)
+
+    def triage(self, code: str, traceback_text: str = "", context_window: str = "") -> TriageResult:
         """Run the full triage pipeline on code plus optional failure context."""

         started = time.perf_counter()
+        document = self._build_review_document(code, traceback_text, context_window)
         signals, heuristic_scores, notes = self._extract_signals(code, traceback_text)

         candidate_embedding = self.backend.embed_texts([document])
...
         top_confidence = confidence_scores[issue_label]

         top_signal = signals[0].evidence if signals else "Model similarity dominated the decision."
+        ml_quality_score = self._reference_quality_score(code, matched)
+        lint_score = _lint_score(code)
+        complexity_penalty = _complexity_penalty(code)
+        reward_score = _clamp_unit((0.5 * ml_quality_score) + (0.3 * lint_score) - (0.2 * complexity_penalty))
         summary = (
             f"Detected a {issue_label} issue with {top_confidence:.0%} confidence. "
+            f"The closest known failure pattern is `{matched.title}`, which indicates {matched.summary.lower()}. "
+            f"Predicted quality score is {ml_quality_score:.0%} with an RL-ready reward of {reward_score:.0%}."
         )
         suggested_next_action = {
             "syntax": "Fix the parser error first, then rerun validation before changing behavior.",
...
             issue_label=issue_label,
             confidence_scores=confidence_scores,
             repair_risk=_repair_risk(issue_label, top_confidence, len(signals)),
+            ml_quality_score=ml_quality_score,
+            lint_score=lint_score,
+            complexity_penalty=complexity_penalty,
+            reward_score=reward_score,
             summary=summary,
             matched_pattern=PrototypeMatch(
                 task_id=matched.task_id,
...
                 summary=matched.summary,
                 rationale=top_signal,
             ),
+            repair_plan=self._repair_plan(issue_label, matched, context_window),
             suggested_next_action=suggested_next_action,
             extracted_signals=signals,
             model_backend=self.backend.backend_name,
triage_catalog.py
CHANGED
@@ -44,6 +44,21 @@ SUMMARY_BY_TASK_ID: Dict[str, str] = {
     "optimization_rank_active_users": "A nightly ranking job is correct on small fixtures but too slow at production scale.",
 }

+CONTEXT_BY_TASK_ID: Dict[str, str] = {
+    "syntax_fix_invoice_totals": (
+        "Context window: this helper runs in an end-of-day billing reconciliation job. "
+        "Keep the public function signature intact and restore correct totals for mixed integer/string inputs."
+    ),
+    "bug_fix_session_windows": (
+        "Context window: this function groups sorted product analytics events into sessions for retention dashboards. "
+        "Boundary behavior must stay deterministic because downstream reports depend on it."
+    ),
+    "optimization_rank_active_users": (
+        "Context window: this pipeline feeds a nightly export on a small CPU instance. "
+        "Maintain identical output ordering while improving scalability on larger event volumes."
+    ),
+}
+

 def _prototype_text(
     task_id: str,

@@ -82,6 +97,7 @@ def build_examples() -> List[TriageExample]:
             summary=SUMMARY_BY_TASK_ID[task.task_id],
             code=task.starter_code,
             traceback_text=TRACEBACK_BY_TASK_ID[task.task_id],
+            context_window=CONTEXT_BY_TASK_ID[task.task_id],
             task_id=task.task_id,
         )
     )

@@ -111,6 +127,7 @@ def build_prototypes() -> List[TriagePrototype]:
                 traceback_text,
             ),
             starter_code=task.starter_code,
+            reference_code=task.reference_code,
             traceback_text=traceback_text,
         )
     )
triage_models.py
CHANGED
@@ -41,6 +41,7 @@ class TriageExample(BaseModel):
     summary: str
     code: str
     traceback_text: str
+    context_window: str
     task_id: str


@@ -53,6 +54,7 @@ class TriagePrototype(BaseModel):
     summary: str
     reference_text: str
     starter_code: str
+    reference_code: str
     traceback_text: str


@@ -62,6 +64,10 @@ class TriageResult(BaseModel):
     issue_label: IssueLabel
     confidence_scores: Dict[str, float]
     repair_risk: RiskLevel
+    ml_quality_score: float = Field(..., ge=0.0, le=1.0)
+    lint_score: float = Field(..., ge=0.0, le=1.0)
+    complexity_penalty: float = Field(..., ge=0.0, le=1.0)
+    reward_score: float = Field(..., ge=0.0, le=1.0)
     summary: str
     matched_pattern: PrototypeMatch
     repair_plan: List[str]