---
language:
- en
base_model:
- microsoft/codebert-base
pipeline_tag: text-classification
tags:
- code-quality
- bug-detection
- codebert
- python
---
# codepulse-codebert

A binary classifier fine-tuned from `microsoft/codebert-base` that assigns
each code snippet a probability of being buggy, P(buggy). In the CodePulse
analysis engine it acts as a confidence validator: for each bug predicted by
GPT, it checks whether the flagged line is statistically likely to be buggy,
so that false positives are filtered out before they reach the end user.

## Model Details

### Model Description

CodePulse-CodeBERT is a binary sequence classifier fine-tuned from
`microsoft/codebert-base`. Given a short code snippet (typically a single
flagged line, optionally with some surrounding context), the model outputs
the probability that the snippet contains a bug. Predictions whose
probability falls below a configurable threshold are marked as
low-confidence and excluded from the final quality score.

-   **Developed by:** Aiden Cary, Keller Willhite, Zachery Atchley
-   **Model type:** Transformer-based binary sequence classifier
    (CodeBERT fine-tune)
-   **Language(s) (NLP):** Code (primarily Python)
-   **License:** MIT
-   **Finetuned from model:**
    [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base)
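
The typical input described above, a flagged line plus optional surrounding context, could be cut out of a source file along the lines of the sketch below. `context_window` and its `radius` parameter are hypothetical names, not part of CodePulse, and a small radius is assumed so the snippet stays well inside CodeBERT's 512-token input limit.

```python
from typing import List


def context_window(source: str, line_no: int, radius: int = 2) -> str:
    """Return the 1-based line `line_no` plus up to `radius` lines on each side."""
    lines: List[str] = source.splitlines()
    if not 1 <= line_no <= len(lines):
        raise ValueError(f"line_no {line_no} out of range 1..{len(lines)}")
    start = max(0, line_no - 1 - radius)       # clamp at the top of the file
    end = min(len(lines), line_no + radius)    # clamp at the bottom of the file
    return "\n".join(lines[start:end])


src = "def get(items, i):\n    if i > len(items):\n        return None\n    return items[i]"
print(context_window(src, 4, radius=1))
```

Setting `radius=0` yields the single-line case.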

### Model Sources

-   **Repository:** https://github.com/aidencary/CodePulse

## Uses

### Direct Use

Classify short code snippets as buggy or not buggy:

```python
from transformers import pipeline

clf = pipeline("text-classification", model="aidencary/codepulse-codebert")
result = clf("return user_list[index]")
print(result)
# Output has the form [{'label': 'buggy', 'score': 0.87}] (score shown is illustrative)
```

### Downstream Use

The model is integrated into the CodePulse backend
(`app/services/codebert_validator.py`) as a post-processing layer over
GPT-generated bug predictions. Each predicted bug line is extracted,
stripped of comments, and scored. Bugs whose P(buggy) falls below the
configured threshold are flagged as low-confidence and excluded from the
penalty applied to the code quality score.
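
The filtering step described above can be sketched as follows. This is a minimal illustration, not the code in `app/services/codebert_validator.py`: the `PredictedBug` structure, the `filter_bugs` and `quality_score` helpers, the 0.5 default threshold, and the 100-point score with flat per-bug penalties are all hypothetical. The scorer is passed in as a plain callable (standing in for comment stripping plus the model call) so the sketch runs without downloading the model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PredictedBug:
    """One GPT-predicted bug (hypothetical structure)."""
    line: str        # source text of the flagged line
    penalty: float   # points this bug would deduct from the quality score


def filter_bugs(
    bugs: List[PredictedBug],
    score_fn: Callable[[str], float],  # snippet -> P(buggy)
    threshold: float = 0.5,            # assumed default; configurable in CodePulse
) -> Tuple[List[PredictedBug], List[PredictedBug]]:
    """Split bugs into (kept, low_confidence) by comparing P(buggy) with the threshold."""
    kept: List[PredictedBug] = []
    low_confidence: List[PredictedBug] = []
    for bug in bugs:
        if score_fn(bug.line) >= threshold:
            kept.append(bug)
        else:
            low_confidence.append(bug)
    return kept, low_confidence


def quality_score(kept: List[PredictedBug], base: float = 100.0) -> float:
    """Deduct penalties only for bugs that passed the confidence check (floored at 0)."""
    return max(0.0, base - sum(b.penalty for b in kept))


# Usage with stubbed scores instead of real model output.
fake_scores = {"return items[len(items)]": 0.92, "total = a + b": 0.12}
bugs = [PredictedBug("return items[len(items)]", 10.0), PredictedBug("total = a + b", 5.0)]
kept, low_confidence = filter_bugs(bugs, lambda line: fake_scores[line])
print(quality_score(kept))  # 90.0
```

Low-confidence bugs are returned rather than discarded, so they can still be shown to the user as flagged without affecting the score.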

### Out-of-Scope Use

-   Full-file classification: the model expects single-line or
    short-window snippets (at most 512 tokens). Longer inputs are
    truncated.
-   Languages other than Python: the training data was Python-focused,
    so results on other languages are unreliable.
-   Security vulnerability detection: the model was trained on general
    bug patterns, not security-specific flaws (SQL injection, XSS, etc.).
-   Use as a production safety gate without human review: the false
    negative rate is non-zero.

## Bias, Risks, and Limitations

-   The training data skews toward certain bug patterns, so rare bug
    types have lower recall.
-   Comment stripping is applied at inference time (inline `# ...`
    comments are removed before scoring) to prevent label leakage from
    annotated datasets. Code whose comments carry meaningful semantics
    may lose signal.
-   The CodePulse pipeline applies confidence contrast remapping: raw
    model probabilities are spread apart by a sigmoid transform before
    thresholding. Using the model directly, outside that pipeline, yields
    the unmodified softmax probabilities.
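
The card does not specify the exact transform. As a hedged sketch, a contrast remap of this kind might look like the following, where `contrast_remap` is a hypothetical name and the steepness `k = 10` and midpoint `0.5` are assumed values. The logistic curve is rescaled so that 0 and 1 map to themselves.

```python
import math


def contrast_remap(p: float, k: float = 10.0, mid: float = 0.5) -> float:
    """Push a probability away from `mid` with a logistic curve, keeping 0 -> 0 and 1 -> 1."""
    def s(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-k * (x - mid)))

    lo, hi = s(0.0), s(1.0)           # curve values at the endpoints
    return (s(p) - lo) / (hi - lo)    # rescale so the output spans [0, 1]


for raw in (0.4, 0.5, 0.6):
    print(f"{raw:.2f} -> {contrast_remap(raw):.3f}")
```

Because the transform is monotonic, it never changes how snippets rank against each other, only how far apart their scores sit. A threshold tuned on remapped scores therefore does not transfer directly to raw softmax output.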

### Recommendations

Use P(buggy) as a soft signal, not a hard gate. Combine it with static
analysis or human review for critical code paths.
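
One way to treat P(buggy) as a soft signal is to scale each finding's penalty by its probability and to route an uncertain middle band to human review instead of silently dropping it. The sketch below illustrates the recommendation only; it is not how CodePulse currently scores code (Downstream Use describes a single threshold). The band edges 0.3 and 0.7 and the helper names `triage` and `weighted_penalty` are hypothetical.

```python
from typing import Iterable, Tuple


def triage(p_buggy: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map P(buggy) to an action rather than a single pass/fail gate."""
    if p_buggy >= high:
        return "report"        # confident enough to show the user
    if p_buggy <= low:
        return "suppress"      # likely a false positive
    return "human_review"      # uncertain: escalate instead of guessing


def weighted_penalty(findings: Iterable[Tuple[float, float]]) -> float:
    """Sum penalties scaled by P(buggy); each finding is a (p_buggy, penalty) pair."""
    return sum(p * penalty for p, penalty in findings)


print(triage(0.85), triage(0.5), triage(0.1))
print(weighted_penalty([(0.5, 10.0), (1.0, 4.0)]))
```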

## How to Get Started with the Model

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aidencary/codepulse-codebert")
model = AutoModelForSequenceClassification.from_pretrained("aidencary/codepulse-codebert")
model.eval()  # disable dropout for inference

snippet = "items[i] = value"
# CodeBERT accepts at most 512 tokens; longer inputs are truncated.
inputs = tokenizer(snippet, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 2)

# Softmax over the two classes, then read the column mapped to the "buggy" label.
probs = F.softmax(logits, dim=-1)[0]
p_buggy = probs[model.config.label2id["buggy"]].item()
print(f"P(buggy): {p_buggy:.3f}")
```

## Training Details

### Training Data

Fine-tuned on labeled code snippets, each a short code line or block
annotated as buggy or clean. The training data was drawn from public bug
datasets and from synthetic bug injection into clean Python code.
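
The card does not describe how bugs were injected. As a hypothetical example of one mutation operator, the sketch below uses the standard-library `ast` module to turn a `<` comparison into `<=`, producing an off-by-one variant of a clean snippet. `FlipFirstLessThan` and `inject_off_by_one` are illustrative names, not the authors' tooling.

```python
import ast
from typing import Optional


class FlipFirstLessThan(ast.NodeTransformer):
    """Turn one `<` comparison into `<=` (an off-by-one mutation)."""

    def __init__(self) -> None:
        self.mutated = False

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)  # visit nested expressions first
        if not self.mutated:
            for i, op in enumerate(node.ops):
                if isinstance(op, ast.Lt):
                    node.ops[i] = ast.LtE()
                    self.mutated = True
                    break
        return node


def inject_off_by_one(clean_src: str) -> Optional[str]:
    """Return a buggy variant of `clean_src`, or None if it has no `<` to flip."""
    mutator = FlipFirstLessThan()
    tree = mutator.visit(ast.parse(clean_src))
    return ast.unparse(tree) if mutator.mutated else None


clean_src = "while i < len(items):\n    total += items[i]\n    i += 1"
# Normalize the clean side through ast.unparse too, so formatting differences
# between the two versions cannot hint at the label.
clean = ast.unparse(ast.parse(clean_src))
buggy = inject_off_by_one(clean_src)
pairs = [(clean, "clean"), (buggy, "buggy")] if buggy is not None else [(clean, "clean")]
print(buggy)
```

Passing both sides through the same normalizer matters for the same reason comments are stripped: any surface difference that correlates with the label is a shortcut the classifier can learn.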

### Training Procedure

#### Preprocessing

-   Inline `#` comments stripped to prevent label leakage
-   Common leading indentation removed (dedented to column 0)
-   Tokenized with the `microsoft/codebert-base` tokenizer, max length 512
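
A heuristic sketch of the first two preprocessing steps, using only the standard library. `strip_inline_comments` and `preprocess` are hypothetical names, and the quote tracking is deliberately simple (single-line strings with backslash escapes), so this approximates rather than reproduces the routine used in training. Unlike a plain `line.split("#")`, it leaves `#` characters inside string literals alone.

```python
import textwrap


def strip_inline_comments(code: str) -> str:
    """Drop `#` comments outside string literals, then trailing whitespace, on each line."""
    cleaned = []
    for line in code.splitlines():
        quote = None      # quote character of the currently open string, if any
        escaped = False   # True right after a backslash inside a string
        cut = len(line)   # index where the comment starts (end of line if none)
        for i, ch in enumerate(line):
            if quote is not None:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == quote:
                    quote = None
            elif ch in ("'", '"'):
                quote = ch
            elif ch == "#":
                cut = i
                break
        cleaned.append(line[:cut].rstrip())
    return "\n".join(cleaned)


def preprocess(snippet: str) -> str:
    """Strip inline comments, dedent to column 0, and trim blank edge lines."""
    return textwrap.dedent(strip_inline_comments(snippet)).strip("\n")


example = "    x = foo(y)  # BUG: off by one\n    if x:\n        return '#not a comment'\n"
print(preprocess(example))
```

The `# BUG: off by one` annotation in the example is exactly the kind of label hint the stripping step removes. The standard-library `tokenize` module would handle every string form exactly, but it is stricter about incomplete fragments such as unclosed brackets, which are common when scoring a single line.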

#### Training Hyperparameters

-   Training regime: fp32
-   Base model: `microsoft/codebert-base`
-   Task head: `AutoModelForSequenceClassification` (2 labels)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Held-out split from the same labeled snippet dataset used for training.
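
The card does not say how the held-out split was drawn. One common approach, shown here as an assumption rather than the authors' procedure, is a seeded stratified split that holds out the same fraction of each label so both sides keep the original class balance. `stratified_split` is a hypothetical helper, and label `1 = buggy` is an assumed convention.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (snippet, label), assuming 1 = buggy and 0 = clean


def stratified_split(
    samples: List[Sample], test_frac: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Hold out `test_frac` of each label; a fixed seed makes the split reproducible."""
    by_label: Dict[int, List[Sample]] = defaultdict(list)
    for s in samples:
        by_label[s[1]].append(s)
    rng = random.Random(seed)
    train: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(by_label):
        group = by_label[label][:]   # copy so the caller's list is not reordered
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


data = [(f"x{i} = {i}", i % 2) for i in range(20)]
train, test = stratified_split(data)
print(len(train), len(test))
```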

#### Metrics

-   Accuracy
-   F1 (macro)
-   P(buggy) calibration: the model's confidence should track the actual
    bug rate
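
These metrics can be computed without extra dependencies, as in the sketch below. The helper names are hypothetical, labels are assumed to be `1 = buggy` and `0 = clean`, and calibration is measured as a binned gap between mean P(buggy) and the observed bug rate in each bin (a positive-class variant of expected calibration error). The card does not say which calibration measure the authors used.

```python
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels: Tuple[int, ...] = (0, 1)) -> float:
    """Unweighted mean of per-class F1, so the rarer class counts as much as the common one."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def calibration_error(y_true: Sequence[int], p_buggy: Sequence[float], n_bins: int = 10) -> float:
    """Bin by P(buggy); average |mean P(buggy) - bug rate| per bin, weighted by bin size."""
    n = len(y_true)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, p in enumerate(p_buggy)
               if lo <= p < hi or (b == n_bins - 1 and p == 1.0)]
        if not idx:
            continue
        conf = sum(p_buggy[i] for i in idx) / len(idx)
        rate = sum(y_true[i] for i in idx) / len(idx)
        total += len(idx) / n * abs(conf - rate)
    return total


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```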

#### Results

| Metric     | Value            |
| ---------- | ---------------- |
| Accuracy   | Not yet reported |
| F1 (macro) | Not yet reported |

### Summary

The model is expected to perform best on Python snippets that match the
training distribution. Performance degrades on heavily commented code
(comments are stripped at inference) and on languages outside the
training set.

## Technical Specifications

### Model Architecture and Objective

`RobertaForSequenceClassification` (CodeBERT backbone) with a 2-class
classification head. Objective: binary cross-entropy over the labels
{clean, buggy}.
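
Assuming the standard single-label setup, in which the 2-class head outputs one logit per label and training minimizes softmax cross-entropy (the default for `RobertaForSequenceClassification` with integer labels), the two-class case reduces exactly to binary cross-entropy on the logit difference. With $z_c$ and $z_b$ the clean and buggy logits and $y = 1$ for a buggy snippet:

$$
\begin{aligned}
P(\text{buggy}) &= \frac{e^{z_b}}{e^{z_b} + e^{z_c}} = \sigma(z_b - z_c), \qquad \sigma(t) = \frac{1}{1 + e^{-t}} \\
\mathcal{L} &= -\big[\, y \log \sigma(z_b - z_c) + (1 - y) \log \sigma(z_c - z_b) \big]
\end{aligned}
$$

Since $\sigma(z_c - z_b) = 1 - \sigma(z_b - z_c) = P(\text{clean})$, this is the usual binary cross-entropy, and the softmax in the quick-start code recovers exactly $P(\text{buggy})$.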

### Compute Infrastructure

#### Hardware

Consumer GPU (training)

#### Software

-   transformers
-   torch
-   Python 3.11+

## Model Card Authors

Aiden Cary, Keller Willhite, Zachery Atchley

## Model Card Contact

aiden4786@gmail.com