---
library_name: transformers
license: mit
language:
- en
metrics:
- rouge
base_model:
- microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation
---
# Model Card for **Phi3-Lab-Report-Coder (LoRA on Phi-3 Mini 4k Instruct)**
A lightweight LoRA-adapter fine-tune of `microsoft/Phi-3-mini-4k-instruct` for **turning structured lab contexts + observations into executable Python code** that performs the target calculations (e.g., mechanics, fluids, vibrations, basic circuits, titrations). Trained with QLoRA in 4-bit, this model is intended as an **assistive code generator** for STEM lab writeups and teaching demos—not as a certified calculator for safety-critical engineering.
---
## Model Details
### Model Description
- **Developed by:** Barghav777
- **Model type:** Causal decoder LM (instruction-tuned) + **LoRA adapter**
- **Languages:** English
- **License:** MIT
- **Finetuned from:** `microsoft/Phi-3-mini-4k-instruct`
- **Intended input format:** A structured prompt with:
  - `### CONTEXT:` (natural-language description of the experiment)
  - `### OBSERVATIONS:` (JSON-like dict with units, readings)
  - `### CODE:` (the model is trained to generate the Python solution after this tag)
### Model Sources
- **Base model:** `microsoft/Phi-3-mini-4k-instruct`
- **Training data files:** `train.jsonl` (37 items), `eval.jsonl` (6 items)
- **Demo/Colab basis:** Training notebook available at: https://github.com/Barghav777/AI-Lab-Report-Agent
---
## Uses
### Direct Use
- Generate **readable Python code** to compute derived quantities from lab observations (e.g., average \(g\) via pendulum, Coriolis acceleration, Ohm’s law resistances, radius of gyration, Reynolds number).
- Produce calculation pipelines with minimal plotting/printing that are easy to copy-paste and run in a notebook.
### Downstream Use
- Course assistants or lab-prep tools that auto-draft calculation code for **intro undergrad physics/mech/fluids/EE labs**.
- Auto-checkers that compare student code vs. a reference implementation (with appropriate guardrails).
### Out-of-Scope Use
- Any **safety-critical** design decisions (structural, medical, chemical process control).
- High-stakes computation without human verification.
- Domains far outside the training distribution (e.g., NLP preprocessing pipelines, advanced control systems, large-scale simulation frameworks).
---
## Bias, Risks, and Limitations
- **Small dataset (37 train / 6 eval)** → plausible overfitting; brittle generalization to unseen experiment formats.
- **Formula misuse risk:** The model may pick incorrect constants/units or silently use wrong equations.
- **Overconfidence:** Generated code may “look right” while being numerically off or unit-inconsistent.
- **JSON brittleness:** If `OBSERVATIONS` keys/units differ from training patterns, the code may break.
### Recommendations
- Always **review formulas and units**; add assertions/unit conversions in downstream systems.
- Run generated code with **test observations** and compare against hand calculations.
- For deployment, wrap outputs with **explanations and references** to the formulas used.
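
As one illustration of comparing against a hand calculation, the sketch below recomputes \(g\) for the pendulum readings used elsewhere in this card and asserts that the mean falls in a plausible band. The function name `pendulum_g` and the 9.0 to 10.5 m/s² band are assumptions chosen for the example, not part of the training notebook.

```python
import math

def pendulum_g(length_m: float, period_s: float) -> float:
    """g = 4 * pi^2 * L / T^2 for a simple pendulum (small-angle approximation)."""
    if length_m <= 0 or period_s <= 0:
        raise ValueError("length and period must be positive")
    return 4 * math.pi ** 2 * length_m / period_s ** 2

# Same readings as the example prompt in this card (L in metres, T in seconds).
readings = [{"L": 0.50, "T": 1.42}, {"L": 0.60, "T": 1.55}]
g_values = [pendulum_g(r["L"], r["T"]) for r in readings]
g_mean = sum(g_values) / len(g_values)

# Illustrative sanity band; pick limits that suit your own lab setup.
assert 9.0 < g_mean < 10.5, f"suspicious g estimate: {g_mean:.3f} m/s^2"
print(f"mean g = {g_mean:.2f} m/s^2")
```

A reference value computed this way can then be compared with whatever the generated code prints.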
---
## How to Get Started
**Prompt template used in training**
```text
### CONTEXT:
{context}
### OBSERVATIONS:
{observations}
### CODE:
```
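
A small helper can fill this template consistently. The sketch below is a hypothetical `build_prompt` function; it serializes the observations with Python's `repr`, which matches the single-quoted dict shown in the example prompt in this card, but the exact serialization used during training is an assumption.

```python
def build_prompt(context: str, observations: dict) -> str:
    """Assemble the three-section prompt; the model continues after '### CODE:'."""
    # Assumption: observations are rendered as a Python dict literal
    # (single-quoted keys). Switch to json.dumps if your data uses JSON text.
    obs_text = repr(observations)
    return (
        "### CONTEXT:\n"
        f"{context.strip()}\n"
        "### OBSERVATIONS:\n"
        f"{obs_text}\n"
        "### CODE:\n"
    )

prompt = build_prompt(
    "Experiment to determine acceleration due to gravity using a simple pendulum.",
    {"readings": [{"L": 0.50, "T": 1.42}], "unit_L": "m", "unit_T": "s"},
)
print(prompt)
```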
**Load base + LoRA adapter (recommended)**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TextStreamer
from peft import PeftModel
import torch
base_id = "microsoft/Phi-3-mini-4k-instruct"
adapter_id = "YOUR_ADAPTER_REPO_OR_LOCAL_PATH" # e.g., ./phi3-lab-report-coder-final
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=False)
tok = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
tok.pad_token = tok.eos_token
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb,
trust_remote_code=True, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
prompt = """### CONTEXT:
Experiment to determine acceleration due to gravity using a simple pendulum...
### OBSERVATIONS:
{'readings': [{'L':0.50,'T':1.42}, {'L':0.60,'T':1.55}], 'unit_L':'m', 'unit_T':'s'}
### CODE:
"""
inputs = tok(prompt, return_tensors="pt").to(model.device)
streamer = TextStreamer(tok, skip_prompt=True, skip_special_tokens=True)
# Greedy decoding: `temperature` only applies when do_sample=True, so it is omitted.
_ = model.generate(**inputs, max_new_tokens=400, do_sample=False, streamer=streamer)
```
---
## Training Details
### Data
- **Files:** `train.jsonl` (list of objects), `eval.jsonl` (list of objects)
- **Schema per example:**
  - `context` *(str)*: experiment description
  - `observations` *(dict)*: units + numeric readings (lists of dicts)
  - `code` *(str)*: reference Python solution
- **Topical spread (non-exhaustive):** pendulum \(g\), Ohm’s law, titration, density via displacement, Coriolis accel., gyroscopic effect, Hartnell governor, rotating mass balancing, helical spring vibration, bi-filar suspension, etc.
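
The schema above can be checked before training. The following is a minimal sketch that assumes the usual JSON Lines layout (one JSON object per line); the helper name `parse_jsonl` and the sample record are illustrative and not taken from the dataset.

```python
import json

# Field names and types follow the schema listed in this card.
REQUIRED_FIELDS = {"context": str, "observations": dict, "code": str}

def parse_jsonl(text: str) -> list:
    """Parse JSON Lines text and verify each record has the expected fields."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(record.get(field), expected_type):
                raise ValueError(
                    f"line {line_no}: '{field}' is missing or not a {expected_type.__name__}"
                )
        records.append(record)
    return records

# Illustrative record, not an actual item from train.jsonl.
sample = json.dumps({
    "context": "Determine resistance using Ohm's law.",
    "observations": {"readings": [{"V": 2.0, "I": 0.5}], "unit_V": "V", "unit_I": "A"},
    "code": "print(2.0 / 0.5)",
})
records = parse_jsonl(sample)
print(len(records), "valid record(s)")
```

To check a real file, read it with `open("train.jsonl", encoding="utf-8").read()` and pass the text to `parse_jsonl`.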
**Size & basic stats**
- Train: **37** items; Eval: **6** items
- Formatted prompt (context+observations+code) length (train):
  - mean ≈ **222** words (≈ **1,739** chars); 95th pct ≈ **311** words
- Reference code length (train):
  - mean ≈ **34** lines (min **9**, max **71**)
### Training Procedure (from notebook)
- **Approach:** QLoRA (4-bit) SFT using `trl.SFTTrainer`
- **Quantization:** `bitsandbytes` 4-bit `nf4`, compute dtype `bfloat16`
- **LoRA config:** `r=16`, `alpha=32`, `dropout=0.05`, `bias="none"`, targets = `q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj`
- **Tokenizer:** right padding; `eos_token` as `pad_token`
- **Hyperparameters (TrainingArguments):**
  - epochs: **10**
  - per-device train batch size: **1**
  - gradient_accumulation_steps: **4**
  - optimizer: **paged_adamw_32bit**
  - learning rate: **2e-4**, weight decay: **1e-3**
  - warmup_ratio: **0.03**, scheduler: **constant** (the `constant` schedule applies no warmup, so `warmup_ratio` likely had no effect)
  - bf16: **True** (fp16: False), group_by_length: True
  - logging_steps: 10, save/eval every 50 steps
  - report_to: tensorboard
- **Saving:** `trainer.save_model("./phi3-lab-report-coder-final")` (adapter folder)
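
The hyperparameters above imply a very short run. The arithmetic below is a rough sketch assuming a single GPU; whether the final partial accumulation window counts as an optimizer step depends on the trainer version, so both bounds are shown.

```python
import math

train_examples = 37   # train.jsonl size from this card
per_device_batch = 1
grad_accum = 4
epochs = 10
eval_every = 50       # save/eval interval in optimizer steps

effective_batch = per_device_batch * grad_accum
batches_per_epoch = math.ceil(train_examples / per_device_batch)

# Lower bound drops the trailing partial accumulation window; upper bound keeps it.
steps_low = (batches_per_epoch // grad_accum) * epochs
steps_high = math.ceil(batches_per_epoch / grad_accum) * epochs

print("effective batch size:", effective_batch)
print("total optimizer steps:", steps_low, "to", steps_high)
print("evaluations during training:", steps_low // eval_every, "to", steps_high // eval_every)
```

Under these assumptions, evaluating every 50 steps yields only one or two evaluation points, so a smaller interval may give a more useful loss curve.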
### Speeds, Sizes, Times
- **Hardware:** Google Colab **T4 GPU** (per notebook metadata)
- **Adapter artifact:** LoRA weights only (load with the base model).
- **Wall-clock time:** not logged in the notebook.
---
## Evaluation
### Testing Data, Factors & Metrics
- **Eval set:** `eval.jsonl` (**6** items) with same schema.
- **Primary metric (planned):** ROUGE-L / ROUGE-1 against reference `code` (proxy for surface similarity).
- **Recommended additional checks:** unit tests on numeric outputs; pyflakes/ruff for syntax; run-time assertions.
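
A syntax check needs nothing beyond the standard library. The sketch below uses `ast.parse` as a lightweight stand-in for pyflakes/ruff; it catches syntax errors only, not undefined names or wrong formulas. The helper name `syntax_error_of` is illustrative.

```python
import ast
from typing import Optional

def syntax_error_of(code: str) -> Optional[str]:
    """Return a short error description, or None if the code parses cleanly."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None

print(syntax_error_of("g = 4 * 3.14159**2 * 0.5 / 1.42**2\n"))  # parses cleanly
print(syntax_error_of("def broken(:\n    pass\n"))              # reports an error
```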
### Results
- No automated score recorded in the notebook.
- **Suggested protocol:**
  1) Generate code for each eval item using the same prompt template.
  2) Execute safely in a sandbox with provided observations.
  3) Compare computed scalars (e.g., average \(g\), \(R\), Reynolds number) to ground-truth values within stated tolerances.
  4) Report the pass rate, plus ROUGE as a surface-similarity measure.
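
Steps 2 and 3 of this protocol could look roughly like the sketch below. It runs the generated code in a separate Python process with a timeout and checks whether any printed number is close to the expected value. The helper names, the 1% tolerance, and the "any printed number" criterion are all assumptions for illustration. A subprocess is not a security sandbox, so untrusted output should still run inside a container or similar isolation.

```python
import math
import re
import subprocess
import sys

NUMBER = re.compile(r"[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?")

def run_and_extract(code: str, timeout_s: float = 10.0) -> list:
    """Run code in a fresh Python process and return every number it prints."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout_s,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return [float(m) for m in NUMBER.findall(result.stdout)]

def passes(code: str, expected: float, rel_tol: float = 0.01) -> bool:
    """True if the code runs and prints a number within rel_tol of expected."""
    try:
        values = run_and_extract(code)
    except (RuntimeError, subprocess.TimeoutExpired):
        return False
    # Lenient criterion: any printed number may match; tighten for real use.
    return any(math.isclose(v, expected, rel_tol=rel_tol) for v in values)

# Stand-in for model output: pendulum g from a single reading.
generated = "import math\nL, T = 0.50, 1.42\nprint(4 * math.pi**2 * L / T**2)\n"
print(passes(generated, expected=9.789))
```

The pass rate is then the fraction of eval items for which `passes` returns `True`.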
---
## Model Examination (optional)
- Inspect token-by-token attention to `OBSERVATIONS` keys (ablation: shuffle keys to test robustness).
- Add **unit-check helpers** (e.g., `pint`) in prompts to encourage explicit conversions.
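
The key-shuffling ablation can be prepared with a small helper such as the hypothetical `shuffle_keys` below, which reorders dict keys at every nesting level while leaving values untouched. Feeding both the original and shuffled observations through the same prompt then shows whether the generated code depends on key order.

```python
import random

def shuffle_keys(obj, rng: random.Random):
    """Return a copy of obj with dict key order shuffled at every nesting level."""
    if isinstance(obj, dict):
        keys = list(obj)
        rng.shuffle(keys)
        return {k: shuffle_keys(obj[k], rng) for k in keys}
    if isinstance(obj, list):
        return [shuffle_keys(item, rng) for item in obj]
    return obj

observations = {
    "readings": [{"L": 0.50, "T": 1.42}, {"L": 0.60, "T": 1.55}],
    "unit_L": "m",
    "unit_T": "s",
}
shuffled = shuffle_keys(observations, random.Random(0))  # seeded for repeatability
print(shuffled)
```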
---
## Environmental Impact
- **Hardware Type:** NVIDIA T4 (Colab)
- **Precision:** 4-bit QLoRA with `bfloat16` compute
- **Hours used:** Not recorded (dataset is small; expected low)
- **Cloud Provider/Region:** Colab (unspecified)
- **Carbon Emitted:** Not estimated (see [ML CO2 Impact calculator](https://mlco2.github.io/impact#compute))
---
## Technical Specifications
### Architecture & Objective
- **Backbone:** `Phi-3-mini-4k-instruct` (decoder-only causal LM)
- **Objective:** Supervised fine-tuning to continue from `### CODE:` with correct, executable Python.
### Compute Infrastructure
- **Hardware:** Colab GPU (T4) + CPU RAM
- **Software:**
  - `transformers`, `trl`, `peft`, `bitsandbytes`, `datasets`, `accelerate`, `torch`
---
## Citation
@article{abdin2024phi3,
title = {Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone},
author = {Abdin, Marah and others},
journal = {arXiv preprint arXiv:2404.14219},
year = {2024},
doi = {10.48550/arXiv.2404.14219},
url = {https://arxiv.org/abs/2404.14219}
}
---
## Glossary
- **QLoRA:** Fine-tuning with low-rank adapters on a quantized base model (saves memory/compute).
- **LoRA (r, α):** Rank and scaling of low-rank update matrices.
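
For reference, the standard LoRA formulation leaves a frozen weight matrix \(W\) unchanged and learns a low-rank update scaled by \(\alpha / r\):

\[
W' = W + \frac{\alpha}{r}\, B A, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
\]

With the settings listed in this card (\(r = 16\), \(\alpha = 32\)), the scaling factor is \(\alpha / r = 2\).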
---
## More Information
- For better robustness, consider augmenting data with **unit-perturbation** and **noise-in-readings** variants, and add examples across more domains (materials, thermo, optics).
- Add **eval harness** with numeric tolerances and syntax checks.
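
As a rough sketch of the two augmentations mentioned above, the helpers below perturb numeric readings with Gaussian noise and convert a length column from metres to centimetres. The function names, the `readings` / `unit_L` layout, and the noise level are assumptions based on the example prompt in this card. Reference code that hard-codes values or units would need regenerating for each variant.

```python
import copy
import random

def add_reading_noise(observations: dict, rel_sigma: float, rng: random.Random) -> dict:
    """Return a copy with each numeric reading scaled by (1 + N(0, rel_sigma))."""
    noisy = copy.deepcopy(observations)
    for reading in noisy.get("readings", []):
        for key, value in reading.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                reading[key] = value * (1 + rng.gauss(0, rel_sigma))
    return noisy

def metres_to_centimetres(observations: dict, key: str = "L", unit_key: str = "unit_L") -> dict:
    """Return a copy with `key` converted from m to cm and the unit label updated."""
    converted = copy.deepcopy(observations)
    if converted.get(unit_key) != "m":
        return converted  # leave other units untouched
    for reading in converted.get("readings", []):
        reading[key] = reading[key] * 100
    converted[unit_key] = "cm"
    return converted

base = {"readings": [{"L": 0.50, "T": 1.42}, {"L": 0.60, "T": 1.55}],
        "unit_L": "m", "unit_T": "s"}
noisy = add_reading_noise(base, rel_sigma=0.01, rng=random.Random(0))
in_cm = metres_to_centimetres(base)
print(noisy)
print(in_cm)
```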
---
## Model Card Authors
- Barghav777
---