Barghav777 committed on
Commit 6c59faa · verified · 1 Parent(s): d7c32bb

Update README.md

Files changed (1)
  1. README.md +153 -128
README.md CHANGED
@@ -10,202 +10,227 @@ base_model:
  pipeline_tag: text-generation
  ---

- # Model Card for Model ID
-
- A lightweight LoRA-adapter fine-tune of microsoft/Phi-3-mini-4k-instruct for turning structured lab contexts + observations into executable Python code that performs the target calculations (e.g., mechanics, fluids, vibrations, basic circuits, titrations). Trained with QLoRA in 4-bit, this model is intended as an assistive code generator for STEM lab writeups and teaching demos—not as a certified calculator for safety-critical engineering.

  ## Model Details

  ### Model Description

- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [Barghav777]
- - **Model type:** Causal decoder LM (instruction-tuned) + LoRA adapter
- - **Language(s) (NLP):** English
- - **License:** MIT
- - **Finetuned from model [optional]:** microsoft/Phi-3-mini-4k-instruct

- ### Model Sources [optional]

- <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

  ## Uses

- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
  ### Direct Use
-
- - Generate readable Python code to compute derived quantities from lab observations (e.g., average g via pendulum, Coriolis acceleration, Ohm’s law resistances, radius of gyration, Reynolds number).
-
  - Produce calculation pipelines with minimal plotting/printing that are easy to copy-paste and run in a notebook.

- ### Downstream Use [optional]
-
- - Course assistants or lab-prep tools that auto-draft calculation code for intro undergrad physics/mech/fluids/EE labs.
-
  - Auto-checkers that compare student code vs. a reference implementation (with appropriate guardrails).

  ### Out-of-Scope Use
-
- - Any safety-critical design decisions (structural, medical, chemical process control).
-
  - High-stakes computation without human verification.
-
  - Domains far outside the training distribution (e.g., NLP preprocessing pipelines, advanced control systems, large-scale simulation frameworks).

- ## Bias, Risks, and Limitations
-
- - Small dataset (37 train / 6 eval) → plausible overfitting; brittle generalization to unseen experiment formats.
-
- - Formula misuse risk: The model may pick incorrect constants/units or silently use wrong equations.

- - Overconfidence: Generated code may “look right” while being numerically off or unit-inconsistent.

- - JSON brittleness: If OBSERVATIONS keys/units differ from training patterns, the code may break.

  ### Recommendations

- - Always review formulas and units; add assertions/unit conversions in downstream systems.
-
- - Run generated code with test observations and compare against hand calculations.

- - For deployment, wrap outputs with explanations and references to the formulas used.
- ## How to Get Started with the Model

- Use the code below to get started with the model.

- [More Information Needed]

- ## Training Details

- ### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

- ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]

- [More Information Needed]

- #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

  ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->
-
  ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]

  ### Results

- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]

- <!-- Relevant interpretability work for the model goes here -->

- [More Information Needed]

  ## Environmental Impact

- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]

- ### Model Architecture and Objective

- [More Information Needed]

  ### Compute Infrastructure

- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]

- ## Glossary [optional]

- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

- [More Information Needed]

- ## More Information [optional]

- [More Information Needed]

- ## Model Card Authors [optional]

- [More Information Needed]

  ## Model Card Contact

- [More Information Needed]

  pipeline_tag: text-generation
  ---

+ # Model Card for **Phi3-Lab-Report-Coder (LoRA on Phi-3 Mini 4k Instruct)**

+ A lightweight LoRA-adapter fine-tune of `microsoft/Phi-3-mini-4k-instruct` for **turning structured lab contexts + observations into executable Python code** that performs the target calculations (e.g., mechanics, fluids, vibrations, basic circuits, titrations). Trained with QLoRA in 4-bit, this model is intended as an **assistive code generator** for STEM lab writeups and teaching demos—not as a certified calculator for safety-critical engineering.

+ ---

  ## Model Details

  ### Model Description

+ - **Developed by:** Barghav777
+ - **Model type:** Causal decoder LM (instruction-tuned) + **LoRA adapter**
+ - **Languages:** English
+ - **License:** MIT
+ - **Finetuned from:** `microsoft/Phi-3-mini-4k-instruct`
+ - **Intended input format:** A structured prompt with:
+   - `### CONTEXT:` (natural-language description of the experiment)
+   - `### OBSERVATIONS:` (JSON-like dict with units, readings)
+   - `### CODE:` (the model is trained to generate the Python solution after this tag)

+ ### Model Sources

+ - **Base model:** `microsoft/Phi-3-mini-4k-instruct`
+ - **Training data files:** `train.jsonl` (37 items), `eval.jsonl` (6 items)
+ - **Demo/Colab basis:** Local notebook `Untitled64 (1).ipynb` (Colab, GPU=T4)

+ ---

  ## Uses

  ### Direct Use
+ - Generate **readable Python code** to compute derived quantities from lab observations (e.g., average \(g\) via pendulum, Coriolis acceleration, Ohm’s law resistances, radius of gyration, Reynolds number).
  - Produce calculation pipelines with minimal plotting/printing that are easy to copy-paste and run in a notebook.

+ ### Downstream Use
+ - Course assistants or lab-prep tools that auto-draft calculation code for **intro undergrad physics/mech/fluids/EE labs**.
  - Auto-checkers that compare student code vs. a reference implementation (with appropriate guardrails).

  ### Out-of-Scope Use
+ - Any **safety-critical** design decisions (structural, medical, chemical process control).
  - High-stakes computation without human verification.
  - Domains far outside the training distribution (e.g., NLP preprocessing pipelines, advanced control systems, large-scale simulation frameworks).

+ ---

+ ## Bias, Risks, and Limitations

+ - **Small dataset (37 train / 6 eval):** plausible overfitting; brittle generalization to unseen experiment formats.
+ - **Formula misuse risk:** The model may pick incorrect constants/units or silently use wrong equations.
+ - **Overconfidence:** Generated code may “look right” while being numerically off or unit-inconsistent.
+ - **JSON brittleness:** If `OBSERVATIONS` keys/units differ from training patterns, the code may break.

  ### Recommendations
+ - Always **review formulas and units**; add assertions/unit conversions in downstream systems.
+ - Run generated code with **test observations** and compare against hand calculations.
+ - For deployment, wrap outputs with **explanations and references** to the formulas used.
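
A minimal sketch of the "compare against hand calculations" step for the pendulum case. It computes \(g\) per reading from \(T = 2\pi\sqrt{L/g}\), i.e. \(g = 4\pi^2 L / T^2\), averages the results, and checks a value printed by generated code against that reference within a relative tolerance. The readings match the illustrative values in the loading example. `generated_g` is a hypothetical stand-in for the model's output, and the 1% tolerance is an assumption.

```python
import math


def pendulum_g(length_m: float, period_s: float) -> float:
    """Simple-pendulum estimate: T = 2*pi*sqrt(L/g)  =>  g = 4*pi^2*L / T^2 (SI units)."""
    if length_m <= 0 or period_s <= 0:
        raise ValueError("length (m) and period (s) must be positive")
    return 4 * math.pi ** 2 * length_m / period_s ** 2


def reference_average_g(readings: list[dict]) -> float:
    """Hand-calculation reference: mean of the per-reading g values."""
    values = [pendulum_g(r["L"], r["T"]) for r in readings]
    return sum(values) / len(values)


def check_against_reference(generated: float, reference: float, rel_tol: float = 0.01) -> None:
    """Fail loudly if the generated code's result drifts from the hand calculation."""
    if not math.isclose(generated, reference, rel_tol=rel_tol):
        raise AssertionError(f"generated {generated:.4f} vs reference {reference:.4f}")


readings = [{"L": 0.50, "T": 1.42}, {"L": 0.60, "T": 1.55}]  # L in m, T in s
reference = reference_average_g(readings)
print(f"reference g = {reference:.3f} m/s^2")  # reference g = 9.824 m/s^2

generated_g = 9.82  # hypothetical value printed by the model's code
check_against_reference(generated_g, reference)
```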

+ ---

+ ## How to Get Started

+ **Prompt template used in training**
+ ```text
+ ### CONTEXT:
+ {context}

+ ### OBSERVATIONS:
+ {observations}

+ ### CODE:
+ ```
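
A minimal sketch for filling the template above from a data record. It assumes `observations` is rendered with Python's `str()`, which matches the single-quoted dict in the loading example. It also assumes sections are separated by one blank line. `build_prompt` is a hypothetical helper, not a function from the notebook.

```python
def build_prompt(context: str, observations: dict) -> str:
    """Fill the training-time template; the model is expected to continue after '### CODE:'."""
    # Assumption: observations were serialized with str(), giving a single-quoted dict
    # like the one in the loading example (not confirmed from the notebook).
    return (
        "### CONTEXT:\n"
        f"{context.strip()}\n\n"
        "### OBSERVATIONS:\n"
        f"{observations}\n\n"
        "### CODE:\n"
    )


prompt = build_prompt(
    "Experiment to determine acceleration due to gravity using a simple pendulum.",
    {"readings": [{"L": 0.50, "T": 1.42}], "unit_L": "m", "unit_T": "s"},
)
print(prompt)
```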

+ **Load base + LoRA adapter (recommended)**
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TextStreamer
+ from peft import PeftModel
+ import torch

+ base_id = "microsoft/Phi-3-mini-4k-instruct"
+ adapter_id = "YOUR_ADAPTER_REPO_OR_LOCAL_PATH"  # e.g., ./phi3-lab-report-coder-final

+ # 4-bit NF4 as in training. Native bf16 needs compute capability >= 8.0 (Ampere or newer),
+ # so fall back to fp16 compute on older GPUs such as the T4 (compute capability 7.5).
+ compute_dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16
+ bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
+                          bnb_4bit_compute_dtype=compute_dtype, bnb_4bit_use_double_quant=False)

+ tok = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
+ tok.pad_token = tok.eos_token

+ base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb,
+                                             trust_remote_code=True, device_map="auto")
+ model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA weights to the 4-bit base
+ model.eval()

+ prompt = """### CONTEXT:
+ Experiment to determine acceleration due to gravity using a simple pendulum...

+ ### OBSERVATIONS:
+ {'readings': [{'L':0.50,'T':1.42}, {'L':0.60,'T':1.55}], 'unit_L':'m', 'unit_T':'s'}

+ ### CODE:
+ """

+ inputs = tok(prompt, return_tensors="pt").to(model.device)
+ streamer = TextStreamer(tok, skip_prompt=True, skip_special_tokens=True)
+ # Greedy decoding: temperature is ignored when do_sample=False, so it is omitted.
+ _ = model.generate(**inputs, max_new_tokens=400, do_sample=False,
+                    pad_token_id=tok.eos_token_id, streamer=streamer)
+ ```
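
The streamer only prints the completion. To check it programmatically, one option (not part of the notebook) is to keep the return value of `model.generate` and decode the tokens after the prompt with `tok.decode(..., skip_special_tokens=True)`. The resulting text can then go through small helpers like the hypothetical `extract_code` and `syntax_ok` below, which implement the syntax check suggested under Evaluation. Cutting at the next `### NAME:` header is a heuristic for completions that run on into a new example.

```python
import ast
import re

# Matches a fenced block such as ```python ... ``` and captures its body.
FENCE_RE = re.compile(r"```(?:python)?\n(.*?)```", re.DOTALL)
# Matches a prompt-style section header at the start of a line, e.g. "### CONTEXT:".
SECTION_RE = re.compile(r"^### [A-Z]+:", re.MULTILINE)


def extract_code(completion: str) -> str:
    """Pull the Python solution out of the text generated after '### CODE:'."""
    match = FENCE_RE.search(completion)
    code = match.group(1) if match else completion
    # Cut at the next section header in case the model starts a new example.
    code = SECTION_RE.split(code, maxsplit=1)[0]
    return code.strip() + "\n"


def syntax_ok(code: str) -> bool:
    """Cheap first gate: does the snippet parse? (It is not executed.)"""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


completion = "import math\nprint(4 * math.pi**2 * 0.5 / 1.42**2)\n### CONTEXT:\nanother experiment"
code = extract_code(completion)
print(syntax_ok(code))  # True
```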

+ ---

+ ## Training Details

+ ### Data
+ - **Files:** `train.jsonl` (list of objects), `eval.jsonl` (list of objects)
+ - **Schema per example:**
+   - `context` *(str)*: experiment description
+   - `observations` *(dict)*: units + numeric readings (lists of dicts)
+   - `code` *(str)*: reference Python solution
+ - **Topical spread (non-exhaustive):** pendulum \(g\), Ohm’s law, titration, density via displacement, Coriolis acceleration, gyroscopic effect, Hartnell governor, rotating mass balancing, helical spring vibration, bifilar suspension, etc.
+
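
A minimal loader and schema check for the fields listed above. The card says the files hold a "list of objects", which could mean either a JSON array or JSON Lines, so the sketch accepts both. The file names follow the card. The helpers and the two sample records are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path

# Per-example schema from the card: field name -> expected Python type.
REQUIRED_FIELDS = {"context": str, "observations": dict, "code": str}


def load_examples(path: str) -> list[dict]:
    """Read either a JSON array of objects or JSON Lines (one object per line)."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None  # several top-level objects: treat as JSON Lines below
    if isinstance(data, list):
        return data
    if isinstance(data, dict):  # a single-line JSONL file parses as one object
        return [data]
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def validate_example(example: dict, index: int) -> None:
    """Raise if a record is missing a field or has the wrong type."""
    for field, expected in REQUIRED_FIELDS.items():
        if field not in example:
            raise ValueError(f"example {index}: missing '{field}'")
        if not isinstance(example[field], expected):
            raise TypeError(f"example {index}: '{field}' should be {expected.__name__}")


# Hypothetical two-record JSONL file standing in for train.jsonl.
sample = [
    {"context": "Ohm's law", "observations": {"V": [1.0, 2.0], "I": [0.1, 0.2]}, "code": "print(10.0)"},
    {"context": "Simple pendulum", "observations": {"readings": [{"L": 0.5, "T": 1.42}], "unit_L": "m"},
     "code": "print(9.79)"},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as fh:
    fh.write("\n".join(json.dumps(ex) for ex in sample))
    tmp_path = fh.name

examples = load_examples(tmp_path)
os.remove(tmp_path)
for i, ex in enumerate(examples):
    validate_example(ex, i)
print(f"{len(examples)} examples passed the schema check")
```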
+ **Size & basic stats**
+ - Train: **37** items; Eval: **6** items
+ - Formatted prompt (context+observations+code) length (train):
+   - mean ≈ **222** words (≈ **1,739** chars); 95th pct ≈ **311** words
+ - Reference code length (train):
+   - mean ≈ **34** lines (min **9**, max **71**)
+
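
The size figures above depend on how each training text was assembled and split into words. The sketch below recomputes them under stated assumptions:

- each training text is the prompt template followed directly by the reference `code`;
- words are whitespace-separated;
- the 95th percentile uses linear interpolation (NumPy's default convention).

Results may differ slightly from the card's numbers if the notebook formatted examples differently. The demo records are hypothetical.

```python
import statistics


def format_example(ex: dict) -> str:
    """Training text = prompt template + reference code (assumed layout)."""
    return (
        f"### CONTEXT:\n{ex['context']}\n\n"
        f"### OBSERVATIONS:\n{ex['observations']}\n\n"
        f"### CODE:\n{ex['code']}"
    )


def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile over the sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def length_stats(examples: list[dict]) -> dict:
    texts = [format_example(ex) for ex in examples]
    words = [len(t.split()) for t in texts]
    code_lines = [len(ex["code"].splitlines()) for ex in examples]
    return {
        "mean_words": statistics.mean(words),
        "mean_chars": statistics.mean(len(t) for t in texts),
        "p95_words": percentile(words, 95),
        "mean_code_lines": statistics.mean(code_lines),
        "min_code_lines": min(code_lines),
        "max_code_lines": max(code_lines),
    }


demo = [
    {"context": "Simple pendulum", "observations": {"unit_L": "m"}, "code": "g = 9.8\nprint(g)"},
    {"context": "Ohm's law with two readings", "observations": {"V": [1.0, 2.0]},
     "code": "R = 10\nprint(R)\nprint('done')"},
]
stats = length_stats(demo)
print(stats)
```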
+ ### Training Procedure (from notebook)
+ - **Approach:** QLoRA (4-bit) SFT using `trl.SFTTrainer`
+ - **Quantization:** `bitsandbytes` 4-bit `nf4`, compute dtype `bfloat16`
+ - **LoRA config:** `r=16`, `alpha=32`, `dropout=0.05`, `bias="none"`, targets = `q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj`
+ - **Tokenizer:** right padding; `eos_token` as `pad_token`
+ - **Hyperparameters (TrainingArguments):**
+   - epochs: **10**
+   - per-device train batch size: **1**
+   - gradient_accumulation_steps: **4**
+   - optimizer: **paged_adamw_32bit**
+   - learning rate: **2e-4**, weight decay: **1e-3**
+   - warmup_ratio: **0.03**, scheduler: **constant**
+   - bf16: **True** (fp16: False), group_by_length: True
+   - logging_steps: 10, save/eval every 50 steps
+   - report_to: tensorboard
+ - **Saving:** `trainer.save_model("./phi3-lab-report-coder-final")` (adapter folder)
+
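
With 37 training examples, a per-device batch of 1 and 4 accumulation steps, the run is short. The arithmetic below is a rough sketch. It assumes a single GPU and no sequence packing in `SFTTrainer`; the card states neither. Whether the final partial accumulation of each epoch triggers an update depends on the `transformers` version, hence a range.

```python
import math

# Values from the training configuration listed above.
n_train, per_device_batch, grad_accum, epochs = 37, 1, 4, 10
logging_steps, eval_save_steps = 10, 50

effective_batch = per_device_batch * grad_accum  # examples per optimizer update on one GPU
# A trailing partial accumulation may or may not produce an extra update per epoch,
# so compute both bounds.
updates_low = (n_train // effective_batch) * epochs
updates_high = math.ceil(n_train / effective_batch) * epochs

print(f"effective batch size: {effective_batch}")
print(f"optimizer updates: {updates_low} to {updates_high}")
print(f"logged loss points: about {updates_low // logging_steps} to {updates_high // logging_steps}")
print(f"eval/save steps: {list(range(eval_save_steps, updates_high + 1, eval_save_steps))}")
```

Under these assumptions, evaluation and checkpointing every 50 steps fire only once or twice in the whole run. In `transformers`, `lr_scheduler_type="constant"` applies no warmup, whereas `"constant_with_warmup"` honors `warmup_ratio`. The listed warmup of 0.03 (about 3 updates here) therefore likely had no effect.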
+ ### Speeds, Sizes, Times
+ - **Hardware:** Google Colab **T4 GPU** (per notebook metadata)
+ - **Adapter artifact:** LoRA weights only (load with the base model).
+ - **Wall-clock time:** not logged in the notebook.
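
Because the artifact holds LoRA weights only, its size follows from the rank and the adapted layers. The rough sketch below rests on two assumptions:

- Phi-3-mini-4k-instruct's published shapes: hidden size 3072, MLP size 8192, 32 layers.
- The module names used by the `transformers` Phi-3 implementation, which as far as we can tell fuses q/k/v into `qkv_proj` and gate/up into `gate_up_proj`.

If that holds, PEFT would match only `o_proj` and `down_proj` from the target list above. This is worth confirming with `model.named_modules()` or `print_trainable_parameters()` on the PEFT model. Byte sizes assume fp32 adapter weights.

```python
# Assumed Phi-3-mini-4k-instruct shapes; verify against the base model's config.json.
HIDDEN, INTERMEDIATE, LAYERS = 3072, 8192, 32
R = 16  # LoRA rank from the training config

# (in_features, out_features) per decoder layer, keyed by the module names we believe the
# transformers Phi-3 implementation uses (q/k/v and gate/up are fused projections).
PHI3_LINEARS = {
    "qkv_proj": (HIDDEN, 3 * HIDDEN),  # 32 query + 32 key + 32 value heads of size 96
    "o_proj": (HIDDEN, HIDDEN),
    "gate_up_proj": (HIDDEN, 2 * INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(targets: list[str], r: int = R, layers: int = LAYERS) -> int:
    """Each adapted Linear(in, out) gains A (r x in) and B (out x r): r * (in + out) weights.
    Target names with no matching module (e.g. 'q_proj' here) simply contribute nothing."""
    per_layer = sum(r * (d_in + d_out) for name, (d_in, d_out) in PHI3_LINEARS.items() if name in targets)
    return per_layer * layers


card_targets = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
fused_targets = list(PHI3_LINEARS)

for label, targets in [("card target list", card_targets), ("fused module names", fused_targets)]:
    n = lora_param_count(targets)
    print(f"{label}: {n:,} LoRA weights, ~{n * 4 / 1e6:.0f} MB as fp32")
```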

+ ---

  ## Evaluation

  ### Testing Data, Factors & Metrics
+ - **Eval set:** `eval.jsonl` (**6** items) with the same schema.
+ - **Primary metric (planned):** ROUGE-L / ROUGE-1 against reference `code` (proxy for surface similarity).
+ - **Recommended additional checks:** unit tests on numeric outputs; pyflakes/ruff for syntax; run-time assertions.
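
The planned ROUGE metrics can be approximated without extra dependencies. This simplified sketch tokenizes on whitespace. The `rouge_score` package normalizes case and punctuation by default, so its scores will not match exactly. ROUGE-L is based on the longest common subsequence (LCS) of tokens; ROUGE-1 on clipped unigram overlap. Both are reported as F1.

```python
from collections import Counter


def lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length by dynamic programming, O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def f1(overlap: int, n_cand: int, n_ref: int) -> float:
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_cand, overlap / n_ref
    return 2 * precision * recall / (precision + recall)


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 with P = LCS/|candidate| and R = LCS/|reference| over whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    return f1(lcs_length(cand, ref), len(cand), len(ref))


def rouge_1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap with counts clipped to the smaller side."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    if not cand or not ref:
        return 0.0
    return f1(sum((cand & ref).values()), sum(cand.values()), sum(ref.values()))


reference_code = "g = 4 * math.pi ** 2 * L / T ** 2\nprint(g)"
generated_code = "g = 4 * math.pi ** 2 * L / (T ** 2)\nprint(round(g, 3))"
print(f"ROUGE-L {rouge_l_f1(generated_code, reference_code):.3f}, ROUGE-1 {rouge_1_f1(generated_code, reference_code):.3f}")
```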

  ### Results
+ - No automated score was recorded in the notebook.
+ - **Suggested protocol:**
+ 1) Generate code for each eval item using the same prompt template.
+ 2) Execute each snippet in a sandbox with the provided observations.
+ 3) Compare computed scalars (e.g., average \(g\), \(R\), Reynolds number) against ground-truth values within a set tolerance.
+ 4) Report pass rate and ROUGE for readability/similarity.
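
Steps 2 to 4 of the protocol can be sketched as follows. It assumes each generated snippet prints its answer, and that the first number on its last non-empty output line is the scalar to grade. That is a simplifying convention, not something the training data guarantees. `run_snippet`, `final_scalar`, `passes`, the example items and the 1% tolerance are all hypothetical. A subprocess with a timeout contains runaway code but is not a security sandbox; untrusted output belongs in a container or VM.

```python
import math
import re
import subprocess
import sys
import tempfile

NUMBER_RE = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def run_snippet(code: str, timeout_s: float = 10.0) -> str:
    """Run generated code in a fresh, isolated-mode Python process inside a temp directory."""
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True, text=True, timeout=timeout_s, cwd=workdir,
        )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "non-zero exit status")
    return result.stdout


def final_scalar(stdout: str) -> float:
    """Simplifying convention: the answer is the first number on the last non-empty output line."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    match = NUMBER_RE.search(lines[-1]) if lines else None
    if match is None:
        raise ValueError("no numeric result found")
    return float(match.group())


def passes(code: str, expected: float, rel_tol: float = 0.01) -> bool:
    """True if the snippet runs and its scalar is within rel_tol of the ground truth."""
    try:
        value = final_scalar(run_snippet(code))
    except (RuntimeError, ValueError, subprocess.TimeoutExpired):
        return False
    return math.isclose(value, expected, rel_tol=rel_tol)


# Hypothetical eval items: (generated code, ground-truth scalar).
items = [
    ("import math\nprint(f'g = {4 * math.pi**2 * 0.5 / 1.42**2:.3f} m/s^2')", 9.789),
    ("print(1 / 0)", 5.0),  # crashes, so it counts as a failure
]
pass_rate = sum(passes(code, expected) for code, expected in items) / len(items)
print(f"pass rate: {pass_rate:.0%}")  # pass rate: 50%
```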

+ ---

+ ## Model Examination (optional)
+ - Inspect token-by-token attention to `OBSERVATIONS` keys (ablation: shuffle keys to test robustness).
+ - Add **unit-check helpers** (e.g., `pint`) in prompts to encourage explicit conversions.
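
A sketch of the key-shuffle ablation, plus a key-renaming probe for the JSON brittleness risk listed earlier. Both helpers are hypothetical. Each perturbed dict would be rendered into the same prompt template, and the generated code's numeric result compared with the run on the original observations. Python dicts keep insertion order, so shuffling changes the serialized prompt text while the data itself stays equal.

```python
import random


def shuffle_keys(obs, rng: random.Random):
    """Copy a (possibly nested) observations structure with dict key order shuffled at every level."""
    if isinstance(obs, dict):
        items = list(obs.items())
        rng.shuffle(items)
        return {key: shuffle_keys(value, rng) for key, value in items}
    if isinstance(obs, list):
        return [shuffle_keys(value, rng) for value in obs]
    return obs


def rename_keys(obs, mapping: dict):
    """Copy the structure with keys renamed (e.g. 'L' -> 'length') to probe reliance on exact names."""
    if isinstance(obs, dict):
        return {mapping.get(key, key): rename_keys(value, mapping) for key, value in obs.items()}
    if isinstance(obs, list):
        return [rename_keys(value, mapping) for value in obs]
    return obs


observations = {"readings": [{"L": 0.50, "T": 1.42}, {"L": 0.60, "T": 1.55}], "unit_L": "m", "unit_T": "s"}
rng = random.Random(0)  # fixed seed so the ablation is reproducible
shuffled_variants = [shuffle_keys(observations, rng) for _ in range(3)]
renamed = rename_keys(observations, {"L": "length", "T": "period"})
for variant in shuffled_variants:
    print(variant)  # same data, different serialization order in the prompt
```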

+ ---

  ## Environmental Impact
+ - **Hardware Type:** NVIDIA T4 (Colab)
+ - **Precision:** 4-bit QLoRA with `bfloat16` compute
+ - **Hours used:** Not recorded (dataset is small; expected low)
+ - **Cloud Provider/Region:** Colab (unspecified)
+ - **Carbon Emitted:** Not estimated (see [ML CO2 Impact calculator](https://mlco2.github.io/impact#compute))

+ ---

+ ## Technical Specifications

+ ### Architecture & Objective
+ - **Backbone:** `Phi-3-mini-4k-instruct` (decoder-only causal LM)
+ - **Objective:** Supervised fine-tuning to continue from `### CODE:` with correct, executable Python.

  ### Compute Infrastructure
+ - **Hardware:** Colab GPU (T4) + CPU RAM
+ - **Software:**
+   - `transformers`, `trl`, `peft`, `bitsandbytes`, `datasets`, `accelerate`, `torch`
+   - Notebook: `Untitled64 (1).ipynb`

+ ---

+ ## Citation
+ If you write about this model, please cite the base model and this repository. (Add BibTeX here if/when available.)

+ ---

+ ## Glossary
+ - **QLoRA:** Fine-tuning with low-rank adapters on a quantized base model (saves memory/compute).
+ - **LoRA (r, α):** `r` is the rank of the low-rank update matrices; `α` sets their scaling (the update is multiplied by α/r).
 
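
For reference, PEFT's default LoRA parameterization (without rsLoRA) adds a scaled low-rank update to each frozen weight \(W_0\), which is stored in 4-bit here. With the card's `r=16` and `alpha=32`, the scale is α/r = 2, and `dropout=0.05` is applied to the input of the adapter branch:

$$
h = W_0 x + \frac{\alpha}{r}\, B A \,\operatorname{dropout}(x), \qquad
A \in \mathbb{R}^{r \times d_{\text{in}}},\quad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad \frac{\alpha}{r} = \frac{32}{16} = 2
$$

\(B\) is initialized to zero, so training starts from the base model's behavior. Each adapted layer adds \(r(d_{\text{in}} + d_{\text{out}})\) trainable weights.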
+ ---

+ ## More Information
+ - For better robustness, consider augmenting data with **unit-perturbation** and **noise-in-readings** variants, and add examples across more domains (materials, thermo, optics).
+ - Add **eval harness** with numeric tolerances and syntax checks.
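
A minimal sketch of the two augmentations, written for pendulum-style observations (`readings` with `L` and `T`, plus `unit_L`). The field names, conversion table and noise level are illustrative assumptions. Re-expressing lengths in centimetres leaves the physical answer unchanged, so the original SI result stays a valid target for checking regenerated code. Noisy readings, by contrast, need their reference values recomputed.

```python
import copy
import random

# Illustrative conversion factors relative to metres.
LENGTH_FACTORS = {"m": 1.0, "cm": 100.0, "mm": 1000.0}


def perturb_length_unit(obs: dict, new_unit: str) -> dict:
    """Re-express every reading's 'L' in new_unit and update 'unit_L'; the physics is unchanged."""
    out = copy.deepcopy(obs)
    factor = LENGTH_FACTORS[new_unit] / LENGTH_FACTORS[out["unit_L"]]
    for reading in out["readings"]:
        reading["L"] = round(reading["L"] * factor, 6)
    out["unit_L"] = new_unit
    return out


def add_reading_noise(obs: dict, rel_sigma: float, rng: random.Random) -> dict:
    """Multiply each numeric reading by (1 + N(0, rel_sigma)) to mimic measurement scatter."""
    out = copy.deepcopy(obs)
    for reading in out["readings"]:
        for key, value in reading.items():
            if isinstance(value, (int, float)):
                reading[key] = value * (1 + rng.gauss(0.0, rel_sigma))
    return out


obs = {"readings": [{"L": 0.50, "T": 1.42}, {"L": 0.60, "T": 1.55}], "unit_L": "m", "unit_T": "s"}
in_cm = perturb_length_unit(obs, "cm")
noisy = add_reading_noise(obs, rel_sigma=0.01, rng=random.Random(42))
print(in_cm)
print(noisy)
```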

+ ---

+ ## Model Card Authors
+ - Barghav777 (model author/maintainer)
 
  ## Model Card Contact
+ - Add your preferred contact or HF discussion link.
+
+ ---

+ ### Notes on Assumptions & Gaps (for rigor)
+ - **Assumption:** The adapter folder `./phi3-lab-report-coder-final` contains PEFT weights (not a merged full model). The notebook’s `save_model` call supports that, and the loading snippet reflects this.
+ - **Known gap:** No recorded objective metrics; this card avoids fabricating results. Add a small script to run eval and compute numeric accuracy + ROUGE.
+ - **Risk callout:** Dataset size is modest (37/6); without stronger regularization and more variety, generalization is limited.