ethanker committed · Commit 72d2c9b · verified · 1 Parent(s): 94a0c59

Update Readme.md card

Files changed (1)
  1. README.md +99 -145
README.md CHANGED
@@ -3,207 +3,161 @@ base_model: unsloth/LFM2-350M-unsloth-bnb-4bit
  library_name: peft
  pipeline_tag: text-generation
  tags:
- - base_model:adapter:unsloth/LFM2-350M-unsloth-bnb-4bit
- - lora
- - sft
- - transformers
- - trl
  ---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->

  ## Model Details

  ### Model Description

- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

- ### Model Sources [optional]

- <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

  ## Uses

- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
  ### Direct Use

- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]

- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- [More Information Needed]

  ### Out-of-Scope Use

- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]

  ## Bias, Risks, and Limitations

- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]

  ### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

- ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->

- ### Testing Data, Factors & Metrics

- #### Testing Data

- <!-- This should link to a Dataset Card if possible. -->

- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]

- ### Results

- [More Information Needed]

- #### Summary

- ## Model Examination [optional]

- <!-- Relevant interpretability work for the model goes here -->

- [More Information Needed]

  ## Environmental Impact

- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]

- ## Glossary [optional]

- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

- [More Information Needed]

- ## More Information [optional]

- [More Information Needed]

- ## Model Card Authors [optional]

- [More Information Needed]

- ## Model Card Contact

- [More Information Needed]

  ### Framework versions

- - PEFT 0.17.1

  library_name: peft
  pipeline_tag: text-generation
  tags:
+ - "base_model:adapter:unsloth/LFM2-350M-unsloth-bnb-4bit"
+ - lora
+ - qlora
+ - sft
+ - transformers
+ - trl
+ - conventional-commits
+ - code
  ---

+ # lfm2_350m_commit_diff_summarizer (LoRA)

+ A lightweight **helper model** that turns Git diffs into **Conventional Commit–style** messages.
+ It outputs **strict JSON** with a short `title` (≤ 65 chars) and up to 3 `bullets`, so your CLI/agents can parse it deterministically.

  ## Model Details

  ### Model Description

+ * **Purpose:** Summarize `git diff` patches into concise, Conventional Commit–compliant titles with optional bullets.
+ * **I/O format:**
+   * **Input:** a prompt containing the diff (plain text).
+   * **Output:** a JSON object: `{"title": "...", "bullets": ["...", "..."]}`.
+ * **Developed by:** Ethan (HF: `ethanke`)
+ * **Shared by:** Ethan (HF: `ethanke`)
+ * **Model type:** LoRA adapter for a causal LM (text generation)
+ * **Language(s):** English (commit message conventions)
+ * **License:** Inherits the base model’s license; the dataset has **non-commercial** terms (see **Training Data**). Review both before production or commercial use.
+ * **Finetuned from:** `unsloth/LFM2-350M-unsloth-bnb-4bit` (4-bit quantized base, trained with QLoRA)

+ ### Model Sources

+ * **Repository:** this model card and adapter on the Hub under `ethanke/lfm2_350m_commit_diff_summarizer`

  ## Uses

  ### Direct Use

+ * Convert patch diffs into Conventional Commit messages for PR titles, commits, and changelogs.
+ * Provide human-readable summaries in agent UIs with a guaranteed JSON structure.

+ ### Downstream Use

+ * Plug into CI to auto-suggest commit titles after tests pass.
+ * Use as a **helper** in a larger agent system (the router/planner stays in a bigger model).

  ### Out-of-Scope Use

+ * General code generation or deep refactoring explanations.
+ * Non-English commit conventions.
+ * Knowledge-intensive narrative summaries.

  ## Bias, Risks, and Limitations

+ * Trained on public commits filtered to Conventional Commit titles; it may **prefer certain styles/projects**.
+ * Long diffs are truncated to `max_length`; the summary may miss changes near the truncation point.
+ * The dataset license may restrict **commercial** use; verify for your case.

  ### Recommendations

+ * Enforce JSON validation; if the output is invalid, retry with a JSON-repair prompt.
+ * Keep a regex gate for Conventional Commit titles in your pipeline, as sketched below.
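
+ A minimal sketch of such a gate, assuming Python 3.10+ (the regex mirrors the training filter under **Training Data**; the 65-char title and 3-bullet caps are this card's stated limits):

+ ```python
+ import json
+ import re
+
+ # Conventional Commit title pattern (same as the training-data filter)
+ CC_TITLE = re.compile(
+     r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
+     r"(\([^)]+\))?(!)?:\s.+$"
+ )
+
+ def validate(raw: str) -> dict | None:
+     """Return the parsed object if it passes the gate, else None (caller retries)."""
+     try:
+         obj = json.loads(raw)
+     except json.JSONDecodeError:
+         return None
+     title, bullets = obj.get("title", ""), obj.get("bullets", [])
+     if not isinstance(title, str) or not CC_TITLE.match(title) or len(title) > 65:
+         return None
+     if not isinstance(bullets, list) or len(bullets) > 3:
+         return None
+     return obj
+ ```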

+ ## How to Get Started

+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+ from peft import PeftModel
+ import torch, json
+
+ BASE = "unsloth/LFM2-350M-unsloth-bnb-4bit"
+ ADAPTER = "ethanke/lfm2_350m_commit_diff_summarizer"  # replace with your repo id
+
+ # 4-bit NF4 config matching how the base was quantized for QLoRA training
+ bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
+                          bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=torch.float16)
+
+ tok = AutoTokenizer.from_pretrained(BASE, use_fast=True)
+ mdl = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
+ mdl = PeftModel.from_pretrained(mdl, ADAPTER)
+
+ diff = "...your git diff text..."
+ prompt = (
+     "You are a commit message summarizer.\n"
+     "Return a concise JSON object with fields 'title' (<=65 chars) and 'bullets' (0-3 items).\n"
+     "Follow the Conventional Commit style for the title.\n\n"
+     "### DIFF\n" + diff + "\n\n### OUTPUT JSON\n"
+ )
+
+ inputs = tok(prompt, return_tensors="pt").to(mdl.device)
+ with torch.no_grad():
+     out = mdl.generate(**inputs, max_new_tokens=200, do_sample=False)
+ text = tok.decode(out[0], skip_special_tokens=True)
+
+ # naive JSON extraction: take the last {...} span in the decoded text
+ js = text[text.rfind("{"): text.rfind("}") + 1]
+ obj = json.loads(js)
+ print(obj)
+ ```
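
+ The printed object should have the shape `{"title": "fix(parser): handle empty hunks", "bullets": ["..."]}` (illustrative values, not real model output).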

+ ## Training Details

+ ### Training Data

+ * **Dataset:** `Maxscha/commitbench` (diff → commit message).
+ * **Filtering:** kept only samples whose **first non-empty line** of the message matches the Conventional Commits pattern
+   `^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([^)]+\))?(!)?:\s.+$`
+   (a filtering sketch follows this list).
+ * **Note:** the dataset card indicates non-commercial licensing; confirm before commercial deployment.
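
+ A sketch of that filter with `datasets` (the `message` column name is an assumption; check the CommitBench dataset card for the actual field names):

+ ```python
+ import re
+ from datasets import load_dataset
+
+ CC_RE = re.compile(
+     r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
+     r"(\([^)]+\))?(!)?:\s.+$"
+ )
+
+ def is_conventional(example):
+     # the first non-empty line of the commit message must match the CC pattern
+     lines = [l for l in example["message"].splitlines() if l.strip()]
+     return bool(lines) and CC_RE.match(lines[0]) is not None
+
+ ds = load_dataset("Maxscha/commitbench", split="train")
+ ds = ds.filter(is_conventional)
+ ```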

+ ### Training Procedure

+ * **Method:** supervised fine-tuning (SFT) with TRL `SFTTrainer` + **QLoRA** (PEFT).
+ * **Prompting:** instruction + `### DIFF` + `### OUTPUT JSON` target (title/bullets).
+ * **Precision:** fp16 compute on the 4-bit base.
+ * **Hyperparameters (v0.1)**, mirrored in the config sketch after this list:
+   * `max_length=2048`, `per_device_train_batch_size=2`, `grad_accum=4`
+   * `lr=2e-4`, `scheduler=cosine`, `warmup_ratio=0.03`
+   * `epochs=1` over a capped subset
+   * LoRA: `r=16`, `alpha=32`, `dropout=0.05`, targets: q/k/v/o + MLP proj
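
+ A configuration sketch matching those numbers (the `target_modules` names are assumptions for the base model's attention/MLP projections, and `SFTConfig` argument names vary across TRL versions; verify both against your installed stack):

+ ```python
+ from peft import LoraConfig
+ from trl import SFTConfig
+
+ lora_cfg = LoraConfig(
+     r=16, lora_alpha=32, lora_dropout=0.05,
+     # q/k/v/o attention + MLP projections; module names assumed, inspect the model to confirm
+     target_modules=["q_proj", "k_proj", "v_proj", "out_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+     task_type="CAUSAL_LM",
+ )
+
+ sft_cfg = SFTConfig(
+     output_dir="lfm2_commit_summarizer",
+     max_length=2048,
+     per_device_train_batch_size=2,
+     gradient_accumulation_steps=4,
+     learning_rate=2e-4,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.03,
+     num_train_epochs=1,
+     fp16=True,
+ )
+ ```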

+ ### Evaluation

+ * **Validation:** a filtered split from CommitBench.
+ * **Metrics (example run):**
+   * `eval_loss ≈ 1.18` → perplexity = exp(eval_loss) ≈ 3.26
+   * `eval_mean_token_accuracy ≈ 0.77`
+ * Suggested task metrics: JSON validity rate, CC-title compliance, title length ≤ 65 chars, ≤ 3 bullets (a harness sketch follows).
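
+ A small harness for those task metrics, as a sketch (it reuses the `validate()` gate from the Recommendations section, so JSON validity and CC compliance are measured jointly):

+ ```python
+ def task_metrics(raw_outputs: list[str]) -> dict:
+     parsed = [validate(r) for r in raw_outputs]  # None = failed JSON or CC gate
+     ok = [p for p in parsed if p is not None]
+     n_ok = max(len(ok), 1)
+     return {
+         "valid_rate": len(ok) / max(len(raw_outputs), 1),
+         "mean_title_len": sum(len(p["title"]) for p in ok) / n_ok,
+         "mean_bullets": sum(len(p.get("bullets", [])) for p in ok) / n_ok,
+     }
+ ```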

  ## Environmental Impact

+ * **Hardware:** NVIDIA RTX 3060 12 GB (local)
+ * **Hours used:** ~1–2 h (prototype)

+ ## Technical Specifications

+ * **Architecture:** LFM2-350M (decoder-only) + LoRA adapter
+ * **Libraries:** `transformers`, `trl`, `peft`, `bitsandbytes`, `datasets`, `unsloth`

+ ## Citation

+ If you use this model, please cite the base model and dataset authors according to their cards.

+ ## Model Card Authors

+ * Ethan (`ethanke`) and contributors

+ ## Contact

+ * Open an issue on the Hub repo or message `ethanke` on Hugging Face.

  ### Framework versions

+ * PEFT 0.17.1
+ * TRL (SFTTrainer)
+ * Transformers (recent version)