Update README.md

---
library_name: transformers
tags:
- phi-2
- code-generation
- math
- reasoning
- gsm8k
- mbpp
- finetuned
datasets:
- google-research-datasets/mbpp
- gsm8k
- meta-math/MATH
language:
- en
base_model:
- microsoft/phi-2
pipeline_tag: text-generation
---

# Model Card for DeryFerd/Qwen-Math-Code-Distill-Phi-2

## Model Details

### Model Description

**UPDATE:** This model is a fine-tuned, versatile version of **`microsoft/phi-2`**, adapted for both **Python code generation** and **step-by-step mathematical reasoning**. The goal of this project was to distill the capabilities of larger "teacher" models (`Qwen2.5-Coder-7B-Instruct` for coding and `Qwen2.5-Math-7B-Instruct` for math) into the compact and efficient Phi-2 architecture.

The model was trained on a combined dataset of Python programming problems (from MBPP) and math word problems (from GSM8K and MATH). It is designed to generate not just answers, but also the thought process behind them, mimicking the style of its teachers.

- **Developed by:** DeryFerd
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** `microsoft/phi-2`

### Model Sources

- **Repository:** [https://huggingface.co/DeryFerd/Qwen-Math-Code-Distill-Phi-2](https://huggingface.co/DeryFerd/Qwen-Math-Code-Distill-Phi-2)

## Uses

### Direct Use

This model is intended for direct use in generating Python functions from natural language and solving math word problems with step-by-step explanations. It can be used as a coding/math assistant, for educational purposes, or for rapid prototyping.

**Intended Use:**
* Generating Python functions from docstrings or natural language instructions.
* Solving math problems while showing the reasoning process.

### Out-of-Scope Use

This is a specialized model. It will not perform well on tasks outside of basic Python code and grade-school level math, such as general conversation, translation, or creative writing. It has not been trained or evaluated for safety and may produce incorrect or insecure code, as well as flawed mathematical reasoning.

## Bias, Risks, and Limitations

This model was trained on the MBPP, GSM8K, and MATH datasets, and its capabilities are limited to these domains. The model may generate code that is syntactically correct but logically flawed, or math solutions that seem logical but contain calculation errors. **Always review and test the generated output before use in production environments.**
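
One lightweight way to follow that advice is to run a generated function against a few hand-written checks before relying on it. The sketch below is purely illustrative: the generated function and the assertions are made-up examples, and `exec` should only ever be applied to output you have already read.

```python
# Illustrative spot-check: execute a reviewed, generated function in a scratch
# namespace and run a few hand-written assertions against it.
# The function body and the tests are hypothetical examples.
generated_code = """
def to_uppercase(strings):
    return [s.upper() for s in strings]
"""

scratch = {}
exec(generated_code, scratch)  # only run model output you have read and trust

to_uppercase = scratch["to_uppercase"]
assert to_uppercase(["a", "Bc"]) == ["A", "BC"]
assert to_uppercase([]) == []
print("Spot-check passed.")
```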

A notable limitation discovered during development is a potential **low-level GPU memory conflict**. When this model is loaded into the same runtime as a significantly larger and architecturally different model (such as Qwen 7B), its fine-tuned capabilities can be silently overridden, causing it to revert to the base model's behavior. It is recommended to run this model in an isolated process.

## How to Get Started with the Model

Use the code below to get started with the model using the `transformers` library.

```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

model_id = "DeryFerd/Qwen-Math-Code-Distill-Phi-2"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

# Create a text-generation pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# --- Example 1: Coding ---
code_instruction = "Write a Python function that takes a list of strings and returns a new list with all strings converted to uppercase."
prompt = f"Instruct: {code_instruction.strip()}\nOutput:"

outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id
)
response = outputs[0]['generated_text'].split("Output:")[1].strip()
print("--- Coding Example ---")
print(response)

# --- Example 2: Math ---
math_instruction = "A bakery has 150 cookies. They sell 60 in the morning and 35 in the afternoon. How many cookies are left at the end of the day?"
prompt = f"Instruct: {math_instruction.strip()}\nOutput:"

outputs = pipe(
    prompt,
    max_new_tokens=512,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id
)
response = outputs[0]['generated_text'].split("Output:")[1].strip()
print("\n--- Math Example ---")
print(response)
```

## Training Details

### Training Data

The model was fine-tuned on a combined dataset of **3,474 instruction-response pairs**:
- **2,500 math problems:** a mix of 2,000 samples from the GSM8K dataset and 500 samples from the MATH dataset, with responses generated by `Qwen2.5-Math-7B-Instruct`.
- **974 coding problems:** a curated subset of the MBPP dataset, with responses generated by `Qwen2.5-Coder-7B-Instruct`.
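
For illustration, a minimal sketch of how such teacher-generated pairs can be produced. The dataset config, sampling settings, and the tiny demo slice are assumptions for the example, not the exact script used for this model.

```python
# Illustrative sketch: have a teacher model answer GSM8K questions and store the
# results in the same "Instruct:/Output:" format the student is trained on.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_id = "Qwen/Qwen2.5-Math-7B-Instruct"  # swap in Qwen/Qwen2.5-Coder-7B-Instruct for MBPP
tokenizer = AutoTokenizer.from_pretrained(teacher_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id, torch_dtype="auto", device_map="auto")

gsm8k = load_dataset("gsm8k", "main", split="train").select(range(8))  # tiny demo slice

pairs = []
for row in gsm8k:
    chat = [{"role": "user", "content": row["question"]}]
    input_ids = tokenizer.apply_chat_template(
        chat, add_generation_prompt=True, return_tensors="pt"
    ).to(teacher.device)
    output_ids = teacher.generate(input_ids, max_new_tokens=512, do_sample=False)
    answer = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
    # Keep each pair in the prompt format used at inference time.
    pairs.append({"text": f"Instruct: {row['question'].strip()}\nOutput: {answer.strip()}"})

print(pairs[0]["text"][:300])
```

Writing the pairs in the same `Instruct:`/`Output:` format used at inference keeps the student's prompt format consistent between training and deployment.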

### Training Procedure

The model was fine-tuned using the LoRA (Low-Rank Adaptation) method for parameter-efficient fine-tuning (PEFT).

#### Training Hyperparameters

- **Framework:** `trl.SFTTrainer`
- **LoRA `r`:** 16
- **LoRA `alpha`:** 32
- **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `dense`
- **Learning Rate:** 2e-4
- **LR Scheduler:** Constant
- **Epochs:** 3
- **Batch Size:** 1 (with gradient accumulation of 8)
- **Optimizer:** Paged AdamW 8-bit
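
For reference, a minimal outline of a LoRA + `SFTTrainer` setup that matches the hyperparameters above. The placeholder dataset and some argument names are assumptions, and exact `trl`/`peft` arguments vary across library versions, so treat this as a sketch rather than the exact training script.

```python
# Illustrative outline of the LoRA + SFT setup described above; the placeholder
# dataset and some argument names are assumptions, not the exact training script.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")

# Placeholder data; the real run used the 3,474 distilled "Instruct:/Output:" pairs described above.
train_dataset = Dataset.from_list(
    [{"text": "Instruct: Write a function that adds two numbers.\nOutput: def add(a, b):\n    return a + b"}]
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="phi2-math-code-distill",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    lr_scheduler_type="constant",
    optim="paged_adamw_8bit",
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    peft_config=lora_config,
)
trainer.train()
```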

### Compute Infrastructure

- **Hardware Type:** Single NVIDIA T4 GPU
- **Cloud Provider:** Kaggle Notebooks

## Citation

If you use this model, please consider citing the original Phi-2, MBPP, GSM8K, and MATH papers.