---
license: mit
datasets:
- band2001/stolaf-angora
---

# Model Card for Angora-1600

<!-- Provide a quick summary of what the model is/does. -->

This model helps computer science students at St. Olaf College (Northfield, MN) answer questions about fundamental CS principles as well as questions about the specific technical stacks and procedures the St. Olaf Computer Science program uses.

## Angora-1600 Details

This model is built on [Google's Gemma 7b-it](https://huggingface.co/google/gemma-7b-it) and was fine-tuned on a dataset created to address St. Olaf-specific computer science questions, such as how to use the institution's git instance or the steps to declare the computer science major. It was fine-tuned with LoRA for 1600 iterations using MLX on an Apple M3 Max chip.

- **Developed by:** Ben Anderson & Keegan Murray
- **Funded by:** St. Olaf College MSCS Department
- **Model type:** Generative
- **License:** MIT
- **Finetuned from model:** [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it)

<!-- Provide the basic links for the model. -->

- **Repository:** See the GitHub repository [here](https://github.com/band2001/stolaf-angora)
- **Paper:** Coming soon...
- **Demo:** A video demo is available [here](https://drive.google.com/file/d/1iwThVj88FTgLNANZdv2NineRcBXAqtZp/view?usp=sharing).

## Uses

This model is intended for computer science students at St. Olaf College. While it can be used broadly for general computer science questions, it has been fine-tuned to answer questions specific to the St. Olaf Computer Science program.

## How to Get Started with the Model

Use the code below to get started with the model.

### Direct Use With Transformers Library

#### Use a pipeline as a high-level helper

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="band2001/stolaf-angora-1600")
```
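
The pipeline returns a list of generation results. A minimal usage sketch (for best results, wrap your query in the chat-style format shown later in this card):

```python
# Generate a completion and read back the text field returned by the pipeline.
output = pipe("YOUR PROMPT HERE", max_new_tokens=256)
print(output[0]["generated_text"])
```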

#### Load model directly
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("band2001/stolaf-angora-1600")
model = AutoModelForCausalLM.from_pretrained("band2001/stolaf-angora-1600", device_map="auto")

# Move the inputs to your device if using GPU acceleration, e.g. .to("cuda") or .to("mps")
input_ids = tokenizer("YOUR PROMPT HERE", return_tensors="pt").to("YOUR DEVICE IF USING GPU ACCELERATION")

outputs = model.generate(**input_ids, max_new_tokens=256)
decoded_output = tokenizer.decode(outputs[0])
```
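
Because the model was tuned on the chat-style format shown under Training Procedure below, wrapping the query before tokenizing generally helps. A minimal sketch, continuing from the snippet above and reusing the `format_prompt` helper defined later in this card:

```python
# Wrap the raw query in the Gemma-style chat template used during fine-tuning.
prompt = format_prompt("YOUR PROMPT HERE")

input_ids = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```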

### Direct Use With MLX Library

Note that MLX can only be used on Apple Silicon Macs, and one of the Max-series chips or better is recommended.

```python
from mlx_lm import load, generate

def format_prompt(prompt, system_prompt = "YOUR SYSTEM PROMPT"):
    '''Wrap a user query in the Gemma-style chat template used for fine-tuning.'''
    return """<bos><start_of_turn>user
## Instructions
{}
## User
{}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, prompt)

model, tokenizer = load("band2001/stolaf-angora-1600")

prompt = format_prompt("YOUR PROMPT HERE")

decoded_output = generate(
    model,
    tokenizer,
    prompt=prompt,
    verbose=True,
    temp=0.0,
    max_tokens=256,
)
```

### Out-of-Scope Use

This model should not be used for inference outside of computer science topics (general or specific to St. Olaf College). Questions about other topics will likely yield answers, but those topics are not covered by the fine-tuning data, so responses will most likely contain errors and could potentially include offensive content.

## Bias, Risks, and Limitations

Because we created the fine-tuning dataset from scratch, it is small relative to the model: roughly 2,000 observations against roughly 8.5B parameters. So while the dataset had a noticeable effect on the tuning of this model, the model will occasionally fall back on its base knowledge and give partially incorrect answers to St. Olaf-specific questions.

Also note the limitations present in the [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it) model and assume they are present in this model as well.

## Training Details

### Training Data

The training data can be found in the St. Olaf Angora Dataset ([band2001/stolaf-angora](https://huggingface.co/datasets/band2001/stolaf-angora)).
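
A minimal sketch for loading it with the Hugging Face `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("band2001/stolaf-angora")
print(dataset)  # inspect the available splits and columns
```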

### Training Procedure

To train the model, the data needs to be in the following format. Note that the data in [band2001/stolaf-angora](https://huggingface.co/datasets/band2001/stolaf-angora) already is.

```
<bos><start_of_turn>user
## Instructions
system prompt goes here

## User
prompt/query goes here<end_of_turn>

<start_of_turn>model
model response here (put a response here for tuning purposes)<end_of_turn><eos>
```

Once the data is in the correct format, QLoRA is recommended. The model can be fine-tuned either with mlx-lm and mps (to tune on an Apple Silicon machine) or with a bitsandbytes configuration and cuda (to tune on a machine with Nvidia GPUs), as sketched below.
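
For the CUDA path, a minimal sketch of a 4-bit QLoRA setup with `transformers`, `peft`, and `bitsandbytes` is shown below. This is an illustrative configuration, not the exact one used for this model; hyperparameters such as `r` and `lora_alpha` are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model in 4-bit (QLoRA-style quantization).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach LoRA adapters to the attention projections (example settings).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```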

#### Preprocessing

To preprocess your data into the format outlined above, you can use the following helper function:

```python
def generate_prompt(entry, system_prompt = SYSTEM_PROMPT):
    '''
    This function formats a question/answer pair to Gemma's chat template.

    :param entry: a dictionary with an instruction and a response
    :param system_prompt: the system prompt to be used

    :return: the formatted string for Gemma's chat template
    '''
    return """<bos><start_of_turn>user
## Instructions
{}
## User
{}<end_of_turn>
<start_of_turn>model
{}<end_of_turn><eos>""".format(system_prompt, entry["instruction"], entry["response"])
```
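
For example (a sketch; `SYSTEM_PROMPT` must be defined before the helper above, and the sample question is only an illustration):

```python
SYSTEM_PROMPT = "YOUR SYSTEM PROMPT"  # define this before defining generate_prompt

example = {
    "instruction": "How do I declare the computer science major?",
    "response": "YOUR RESPONSE HERE",
}

print(generate_prompt(example))
```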

When running inference with this model, you can format the user's query using this helper function:

```python
def format_prompt(prompt, system_prompt = SYSTEM_PROMPT):
    '''
    This function formats a question to Gemma's chat template.

    :param prompt: a string with the user's query
    :param system_prompt: the system prompt to be used

    :return: the formatted string for Gemma's chat template
    '''
    return """<bos><start_of_turn>user
## Instructions
{}
## User
{}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, prompt)
```

#### Training Process

The MLX LoRA fine-tuning approach was used; you can learn more about [MLX LoRA here](https://github.com/ml-explore/mlx-examples/blob/main/lora/README.md). Gemma-7b-it was loaded without any conversion. The default `batch_size = 16` was kept, and the model was tuned for 800 iterations twice to reach the 1600-iteration model. Once the fine-tuned weights were created, the model was fused using MLX's fuse functionality; you can learn more about [fusing with MLX here](https://github.com/ml-explore/mlx-examples/blob/main/lora/README.md#Fuse-and-Upload).

One important change when fusing with MLX was editing the MLX package code to include `"format":"pt"` in the safetensors metadata so this model can be used with the transformers library. In `<path_to_your_site-packages>/mlx_lm/utils.py`, replace `mx.save_safetensors(str(shard_path), shard, metadata={"format":"mlx"})` with `mx.save_safetensors(str(shard_path), shard, metadata={"format":"pt"})` so the fused weights are written with that metadata attribute. Special thanks to [Alexweberk's guide on GitHub](https://gist.github.com/alexweberk/635431b5c5773efd6d1755801020429f) for helping solve this issue. Finally, the fused model was uploaded to this Hugging Face repo.

In the GitHub repo for this project, `mlx_lora.sh` contains the command used for LoRA fine-tuning, `mlx_fuse.sh` the command for fusing the model, and `mlx_upload.sh` the upload command. There is also an optional `mlx_convert.sh` for converting the Google Gemma 7b-it model before fine-tuning if desired.

## Evaluation

Testing loss and perplexity were the two metrics used to evaluate the Angora models. A summary of results across the different iteration counts is included below.

### Results

| Number of iterations | Testing Loss | Perplexity |
|:---------------------|:-------------|:-----------|
| 800  | 0.569 | 1.766 |
| 1600 | 0.302 | 1.352 |
| 2400 | 0.225 | 1.252 |
| 3200 | 0.185 | 1.203 |
| 4000 | 0.170 | 1.185 |
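
As the table suggests, perplexity here is the exponential of the testing loss; for example:

```python
import math

print(math.exp(0.302))  # ~1.35, matching the 1600-iteration perplexity
```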

### Testing Data

The testing data is available [here](https://huggingface.co/datasets/band2001/stolaf-angora/viewer/default/test).

## Model Card Contact

Ben Anderson - [ander6@stolaf.edu](mailto:ander6@stolaf.edu)

Keegan Murray - [murray7@stolaf.edu](mailto:murray7@stolaf.edu)