datasets:
- band2001/stolaf-angora
---

# Model Card for Angora-2400

<!-- Provide a quick summary of what the model is/does. -->

This model has been created to help computer science students at St. Olaf College (Northfield, MN) answer questions about fundamental CS principles as well as questions about the specific technical stacks and procedures St. Olaf Computer Science uses.

## Angora-2400 Details

This model is built on [Google's Gemma 7b-it](https://huggingface.co/google/gemma-7b-it). It was fine-tuned on a dataset created to address St. Olaf-specific computer science questions; some of these questions reference the specific instance of git the institution uses or the steps to declare the computer science major. Fine-tuning was done with MLX on an Apple M3 Max chip, running LoRA for 2400 iterations.

- **Developed by:** Ben Anderson & Keegan Murray
- **Funded by:** St. Olaf College MSCS Department

Use the code below to get started with the model.

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="band2001/stolaf-angora-2400")
```
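The pipeline object can then be called directly on a prompt; the snippet below is a usage sketch, and the prompt text and `max_new_tokens` value are placeholders rather than values from the original card:

```python
# Generate a completion with the high-level pipeline helper
output = pipe("YOUR PROMPT HERE", max_new_tokens=256)
print(output[0]["generated_text"])
```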
#### Load model directly

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("band2001/stolaf-angora-2400")
model = AutoModelForCausalLM.from_pretrained("band2001/stolaf-angora-2400", device_map="auto")

input_ids = tokenizer("YOUR PROMPT HERE", return_tensors="pt").to("YOUR DEVICE IF USING GPU ACCELERATION")
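# A minimal sketch of the generation step, assuming standard transformers usage;
# max_new_tokens is a placeholder value.
outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```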
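The model can also be used with MLX. In the sketch below, the `format_prompt` signature, the closing lines of its Gemma chat template, and the `load` and `format_prompt` calls come from the card; the `mlx_lm` import and the opening user turn of the template are assumptions added to make the example self-contained.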
```python
from mlx_lm import load  # assumed import; mlx_lm exposes a load helper

def format_prompt(prompt, system_prompt = "YOUR SYSTEM PROMPT"):
    # Gemma chat template; the opening user turn is an assumption.
    return """<start_of_turn>user
{} {}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, prompt)

model, tokenizer = load("band2001/stolaf-angora-2400")

prompt = format_prompt("YOUR PROMPT HERE")
```
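A response can then be generated with `mlx_lm`'s `generate` helper; the call below is a hedged sketch, and the `max_tokens` value is a placeholder (argument names can differ between `mlx_lm` versions):

```python
from mlx_lm import generate

# Generate a completion from the formatted prompt
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```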
#### Training Process

The MLX LoRA fine-tuning approach was used; you can learn more about [MLX LoRA here](https://github.com/ml-explore/mlx-examples/blob/main/lora/README.md). Gemma 7b-it was loaded without any conversion, the default `batch_size = 16` was kept, and the model was fine-tuned for 800 iterations three times to reach the full 2400 iterations.

Once the fine-tuned weights were created, the model was fused using MLX's fuse functionality; you can learn more about [fusing with MLX here](https://github.com/ml-explore/mlx-examples/blob/main/lora/README.md#Fuse-and-Upload). One important change when fusing was editing the MLX package code so the fused weights are saved with `"format":"pt"` in their metadata, which allows the model to be used with the transformers library. This is done in `<path_to_your_site-packages>/mlx_lm/utils.py` by changing the metadata passed to `mx.save_safetensors` from `{"format":"mlx"}` to `{"format":"pt"}`, as shown below. Special thanks to [Alexweberk's guide on GitHub](https://gist.github.com/alexweberk/635431b5c5773efd6d1755801020429f) for helping solve this issue. Finally, the fused model was uploaded to this Hugging Face repo.
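Concretely, the change to `<path_to_your_site-packages>/mlx_lm/utils.py` amounts to replacing one save call (the surrounding code varies by `mlx_lm` version):

```python
# Before: fused shards are saved with MLX-only metadata
mx.save_safetensors(str(shard_path), shard, metadata={"format":"mlx"})

# After: mark the shards as PyTorch-compatible so transformers can load them
mx.save_safetensors(str(shard_path), shard, metadata={"format":"pt"})
```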
In the GitHub repo for this project, `mlx_lora.sh` contains the command used for the LoRA fine-tuning, `mlx_fuse.sh` the command for fusing the model, and `mlx_upload.sh` the upload command. There is additionally an optional `mlx_convert.sh` for converting the Google Gemma 7b-it model before fine-tuning, if desired.