Alignment-Lab-AI committed on
Commit
ce7b25b
1 Parent(s): ca6af82

Update README.md

Files changed (1)
  1. README.md +38 -26
README.md CHANGED
@@ -3,9 +3,15 @@ base_model: Alignment-Lab-AI/Neural-network-medium-untuned-theta
  tags:
  - axolotl
  - Alignment-Lab-AI
  model-index:
- - name: Buzz-5B-Medium
  results: []
  ---
  [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 
@@ -13,25 +19,26 @@ model-index:
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6436279eaaef013d1af225c9/fWaQucBWfabfnMsAFN8hv.png)
 
- # Buzz-5b-Medium: Advancing Efficiency through Iterative Fine-Tuning
 
  ## Introduction
 
  - [Alignment Lab AI](https://AlignmentLab.ai) is pleased to introduce our latest research efforts with:
 
- **Buzz-5b-Medium**, a state-of-the-art language model developed in collaboration with [Hive Digital Technologies](https://hivedt.com/).
 
  The Buzz model, Dataset, and Code are to be released to build a toolkit that aims to demonstrate the potential for reuse and optimization of existing pretrained language models to continuously refine the heights of performance that can be achieved with optimal use of FlOps. Alongside Buzz-5b-Medium, we release
 
- - [The Buzz Dataset](https://huggingface.co/datasets/tempbuzz/Buzz)
- - [Buzz-2.5b-Small](https://huggingface.co/tempbuzz/buzz-Buzz-2.5b-Small)
  - [Buzz-8B-Large](https://huggingface.co/tempbuzz/Lab-AI/Buzz-8B-Large)
 
- the **Buzz dataset** and two additional models: **Buzz-2.5B-Small** (2.5B parameters) and **Buzz-8B-Large** (8B parameters), the codebase to refine, filter and augment the data, as well as prune and train your own variants, will additionally be released in the coming days.
 
  ## Performance
 
- Buzz-5b-Medium achieves remarkably low train and validation loss, with unseen data loss reaching around **0.5** by the end of training. This performance showcases the effectiveness of our novel iterative fine-tuning approach, which maximizes the reuse of pretrained weights. Even the smallest variant, Buzz-Small, maintains a steady train loss of approximately **0.4-0.6**, on entirely new data and hold out sets.
 
  [ benchmark scores table here]
 
@@ -51,37 +58,42 @@ By combining high quality data, iterative fine-tuning with carefully selected "g
 
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6436279eaaef013d1af225c9/wyHyDIJnNmbomonZKQAD0.png)
- https://wandb.ai/llm_surgery/llama-3-8b-vs-5b
- https://wandb.ai/autometa/neural-network-1
- https://wandb.ai/autometa/buzz-baby?nw=nwuserautometa
- https://wandb.ai/autometa/buzz-brother?nw=nwuserautometa
- https://wandb.ai/autometa/buzz-big?nw=nwuserautometa
 
  ## Chat Template and Inference
 
- To use the Buzz-5b-Medium model for chat-based tasks, you can utilize the provided chat template. Here's an example of how to format the chat template and perform inference using the Hugging Face Transformers library:
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
 
- model_name = "tempbuzz/Buzz-5b-Medium"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name)
 
- chat_template = """{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"""
 
- messages = [
-     {"role": "user", "content": "Hello, how are you?"},
-     {"role": "assistant", "content": "I'm doing well, thank you for asking! How can I assist you today?"},
-     {"role": "user", "content": "Can you tell me a joke?"}
- ]
 
- input_text = chat_template.format(messages=messages, add_generation_prompt=True)
- input_ids = tokenizer.encode(input_text, return_tensors="pt")
 
- output = model.generate(input_ids, max_length=100, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
- generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
 
- print(generated_text)
  ``````
  ## Conclusion
 
@@ -152,4 +164,4 @@ as well as many, many others who are too numerous to name.
  archivePrefix={arXiv},
  primaryClass={cs.CL}
  }
- ```
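The removed example above passed its ChatML-style Jinja template to Python's `str.format`, which does not evaluate Jinja `{% ... %}` blocks, so that call fails (with a `KeyError` on the first `{% ... %}` field) instead of producing a prompt. With Transformers, a Jinja chat template is normally rendered through the tokenizer, for example by assigning it to `tokenizer.chat_template` and calling `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`. The sketch below shows, without downloading any model, the string that template is meant to produce; `render_chatml` is a hypothetical helper written for illustration, not part of the Buzz codebase or of Transformers.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Hypothetical plain-Python equivalent of the removed ChatML Jinja template.

    Each message becomes '<|im_start|>{role}\\n{content}<|im_end|>\\n'. When
    add_generation_prompt is True, an open assistant turn is appended so the
    model continues the conversation as the assistant.
    """
    parts = []
    for message in messages:
        parts.append("<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n")
    if add_generation_prompt:
        # Mirrors the template's final {% if add_generation_prompt %} block.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    # Same conversation as the removed README example.
    messages = [
        {"role": "user", "content": "Hello, how are you?"},
        {"role": "assistant", "content": "I'm doing well, thank you for asking! How can I assist you today?"},
        {"role": "user", "content": "Can you tell me a joke?"},
    ]
    print(render_chatml(messages, add_generation_prompt=True))
```

With `add_generation_prompt=True`, the rendered string ends with an open `<|im_start|>assistant` turn, which is what cues the model to reply as the assistant rather than continue the user's message.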
 
  tags:
  - axolotl
  - Alignment-Lab-AI
+ - Meta-Llama-3
  model-index:
+ - name: Buzz-8b-Large-0.5
  results: []
+ license: apache-2.0
+ datasets:
+ - H-D-T/Buzz
+ language:
+ - en
  ---
  [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 
 
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6436279eaaef013d1af225c9/fWaQucBWfabfnMsAFN8hv.png)
 
+ # Buzz-8b-Large: Advancing Efficiency through Iterative Fine-Tuning
 
  ## Introduction
 
  - [Alignment Lab AI](https://AlignmentLab.ai) is pleased to introduce our latest research efforts with:
 
+ **Buzz-8b-Large**, a state-of-the-art language model developed in collaboration with [Hive Digital Technologies](https://hivedt.com/).
 
  The Buzz model, Dataset, and Code are to be released to build a toolkit that aims to demonstrate the potential for reuse and optimization of existing pretrained language models to continuously refine the heights of performance that can be achieved with optimal use of FlOps. Alongside Buzz-5b-Medium, we release
 
+ - [The Buzz Dataset](https://huggingface.co/datasets/H-D-T/Buzz)
+ - [Buzz-2.5b-Small] soon!
+ - [Buzz-5b-Medium] soon!
  - [Buzz-8B-Large](https://huggingface.co/tempbuzz/Lab-AI/Buzz-8B-Large)
 
+ the **Buzz dataset** and two additional models: **Buzz-2.5B-Small** and **Buzz-5B-Medium**, the codebase to refine, filter and augment the data, as well as prune and train your own variants, will additionally be released in the coming days.
 
  ## Performance
 
+ Buzz-8b-Large achieves remarkably low train and validation loss, with unseen data loss reaching around **0.5** by the end of training. This performance showcases the effectiveness of our novel iterative fine-tuning approach, which maximizes the reuse of pretrained weights. Even the smallest variant, Buzz-Small, maintains a steady train loss of approximately **0.4-0.6**, on entirely new data and hold out sets.
 
  [ benchmark scores table here]
 
 
 
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6436279eaaef013d1af225c9/wyHyDIJnNmbomonZKQAD0.png)
 
  ## Chat Template and Inference
 
+ To use the Buzz-8b-Large model for chat-based tasks, you can run inference with the Hugging Face Transformers library. Here's a minimal example:
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
 
+ # Load the tokenizer and model
+ model_name = "H-D-T/Buzz-8b-Large-v0.5"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name)
 
+ # Set the device to run the model on (e.g., "cuda" for GPU, "cpu" for CPU)
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model.to(device)
 
+ # Define the input prompt
+ prompt = "Hello, how are you today?"
 
+ # Tokenize the input prompt
+ input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
 
+ # Generate the model's response
+ output = model.generate(
+     input_ids,
+     max_length=100,
+     num_return_sequences=1,
+     no_repeat_ngram_size=2,
+     early_stopping=True
+ )
 
+ # Decode the generated response
+ response = tokenizer.decode(output[0], skip_special_tokens=True)
+
+ print("Input:", prompt)
+ print("Response:", response)
  ``````
  ## Conclusion
 
 
  archivePrefix={arXiv},
  primaryClass={cs.CL}
  }
+ ```
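Both versions of the Performance section report an unseen-data loss of around **0.5** (and **0.4-0.6** for Buzz-Small). Assuming these are mean per-token cross-entropy values in nats, which is what the Hugging Face `Trainer` that Axolotl builds on logs for causal language models (the README does not state the unit), they can be read as perplexity via `perplexity = exp(loss)`. A minimal sketch of that conversion; `perplexity_from_loss` is a hypothetical helper, not part of the Buzz code.

```python
import math


def perplexity_from_loss(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity.

    Assumes natural-log cross-entropy averaged over tokens; a loss reported
    in bits would need 2 ** loss instead.
    """
    return math.exp(mean_nll_nats)


if __name__ == "__main__":
    # Loss values quoted in the README's Performance section.
    for loss in (0.4, 0.5, 0.6):
        print(f"loss={loss:.1f} -> perplexity ~ {perplexity_from_loss(loss):.2f}")
```

Under that assumption, losses of 0.4, 0.5 and 0.6 correspond to perplexities of roughly 1.49, 1.65 and 1.82.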