---
tags:
- merge
- mergekit
- lazymergekit
- OpenPipe/mistral-ft-optimized-1218
- HuggingFaceH4/zephyr-7b-beta
base_model:
- OpenPipe/mistral-ft-optimized-1218
- HuggingFaceH4/zephyr-7b-beta
---

# AeolusBlend-7B-slerp

AeolusBlend-7B-slerp is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [OpenPipe/mistral-ft-optimized-1218](https://huggingface.co/OpenPipe/mistral-ft-optimized-1218)
* [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: HuggingFaceH4/zephyr-7b-beta
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
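
To illustrate what the configuration above asks for, here is a minimal pure-Python sketch of spherical linear interpolation (SLERP) and of how a list-valued `t` could be spread across the 32 layers. It is an illustration under stated assumptions, not mergekit's actual code: the helper names `slerp` and `t_for_layer` are hypothetical, the anchor values are assumed to be evenly spaced and linearly interpolated over the layer range, and `t = 0` is assumed to return the `base_model` weights while `t = 1` returns the other model's weights.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    # Nearly parallel vectors: fall back to plain linear interpolation.
    if abs(dot) > 1.0 - 1e-6:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


def t_for_layer(anchors, layer, num_layers):
    """Linearly interpolate a list of anchor values across layer indices.

    Assumption: anchors are evenly spaced from the first to the last layer.
    """
    if num_layers == 1 or len(anchors) == 1:
        return float(anchors[0])
    pos = layer / (num_layers - 1) * (len(anchors) - 1)
    lo = min(int(pos), len(anchors) - 2)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[lo + 1]


# Anchor lists copied from the configuration above.
self_attn_t = [0, 0.5, 0.3, 0.7, 1]
mlp_t = [1, 0.5, 0.7, 0.3, 0]

for layer in (0, 15, 31):
    print(layer, t_for_layer(self_attn_t, layer, 32), t_for_layer(mlp_t, layer, 32))
```

Under these assumptions, the self-attention weights start close to `OpenPipe/mistral-ft-optimized-1218` in the first layers and end close to `HuggingFaceH4/zephyr-7b-beta` in the last, while the MLP weights follow the opposite schedule. All remaining tensors use the fallback `value: 0.5`.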

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "lxyuan/AeolusBlend-7B-slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

>>>
[RESP] A large language model is a type of artificial intelligence (AI) system that has been trained on vast amounts of text data to understand and generate human-like language. These models are typically made up of neural networks, which are a type of machine learning algorithm that can learn to recognize patterns in data and make predictions based on those patterns.
The term "large" in this context refers to the size of the model, which is measured by the number of parameters or connections in the neural network. Large language models can have billions or even trillions of parameters, making them capable of processing and generating extremely complex text.
Some examples of large language models include Google's BERT, OpenAI's GPT-3, and Facebook's Transformer-XL. These models have been trained on huge datasets such as books, articles, and web pages, allowing them to understand the nuances of language and generate text that is not only grammatically correct but also natural and fluent.
Large language models have a wide range of potential applications, including natural language processing (NLP) tasks such as text generation, translation, and summarization, as well as chatbots and virtual assistants that can
```


```python
messages = [{"role": "user", "content": "Who is Lee Kuan Yew? Summarize your answer in point form format"}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

>>>

[REP] Lee Kuan Yew was a Singaporean statesman who served as the first Prime Minister of Singapore from 1959 to 1990. Here are some key points about him:
- Born in 1923 in Singapore, Lee was educated in England and was a lawyer by profession.
- He was a founding member of the People's Action Party (PAP), which he led for many years.
- Under Lee's leadership, Singapore achieved independence from Malaysia in 1965 and became a sovereign nation.
- Lee was known for his pragmatic and authoritarian style of governance, emphasizing economic growth, law and order, and meritocracy.
- He played a significant role in Singapore's rapid development, transforming the country from a poor and undeveloped nation into a modern and prosperous city-state.
- Lee passed away in 2015, at the age of 91.
- He was widely regarded as one of the most influential leaders of the 20th century and a key figure in the history of Singapore.
```

### 4-bit Inference Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import transformers
import torch

#!nvidia-smi

"""
Wed Feb  7 12:51:07 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           On  | 00000000:00:1E.0 Off |                    0 |
| N/A   41C    P0              44W / 300W |   4950MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
"""

model_id = "lxyuan/AeolusBlend-7B-slerp"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

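# The model was already placed on devices by device_map="auto" above,
# so the pipeline only needs the loaded model and tokenizer.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)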

messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = pipeline(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)

print(outputs[0]["generated_text"])

>>>
<s>[INST] What is a large language model? [/INST]

A large language model is a type of artificial intelligence system that has been trained on vast amounts of
text data, enabling it to generate human-like responses to a wide range of written prompts. These models are
designed to learn the patterns and rules of language, and as a result, they can perform various natural
language processing tasks, such as translation, summarization, and question-answering, with a high degree
of accuracy. Large language models are typically powered by deep learning algorithms and can have billions
or trillions of parameters, making them capable of processing and understanding complex language structures
and nuances. Some well-known examples of large language models include GPT-3, BERT, and T5.
```
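
As a rough guide to why 4-bit loading helps on a 16 GB card, the sketch below estimates the memory taken by the weights alone at different precisions. The parameter count is an assumption (about 7.24 billion, the approximate size of Mistral-7B-class models), the helper name `weight_memory_gib` is hypothetical, and the estimate ignores quantization metadata, activations, the KV cache, and CUDA context overhead.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: approximate parameter count of a Mistral-7B-class model.
ASSUMED_PARAMS = 7.24e9

for label, bits in [("bfloat16", 16), ("nf4 (4-bit)", 4)]:
    print(f"{label}: ~{weight_memory_gib(ASSUMED_PARAMS, bits):.1f} GiB")
```

Under that assumption the weights shrink from roughly 13.5 GiB in bfloat16 to roughly 3.4 GiB in 4-bit. The `nvidia-smi` snapshot in the example reports 4950MiB in use, which is plausibly the quantized weights plus runtime overhead, although the example does not say at which point the snapshot was taken.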

- 4-bit inference example notebook: [here](https://github.com/LxYuan0420/nlp/blob/main/notebooks/Inference_4bit_AeolusBlend.ipynb)
- Text-to-Graph with AeolusBlend: [here](https://github.com/LxYuan0420/nlp/blob/117f09cf7f09e3284d6a1eed475652ef90bb8545/notebooks/Inference_AeolusBlend_KnowledgeGraph.ipynb)