lxyuan committed on
Commit
69a55c5
1 Parent(s): 776ece8

Update README.md

Files changed (1)
  1. README.md +65 -1
README.md CHANGED
@@ -86,4 +86,68 @@ print(outputs[0]["generated_text"])
  - He played a significant role in Singapore's rapid development, transforming the country from a poor and undeveloped nation into a modern and prosperous city-state.
  - Lee passed away in 2015, at the age of 91.
  - He was widely regarded as one of the most influential leaders of the 20th century and a key figure in the history of Singapore.
- ```
+ ```
+
+ ### 4-bit Inference Example
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+ import transformers
+ import torch
+
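+ # 4-bit loading requires the bitsandbytes and accelerate packages alongside transformers
+ # (e.g. `pip install bitsandbytes accelerate`).
+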
+ #!nvidia-smi
+
+ """
+ Wed Feb 7 12:51:07 2024
+ +---------------------------------------------------------------------------------------+
+ | NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
+ |-----------------------------------------+----------------------+----------------------+
+ | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
+ | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
+ |                                         |                      |               MIG M. |
+ |=========================================+======================+======================|
+ |   0  Tesla V100-SXM2-16GB           On  | 00000000:00:1E.0 Off |                    0 |
+ | N/A   41C    P0              44W / 300W |   4950MiB / 16384MiB |      0%      Default |
+ |                                         |                      |                  N/A |
+ +-----------------------------------------+----------------------+----------------------+
+ """
+
+ model_id = "lxyuan/AeolusBlend-7B-slerp"
+
+ # 4-bit NF4 quantization with nested (double) quantization; compute runs in bfloat16
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16
+ )
+
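+ # NF4 ("4-bit NormalFloat") packs each weight into 4 bits, and double quantization also
+ # compresses the quantization constants (roughly 0.4 bits per parameter saved), so a 7B
+ # model's weights occupy about 3.5-4 GB, in line with the nvidia-smi reading above.
+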
+ # Load the quantized model and tokenizer; device_map="auto" places layers on available devices
+ model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
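+ # Optional sanity check: Transformers models expose get_memory_footprint(), which
+ # should report roughly 4 GB for this 4-bit load.
+ # print(f"{model.get_memory_footprint() / 1024**3:.2f} GiB")
+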
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+     device_map="auto",
+ )
+
+ # Build a prompt with the model's chat template, then generate
+ messages = [{"role": "user", "content": "What is a large language model?"}]
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ outputs = pipeline(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
+
+ print(outputs[0]["generated_text"])
+
+ >>>
+ <s>[INST] What is a large language model? [/INST]
+
+ A large language model is a type of artificial intelligence system that has been trained on vast amounts of
+ text data, enabling it to generate human-like responses to a wide range of written prompts. These models are
+ designed to learn the patterns and rules of language, and as a result, they can perform various natural
+ language processing tasks, such as translation, summarization, and question-answering, with a high degree
+ of accuracy. Large language models are typically powered by deep learning algorithms and can have billions
+ or trillions of parameters, making them capable of processing and understanding complex language structures
+ and nuances. Some well-known examples of large language models include GPT-3, BERT, and T5.
+ ```
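+
+ The chat template also supports multi-turn use: append the assistant's reply to the
+ message list and re-apply the template. A minimal sketch, assuming the `pipeline` and
+ `tokenizer` objects from the example above:
+
+ ```python
+ messages = [{"role": "user", "content": "What is a large language model?"}]
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ # return_full_text=False keeps only the newly generated tokens, not the echoed prompt
+ reply = pipeline(prompt, max_new_tokens=512, do_sample=True, temperature=0.7,
+                  return_full_text=False)[0]["generated_text"]
+
+ messages.append({"role": "assistant", "content": reply.strip()})
+ messages.append({"role": "user", "content": "Name three examples of such models."})
+
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ followup = pipeline(prompt, max_new_tokens=512, do_sample=True, temperature=0.7,
+                     return_full_text=False)[0]["generated_text"]
+ print(followup.strip())
+ ```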
+
+ - The 4-bit inference example notebook can be found [here](https://github.com/LxYuan0420/nlp/blob/main/notebooks/Inference_4bit_AeolusBlend.ipynb).