Update README.md
|
Please see Meta's Responsible Use Guide, available at [https://ai.meta.com/llama/responsible-use-guide/](https://ai.meta.com/llama/responsible-use-guide)

# Example use

## Installation

To use the quantized models you need the latest `transformers` (`pip install transformers --upgrade`), `tokenizers` (`pip install tokenizers --upgrade`), `accelerate`, and `bitsandbytes`.

If the output looks like random letters, you most likely have the wrong version of one of these libraries.
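If the installation is in doubt, a quick version check can rule out a stale library before any deeper debugging. This is a minimal sketch; the package list mirrors the installation note above, and this README does not specify exact minimum versions:

```python
# Garbled output from the 8-bit models usually means a stale library,
# so print what is actually installed before debugging anything else.
import importlib.metadata

for pkg in ["transformers", "tokenizers", "accelerate", "bitsandbytes"]:
    try:
        print(f"{pkg}: {importlib.metadata.version(pkg)}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```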
## LLM

Simply pass a prompt to the model and decode the output. The model will continue the text based on the sample you provide.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Voicelab/trurl-2-7b-8bit")
model = AutoModelForCausalLM.from_pretrained("Voicelab/trurl-2-7b-8bit", device_map="auto")

prompt = "Yesterday, when I was"

tokenized_prompt = tokenizer(prompt, return_tensors="pt")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(
        model.generate(tokenized_prompt.data["input_ids"], max_new_tokens=200, temperature=0)[0],
        skip_special_tokens=True))
```
Generated output:

> Yesterday, when I was in the city, I saw a man who was walking with a cane. and he was walking with a very slow pace. I felt so sad for him. I wanted to help him, but I didn't know how. I wished I could do something to make him feel better.
>
> Today, I saw the same man again. He was walking with the same slow pace, but this time he was walking with a woman who was supporting him. I felt so happy for him. I realized that he was not alone anymore and that he had someone to support him. I wished I could do the same for him.
>
> I realized that sometimes, all we need is someone to support us. We don't need to be alone. We don't need to be sad. We just need someone to be there for us. And I am grateful that I could be there for him today.
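The example above decodes greedily (`temperature=0`), so repeated runs give the same continuation. For more varied text you can switch `generate()` to sampling; the parameter values below are illustrative, not taken from this README:

```python
# Sketch: switching model.generate() from greedy decoding to sampling.
# All values here are illustrative, not prescribed by this README.
gen_kwargs = dict(
    max_new_tokens=200,
    do_sample=True,    # sample instead of greedy decoding
    temperature=0.7,   # higher -> more random continuations
    top_p=0.9,         # nucleus sampling cutoff
)
# Usage (with model and tokenized_prompt from the example above):
# output = model.generate(tokenized_prompt.data["input_ids"], **gen_kwargs)
```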
## Chat

When using TRURL in chat mode, remember to use the Llama 2 conversation template, as in the example below.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Voicelab/trurl-2-7b-8bit")
model = AutoModelForCausalLM.from_pretrained("Voicelab/trurl-2-7b-8bit", device_map="auto")

prompt = """
<s>[INST] <<SYS>> You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.
...
"""

tokenized_prompt = tokenizer(prompt, return_tensors="pt")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(
        model.generate(tokenized_prompt.data["input_ids"], max_new_tokens=200, temperature=0)[0],
        skip_special_tokens=True))
```
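For programmatic use it can be handier to assemble the conversation template than to hand-write the string. Below is a minimal sketch assuming the standard Llama 2 single-turn tag layout (`<s>[INST] <<SYS>> ... <</SYS>> ... [/INST]`); the helper name and the example user message are ours, not from this README:

```python
def build_llama2_prompt(system: str, user: str) -> str:
    # Standard Llama 2 single-turn layout: the system prompt is wrapped in
    # <<SYS>> markers inside the first [INST] block, followed by the user turn.
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system}\n"
        "<</SYS>>\n\n"
        f"{user} [/INST]"
    )

prompt = build_llama2_prompt(
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe.",
    "What is the capital of Poland?",
)
print(prompt)
```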