abrahammg committed
Commit
547548c
1 Parent(s): 2316149

Update README.md

Files changed (1)
  1. README.md +35 -9
README.md CHANGED
@@ -31,20 +31,46 @@ To use this model, follow the example code provided below. Ensure you have the n
 
```bash
pip install transformers
+ pip install bitsandbytes
+ pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
+ pip install llmtuner
```

- ### Installation
+ ### Test the model

```bash
- from transformers import AutoModelForCausalLM, AutoTokenizer
+ from llmtuner import ChatModel
+ from llmtuner.extras.misc import torch_gc
+
+ chat_model = ChatModel(dict(
+     model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # use the bnb-4bit-quantized Llama-3-8B-Instruct model
+     adapter_name_or_path="model",  # load the saved LoRA adapters
+     finetuning_type="lora",  # same as the one used in training
+     template="llama3",  # same as the one used in training
+     quantization_bit=4,  # load the 4-bit quantized model
+     use_unsloth=True,  # use UnslothAI's LoRA optimization for 2x faster generation
+ ))
+
+ messages = []
+ while True:
+     query = input("\nUser: ")
+     if query.strip() == "exit":
+         break

- model_name = "abrahammg/Llama3-8B-Galician-Chat"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForCausalLM.from_pretrained(model_name)
+     if query.strip() == "clear":
+         messages = []
+         torch_gc()
+         print("History has been removed.")
+         continue

- text = "Enter some text in Galician here."
- inputs = tokenizer(text, return_tensors="pt")
- outputs = model.generate(**inputs)
+     messages.append({"role": "user", "content": query})  # add the query to the message history
+     print("Assistant: ", end="", flush=True)
+     response = ""
+     for new_text in chat_model.stream_chat(messages):  # stream the generated text
+         print(new_text, end="", flush=True)
+         response += new_text
+     print()
+     messages.append({"role": "assistant", "content": response})  # add the response to the message history

- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ torch_gc()
```
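As a quick sanity check of the new instructions, the interactive loop above can be reduced to a one-shot script. The sketch below reuses the exact `ChatModel` configuration from the diff and relies only on `stream_chat`, which the loop already uses; the file name and the Galician prompt are illustrative assumptions, not part of the commit.

```python
# chat_once.py -- minimal one-shot variant of the README's chat loop (illustrative, not from the commit).
from llmtuner import ChatModel

# Same configuration as in the README diff above.
chat_model = ChatModel(dict(
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",
    adapter_name_or_path="model",
    finetuning_type="lora",
    template="llama3",
    quantization_bit=4,
    use_unsloth=True,
))

# A sample Galician prompt (illustrative): "Hello! How are you?"
messages = [{"role": "user", "content": "Ola! Como estás?"}]

# stream_chat yields incremental text chunks; joining them gives the full reply
# without the interactive prompt handling.
reply = "".join(chat_model.stream_chat(messages))
print(reply)
```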