Kukedlc committed
Commit f514980
1 Parent(s): 41710bc

Update README.md

Files changed (1)
  1. README.md +34 -15
README.md CHANGED
@@ -10,10 +10,14 @@ base_model:
 - mlabonne/ChimeraLlama-3-8B-v2
 - nbeerbower/llama-3-stella-8B
 - uygarkurt/llama-3-merged-linear
+license: other
 ---
 
 # NeuralLLaMa-3-8b-DT-v0.1
 
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/64d71ab4089bc502ceb44d29/tK72e9RGnYyBVRy0T_Kba.png)
+
 NeuralLLaMa-3-8b-DT-v0.1 is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
 * [mlabonne/ChimeraLlama-3-8B-v2](https://huggingface.co/mlabonne/ChimeraLlama-3-8B-v2)
 * [nbeerbower/llama-3-stella-8B](https://huggingface.co/nbeerbower/llama-3-stella-8B)
@@ -43,28 +47,43 @@ parameters:
   int8_mask: true
 dtype: float16
 ```
+## 🗨️ Chats
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/64d71ab4089bc502ceb44d29/feYEkbM_TqeahAMOoiGoG.png)
 
 ## 💻 Usage
 
 ```python
-!pip install -qU transformers accelerate
+!pip install -qU transformers accelerate bitsandbytes
 
-from transformers import AutoTokenizer
-import transformers
+from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, BitsAndBytesConfig
 import torch
 
-model = "Kukedlc/NeuralLLaMa-3-8b-DT-v0.1"
-messages = [{"role": "user", "content": "What is a large language model?"}]
-
-tokenizer = AutoTokenizer.from_pretrained(model)
-prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-pipeline = transformers.pipeline(
-    "text-generation",
-    model=model,
-    torch_dtype=torch.float16,
-    device_map="auto",
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16
 )
 
-outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
-print(outputs[0]["generated_text"])
+MODEL_NAME = 'Kukedlc/NeuralLLaMa-3-8b-DT-v0.1'
+tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
+model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='cuda:0', quantization_config=bnb_config)
+
+prompt_system = "You are an advanced language model that speaks Spanish fluently, clearly, and precisely.\
+You are called Roberto the Robot and you are an aspiring post-modern artist."
+prompt = "Create a piece of art that represents how you see yourself, Roberto, as an advanced LLM, with ASCII art, mixing diagrams and engineering, and let yourself go."
+
+chat = [
+    {"role": "system", "content": f"{prompt_system}"},
+    {"role": "user", "content": f"{prompt}"},
+]
+
+chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(chat, return_tensors="pt").to('cuda')
+streamer = TextStreamer(tokenizer)
+stop_token = "<|eot_id|>"
+stop = tokenizer.encode(stop_token)[0]
+
+_ = model.generate(**inputs, streamer=streamer, max_new_tokens=1024, do_sample=True, temperature=0.7, repetition_penalty=1.2, top_p=0.9, eos_token_id=stop)
 ```
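A note on the stop-token lookup in the updated snippet: `tokenizer.encode(stop_token)[0]` can be fragile, because the Llama 3 tokenizer prepends `<|begin_of_text|>` by default, in which case index 0 is the BOS id rather than `<|eot_id|>`. A minimal sketch of a more direct lookup, assuming the stock Llama 3 special tokens:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Kukedlc/NeuralLLaMa-3-8b-DT-v0.1")

# convert_tokens_to_ids maps the token string straight to its id and never
# adds BOS/EOS, so there is no [0]-indexing pitfall as with encode().
stop = tokenizer.convert_tokens_to_ids("<|eot_id|>")
```

Passing `eos_token_id=stop` to `generate` then stops at the assistant's end-of-turn token as intended.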
 
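Relatedly, rendering the template with `tokenize=False` and then calling the tokenizer again can produce a doubled `<|begin_of_text|>`, since the rendered string already starts with one. A small sketch that compares the two paths; the chat messages here are placeholders, not from the model card:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Kukedlc/NeuralLLaMa-3-8b-DT-v0.1")

chat = [
    {"role": "system", "content": "You are a helpful assistant."},  # placeholder messages
    {"role": "user", "content": "Hello!"},
]

# Two-pass, as in the snippet above: the rendered string already starts with
# <|begin_of_text|>, and tokenizing it again can prepend a second one.
text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
two_pass = tokenizer(text, return_tensors="pt").input_ids

# One-pass: let apply_chat_template tokenize directly; no duplicated BOS.
one_pass = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")

print(two_pass[0, :3].tolist(), one_pass[0, :3].tolist())  # compare the leading ids
```

The one-pass tensor can be passed straight to `model.generate` in place of `**inputs`.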