filipealmeida committed
Commit 13f22f0 • 1 Parent(s): a7cbcd5
Upload folder using huggingface_hub
- .gitattributes +3 -0
- README.md +149 -1
- ggml-model-Q4_0.gguf +3 -0
- ggml-model-Q8_0.gguf +3 -0
- ggml-model-f16.gguf +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
+
ggml-model-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
|
37 |
+
ggml-model-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
|
38 |
+
ggml-model-f16.gguf filter=lfs diff=lfs merge=lfs -text
|
README.md
CHANGED
@@ -1,3 +1,151 @@
1 |
---
|
2 |
-
license: cc-by-3.0
3 |
---
1 |
---
|
2 |
+
license: cc-by-sa-3.0
|
3 |
+
datasets:
|
4 |
+
- VMware/open-instruct
|
5 |
+
language:
|
6 |
+
- en
|
7 |
+
library_name: transformers
|
8 |
+
pipeline_tag: text-generation
|
9 |
---
|
10 |
+
|
11 |
+
# Open LLama 7B v2 Open Instruct
|
12 |
+
- Model creator: [VMware](https://huggingface.co/VMware)
|
13 |
+
- Original model: [Open LLama 7B v2 Open Instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct)
|
14 |
+
|
15 |
+
## Description
|
16 |
+
|
17 |
+
This repo contains the GGUF model files for [Open LLama 7B v2 Open Instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct).
|
18 |
+
|
19 |
+
These files are compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp).
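
As a rough illustration of using one of these GGUF files from Python, the sketch below wraps an instruction in the Alpaca template shown in the Transformers example of this model card and passes it to the third-party `llama-cpp-python` bindings. The choice of bindings, the local file path, the `n_ctx` value, and the helper names `build_prompt` and `generate` are assumptions made for illustration; the model card itself only states compatibility with llama.cpp.

```python
from pathlib import Path

# Alpaca-style template copied from the model card's Transformers example.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str) -> str:
    """Wrap a plain instruction in the Alpaca prompt template."""
    return PROMPT_TEMPLATE.format(instruction=instruction)


def generate(model_path: str, instruction: str, max_tokens: int = 256) -> str:
    """Run one completion against a local GGUF file.

    Assumes the llama-cpp-python package is installed. It is imported
    lazily so that build_prompt() stays usable without it.
    """
    from llama_cpp import Llama  # assumption: pip install llama-cpp-python

    llm = Llama(model_path=model_path, n_ctx=2048)  # n_ctx is an assumed value
    result = llm(build_prompt(instruction), max_tokens=max_tokens, stop=["</s>"])
    return result["choices"][0]["text"]


if __name__ == "__main__":
    question = "Explain what a GGUF file is in one sentence."
    gguf = Path("ggml-model-Q4_0.gguf")  # assumed to be downloaded locally
    if gguf.exists():
        print(generate(str(gguf), question))
    else:
        # Without the model file, just show the prompt that would be sent.
        print(build_prompt(question))
```

The smaller quantized files (Q4_0, Q8_0) trade some accuracy for lower memory use, so which file to pass as `model_path` depends on the hardware available.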
|
20 |
+
|
21 |
+
# VMware/open-llama-7B-v2-open-instruct
|
22 |
+
Instruction-tuned version of the fully trained Open LLama 7B v2 model. The model is open for <b>COMMERCIAL USE</b>. <br>
|
23 |
+
|
24 |
+
- This model performs better on code compared to v1 due to the improvements made on the base model by the openlm-research team.
|
25 |
+
- The instruction model is trained on an improved instruction tuning dataset compared to v1.
|
26 |
+
|
27 |
+
**NOTE**: The model was trained using the Alpaca prompt template <br>
|
28 |
+
**NOTE**: The fast tokenizer produces incorrect encodings; set `use_fast=False` when instantiating the tokenizer.
|
29 |
+
|
30 |
+
|
31 |
+
## License
|
32 |
+
- CC BY-SA-3.0 **(Commercially Viable!)**
|
33 |
+
- Base Language Model ([openlm-research/open_llama_v2_7b](https://huggingface.co/openlm-research/open_llama_v2_7b)) is under apache-2.0
|
34 |
+
- Fine-Tuning Dataset ([VMware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct)) is under cc-by-sa-3.0
|
35 |
+
|
36 |
+
## Datasets used for Fine-Tuning
|
37 |
+
|
38 |
+
### Open-instruct
|
39 |
+
|
40 |
+
**Open-instruct-v1**
|
41 |
+
- Mosaic/Dolly-HHRLHF + filtered OASST1 - cc by 3.0
|
42 |
+
|
43 |
+
**Subset of COT SUBMIX (FROM FLAN V2) Zeroshot examples**
|
44 |
+
- ESNLI - MIT
|
45 |
+
- ECQA - CDLA 1.0 - Sharing
|
46 |
+
- StrategyQA - MIT
|
47 |
+
- CREAK - MIT
|
48 |
+
- gsm8k - MIT
|
49 |
+
- aqua - MIT
|
50 |
+
- qasc - Apache 2.0
|
51 |
+
|
52 |
+
|
53 |
+
## Nomenclature
|
54 |
+
|
55 |
+
- Model : Open-llama-v2
|
56 |
+
- Model Size: 7B parameters
|
57 |
+
- Dataset: Open-instruct
|
58 |
+
|
59 |
+
|
60 |
+
## Use in Transformers
|
61 |
+
|
62 |
+
```python
|
63 |
+
import os
|
64 |
+
import torch
|
65 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
66 |
+
|
67 |
+
model_name = 'VMware/open-llama-7b-v2-open-instruct'
|
68 |
+
|
69 |
+
|
70 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
|
71 |
+
|
72 |
+
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map='sequential')
|
73 |
+
|
74 |
+
prompt_template = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
|
75 |
+
|
76 |
+
prompt = """What is attention mechanism of a transformer model?
|
77 |
+
Write a python code to illustrate how attention works within a transformer model using numpy library. Donot use pytorch or tensorflow."""
|
78 |
+
|
79 |
+
|
80 |
+
inputt = prompt_template.format(instruction=prompt)
|
81 |
+
input_ids = tokenizer(inputt, return_tensors="pt").input_ids.to("cuda")
|
82 |
+
|
83 |
+
output1 = model.generate(input_ids, max_length=512)
|
84 |
+
input_length = input_ids.shape[1]
|
85 |
+
output1 = output1[:, input_length:]
|
86 |
+
output = tokenizer.decode(output1[0])
|
87 |
+
|
88 |
+
print(output)
|
89 |
+
|
90 |
+
```
|
91 |
+
|
92 |
+
|
93 |
+
### Output
|
94 |
+
|
95 |
+
Sure, I can help you with that!
|
96 |
+
|
97 |
+
Attention mechanisms in transformer models are typically implemented using the attention mechanism in the self-attention layer. Self-attention allows the model to focus on different parts of the input sequence when processing it. This is achieved by computing a set of attention weights, which are used to weigh the contribution of each input element to the output.
|
98 |
+
|
99 |
+
Here's an example code using NumPy to illustrate how attention works in a transformer model:
|
100 |
+
|
101 |
+
```python
|
102 |
+
import numpy as np
|
103 |
+
|
104 |
+
def attention_weights(query, key, value, mask):
|
105 |
+
# Query, key, and value are input tensors. Mask is a tensor of zeros and ones that represents the attention mask.
|
106 |
+
# It is used to prevent the model from attending to certain positions in the input sequence if they are not relevant.
|
107 |
+
# The attention weights are the element-wise product of the query, key, and mask tensors.
|
108 |
+
# The result is a tensor of the same shape as the query tensor.
|
109 |
+
|
110 |
+
# Compute the dot product between the query tensor and the key tensor
|
111 |
+
dot = np.matmul(query, key)
|
112 |
+
|
113 |
+
# Compute the element-wise softmax of the dot product tensor
|
114 |
+
exp_dot = np.exp(dot)
|
115 |
+
|
116 |
+
# Multiply the dot product and the softmax of the dot product tensors
|
117 |
+
weights = dot * exp_dot
|
118 |
+
|
119 |
+
# Return the attention weights as a NumPy tensor
|
120 |
+
return weights
|
121 |
+
|
122 |
+
# Define the input sequence
|
123 |
+
query = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
|
124 |
+
key = np.array([[0.1, 0.2], [0.3, 0.4]])
|
125 |
+
value = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
|
126 |
+
mask = np.array([[False, True, True], [False, True, True]])
|
127 |
+
|
128 |
+
# Compute the attention weights
|
129 |
+
weights = attention_weights(query, key, value, mask)
|
130 |
+
|
131 |
+
# Print the attention weights
|
132 |
+
print(weights)
|
133 |
+
```
|
134 |
+
|
135 |
+
In this example, the `attention_weights` function takes as input the query tensor, key tensor, value tensor, and mask tensor. It computes the dot product between the query and key tensors using the `np.matmul` function, and then applies a softmax function using the `np.exp` function to the element-wise dot product tensor. It then multiplies the dot product and softmax tensors using the `np.matmul` function, and returns the result as a NumPy tensor.
|
136 |
+
|
137 |
+
The `query`, `key`, and `value` tensors represent the input sequence to the transformer model. The `mask` tensor represents the attention mask, which is used to prevent the model from attending to certain positions in the input sequence if they are not relevant.
|
138 |
+
|
139 |
+
The output of the `attention_weights` function is a NumPy tensor that represents the attention weights for the input sequence. These weights are used by the transformer model to weigh the contribution of each input element to the output.
|
140 |
+
|
141 |
+
I hope this helps!</s>
|
142 |
+
<hr>
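
The sample output above is reproduced as the model generated it. For comparison, here is a minimal sketch of standard scaled dot-product attention, softmax(QKᵀ / √d_k) V, in NumPy. It is not part of the original model card or of the model's output; the function names, toy shapes, and random inputs are chosen purely for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def scaled_dot_product_attention(query, key, value, mask=None):
    """Return (output, weights) for single-head attention.

    query: (n_queries, d_k), key: (n_keys, d_k), value: (n_keys, d_v).
    mask:  optional boolean (n_queries, n_keys); False blocks a position.
    """
    d_k = query.shape[-1]
    scores = query @ key.T / np.sqrt(d_k)      # similarity of each query to each key
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # blocked positions get ~zero weight
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ value, weights


# Toy inputs (illustrative values only).
rng = np.random.default_rng(0)
query = rng.normal(size=(2, 4))
key = rng.normal(size=(3, 4))
value = rng.normal(size=(3, 5))
mask = np.array([[True, True, False], [True, True, True]])

output, weights = scaled_dot_product_attention(query, key, value, mask)
print(output.shape)          # (2, 5)
print(weights.sum(axis=-1))  # every row sums to 1
```

Here the first query is prevented from attending to the third key, so its weight in that position is effectively zero, while the remaining weights in the row are renormalized by the softmax.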
|
143 |
+
|
144 |
+
|
145 |
+
## Finetuning details
|
146 |
+
The finetuning scripts will be available in our [RAIL GitHub Repository](https://github.com/vmware-labs/research-and-development-artificial-intelligence-lab/tree/main/instruction-tuning).
|
147 |
+
|
148 |
+
|
149 |
+
## Evaluation
|
150 |
+
|
151 |
+
**TODO**
|
ggml-model-Q4_0.gguf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:15709803fa2158f52687213188651e4d64a290101a4707339a0550dff4e64212
|
3 |
+
size 3825818912
|
ggml-model-Q8_0.gguf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:243c305cdd04816039a5f20be4417b257c37fa4ef9f8ab6505b79305d463de22
|
3 |
+
size 7161101600
|
ggml-model-f16.gguf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:c35cfd775e365a800da010e01b379b86840c727d370114e2c87fa98da8c09e08
|
3 |
+
size 13478116608
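
Each of the three `.gguf` entries above is a Git LFS pointer: a small text file recording the spec version, the SHA-256 digest (`oid`), and the byte size of the real model file. The sketch below, which assumes a downloaded copy of the file is available locally, shows one way such a pointer could be parsed and used to check a download. The helper names `parse_lfs_pointer` and `verify_download` are hypothetical and do not belong to any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_download(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file against the size and digest from a pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated file
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents for ggml-model-Q4_0.gguf, copied from the diff above.
POINTER_Q4_0 = """\
version https://git-lfs.github.com/spec/v1
oid sha256:15709803fa2158f52687213188651e4d64a290101a4707339a0550dff4e64212
size 3825818912
"""

pointer = parse_lfs_pointer(POINTER_Q4_0)
print(pointer["algo"], pointer["size"])
print(f"{pointer['size'] / 2**30:.2f} GiB")
```

Calling `verify_download("ggml-model-Q4_0.gguf", pointer)` on a completed download would then return `True` only if both the size and the digest match.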
|