ritikk committed on
Commit
04dc1b0
1 Parent(s): 23930e3

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .ipynb_checkpoints/README-checkpoint.md +156 -0
  2. README.md +156 -0
  3. config.json +27 -0
  4. generation_config.json +6 -0
  5. pytorch_model.bin/key_to_filename.json +3 -0
  6. pytorch_model.bin/p0.model.embed_tokens.weight +3 -0
  7. pytorch_model.bin/p1.model.layers.0.self_attn.q_proj.weight +3 -0
  8. pytorch_model.bin/p10.model.layers.1.self_attn.q_proj.weight +3 -0
  9. pytorch_model.bin/p100.model.layers.11.self_attn.q_proj.weight +3 -0
  10. pytorch_model.bin/p101.model.layers.11.self_attn.k_proj.weight +3 -0
  11. pytorch_model.bin/p102.model.layers.11.self_attn.v_proj.weight +3 -0
  12. pytorch_model.bin/p103.model.layers.11.self_attn.o_proj.weight +3 -0
  13. pytorch_model.bin/p104.model.layers.11.mlp.gate_proj.weight +3 -0
  14. pytorch_model.bin/p105.model.layers.11.mlp.up_proj.weight +3 -0
  15. pytorch_model.bin/p106.model.layers.11.mlp.down_proj.weight +3 -0
  16. pytorch_model.bin/p107.model.layers.11.input_layernorm.weight +3 -0
  17. pytorch_model.bin/p108.model.layers.11.post_attention_layernorm.weight +3 -0
  18. pytorch_model.bin/p109.model.layers.12.self_attn.q_proj.weight +3 -0
  19. pytorch_model.bin/p11.model.layers.1.self_attn.k_proj.weight +3 -0
  20. pytorch_model.bin/p110.model.layers.12.self_attn.k_proj.weight +3 -0
  21. pytorch_model.bin/p111.model.layers.12.self_attn.v_proj.weight +3 -0
  22. pytorch_model.bin/p112.model.layers.12.self_attn.o_proj.weight +3 -0
  23. pytorch_model.bin/p113.model.layers.12.mlp.gate_proj.weight +3 -0
  24. pytorch_model.bin/p114.model.layers.12.mlp.up_proj.weight +3 -0
  25. pytorch_model.bin/p115.model.layers.12.mlp.down_proj.weight +3 -0
  26. pytorch_model.bin/p116.model.layers.12.input_layernorm.weight +3 -0
  27. pytorch_model.bin/p117.model.layers.12.post_attention_layernorm.weight +3 -0
  28. pytorch_model.bin/p118.model.layers.13.self_attn.q_proj.weight +3 -0
  29. pytorch_model.bin/p119.model.layers.13.self_attn.k_proj.weight +3 -0
  30. pytorch_model.bin/p12.model.layers.1.self_attn.v_proj.weight +3 -0
  31. pytorch_model.bin/p120.model.layers.13.self_attn.v_proj.weight +3 -0
  32. pytorch_model.bin/p121.model.layers.13.self_attn.o_proj.weight +3 -0
  33. pytorch_model.bin/p122.model.layers.13.mlp.gate_proj.weight +3 -0
  34. pytorch_model.bin/p123.model.layers.13.mlp.up_proj.weight +3 -0
  35. pytorch_model.bin/p124.model.layers.13.mlp.down_proj.weight +3 -0
  36. pytorch_model.bin/p125.model.layers.13.input_layernorm.weight +3 -0
  37. pytorch_model.bin/p126.model.layers.13.post_attention_layernorm.weight +3 -0
  38. pytorch_model.bin/p127.model.layers.14.self_attn.q_proj.weight +3 -0
  39. pytorch_model.bin/p128.model.layers.14.self_attn.k_proj.weight +3 -0
  40. pytorch_model.bin/p129.model.layers.14.self_attn.v_proj.weight +3 -0
  41. pytorch_model.bin/p13.model.layers.1.self_attn.o_proj.weight +3 -0
  42. pytorch_model.bin/p130.model.layers.14.self_attn.o_proj.weight +3 -0
  43. pytorch_model.bin/p131.model.layers.14.mlp.gate_proj.weight +3 -0
  44. pytorch_model.bin/p132.model.layers.14.mlp.up_proj.weight +3 -0
  45. pytorch_model.bin/p133.model.layers.14.mlp.down_proj.weight +3 -0
  46. pytorch_model.bin/p134.model.layers.14.input_layernorm.weight +3 -0
  47. pytorch_model.bin/p135.model.layers.14.post_attention_layernorm.weight +3 -0
  48. pytorch_model.bin/p136.model.layers.15.self_attn.q_proj.weight +3 -0
  49. pytorch_model.bin/p137.model.layers.15.self_attn.k_proj.weight +3 -0
  50. pytorch_model.bin/p138.model.layers.15.self_attn.v_proj.weight +3 -0
.ipynb_checkpoints/README-checkpoint.md ADDED
@@ -0,0 +1,156 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ inference: false
7
+ tags:
8
+ - mistral
9
+ - pytorch
10
+ - inferentia2
11
+ - neuron
12
+ ---
13
+ # Neuronx model for Mistral
14
+
15
+ This repository contains [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
16
+
17
+ This file also includes an example of how to compile other versions of Mistral. As of 1/3/2024, the optimum-neuron framework doesn’t support Mistral yet, so the examples use the transformers-neuronx library directly.
18
+
19
+ These instructions closely follow the [Developer Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/transformers-neuronx-developer-guide.html#grouped-query-attention-gqa-support-beta). Look there for more detailed explanations, especially for the GQA settings.
20
+
21
+ This model has been compiled to run on an inf2.xlarge (the smallest Inferentia2 instance). You can run it on a bigger instance, but it will only use two cores no matter how many are available, unless you change the number of cores (tp_degree) at compilation. Remember that each Neuron processor has two cores.
22
+
23
+
24
+ ## Set up the environment
25
+
26
+ First, use the [DLAMI image from Hugging Face](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2). It has most of the utilities and drivers preinstalled. However, you will need to update transformers-neuronx from source to get Mistral support.
27
+
28
+
29
+ ```
30
+ python -m pip install git+https://github.com/aws-neuron/transformers-neuronx.git
31
+ ```
32
+
33
+ ## Running inference from this repository
34
+
35
+ If you want to run a quick test or if the exact model you want to use is [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), you can run it directly using the steps below. Otherwise, jump to the Compilation of other Mistral versions section.
36
+
37
+ First, you will need a local copy of this repository. The Hugging Face optimum library normally abstracts away the difference between local loads and repository loads, but it doesn't support Mistral inference yet.
38
+
39
+ From Python:
40
+
41
+ ```
42
+ # using python instead of git clone because I know this supports lfs on the DLAMI image
43
+ from huggingface_hub import Repository
44
+ repo = Repository(local_dir="Mistral-neuron", clone_from="aws-neuron/Mistral-neuron")
45
+
46
+ ```
47
+
48
+ This should put a local copy in Mistral-neuron. The process should take 5-10 minutes. If it completes in a few seconds the first time you run it, you are having problems with git-lfs. You can check this by using ls -al to look at the size of the downloaded files. You will also notice it later when you get parsing errors.
49
+
50
+ Next, load the model and neff files from disk into the Neuron processors:
51
+
52
+ ```
53
+ import torch
54
+ from transformers_neuronx import constants
55
+ from transformers_neuronx.mistral.model import MistralForSampling
56
+ from transformers_neuronx.module import save_pretrained_split
57
+ from transformers_neuronx.config import NeuronConfig
58
+ from transformers import AutoModelForCausalLM, AutoTokenizer
59
+
60
+ # Set sharding strategy for GQA to be shard over heads
61
+ neuron_config = NeuronConfig(
62
+ grouped_query_attention=constants.GQA.SHARD_OVER_HEADS
63
+ )
64
+ # define the model. These are the settings used in compilation.
65
+ # If you want to change these settings, skip to "Compilation of other Mistral versions"
66
+ model_neuron = MistralForSampling.from_pretrained("Mistral-neuron", batch_size=1, tp_degree=2, n_positions=256, amp='bf16', neuron_config=neuron_config)
67
+
68
+ # load the neff files from the local directory instead of compiling
69
+ model_neuron.load("Mistral-neuron")
70
+
71
+ # load the neff files into the neuron processors.
72
+ # you can see this process happening if you run neuron-top from the command line in another console.
73
+ # if you didn't do the previous load command, this will also compile the neff files
74
+ model_neuron.to_neuron()
75
+
76
+
77
+ ```
78
+
79
+ ## Inference example
80
+
81
+ This points to the original model for the tokenizer because the tokenizer is the same.
82
+ If you are compiling your own and want to have a single reference for everything, you can copy the special_tokens_map.json and tokenizer* from the original model to your local copy.
83
+
84
+ ```
85
+ # Get a tokenizer and example input. Note that this points to the original model
86
+ tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.1')
87
+ text = "[INST] What is your favourite condiment? [/INST]"
88
+ encoded_input = tokenizer(text, return_tensors='pt')
89
+
90
+ # Run inference
91
+ with torch.inference_mode():
92
+     generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=256, start_ids=None)
93
+ print([tokenizer.decode(tok) for tok in generated_sequence])
94
+
95
+ ```
96
+
97
+
98
+ Example output:
99
+ (Most of the time with amp='bf16', the answer is ketchup. However, when I compiled with amp='f32', the answer was soy sauce. That was a sample size of one, so let me know what you see. @jburtoft)
100
+
101
+ ```
102
+ 2024-Jan-03 15:59:21.0510 1486:2057 [0] nccl_net_ofi_init:1415 CCOM WARN NET/OFI aws-ofi-nccl initialization failed
103
+ 2024-Jan-03 15:59:21.0510 1486:2057 [0] init.cc:138 CCOM WARN OFI plugin initNet() failed is EFA enabled?
104
+ ['<s> [INST] What is your favourite condiment? [/INST] My favorite condiment is probably ketchup. It adds a perfect balance of sweet, tangy, and slightly spicy flavor to dishes, and is versatile enough to go with a wide variety of foods.</s>']
105
+
106
+ ```
107
+
108
+ ## Compilation of other Mistral versions
109
+
110
+ If you want to use a different version of Mistral from Hugging Face, use the slightly modified code below. It essentially removes the “load” command. When the “to_neuron()” command sees that the model object doesn’t include the neff files, it will kick off the recompile. You can save them at the end so you only have to do the compilation process once. After that, you can use the code above to load a model and the neff files from the local directory.
111
+
112
+ ```
113
+ import torch
114
+ from transformers_neuronx import constants
115
+ from transformers_neuronx.mistral.model import MistralForSampling
116
+ from transformers_neuronx.module import save_pretrained_split
117
+ from transformers_neuronx.config import NeuronConfig
118
+ from transformers import AutoModelForCausalLM, AutoTokenizer
119
+
120
+ # Load the CPU model and save it in split format (the bf16 cast is requested later via amp='bf16'). This also gives us a local copy
121
+ # change the Hugging Face model name (mistralai/Mistral-7B-Instruct-v0.1) below to what you want
122
+ # You can update the other model names if you want, but they just reference a directory on the local disk.
123
+ model_cpu = AutoModelForCausalLM.from_pretrained('mistralai/Mistral-7B-Instruct-v0.1')
124
+ save_pretrained_split(model_cpu, 'mistralai/Mistral-7B-Instruct-v0.1-split')
125
+
126
+ # Set sharding strategy for GQA to be shard over heads
127
+ neuron_config = NeuronConfig(
128
+ grouped_query_attention=constants.GQA.SHARD_OVER_HEADS
129
+ )
130
+
131
+ # Create and compile the Neuron model
132
+ model_neuron = MistralForSampling.from_pretrained('mistralai/Mistral-7B-Instruct-v0.1-split', batch_size=1, \
133
+ tp_degree=2, n_positions=256, amp='bf16', neuron_config=neuron_config)
134
+ model_neuron.to_neuron()
135
+
136
+ #save compiled neff files out to the same directory
137
+ model_neuron.save("mistralai/Mistral-7B-Instruct-v0.1-split")
138
+
139
+
140
+ ```
141
+
142
+
143
+
144
+ ## Arguments passed during compilation
145
+
146
+ The settings used in compilation are the same as shown above in the code. If you want to change these, you will need to recompile. If you don’t want to pass them in each time, you could update the config.json file. This is another nice thing the Hugging Face optimum framework does for us. You can see an example of the format by looking at one of the Llama model config.json files, for [example](https://huggingface.co/aws-neuron/Llama-2-7b-hf-neuron-latency/blob/main/config.json).
147
+
148
+ ```
149
+ neuron_config = NeuronConfig(
150
+ grouped_query_attention=constants.GQA.SHARD_OVER_HEADS
151
+ )
152
+ model_neuron = MistralForSampling.from_pretrained("Mistral-neuron", batch_size=1, tp_degree=2, n_positions=256, amp='bf16', neuron_config=neuron_config)
153
+
154
+ ```
155
+
156
+
README.md ADDED
@@ -0,0 +1,156 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ inference: false
7
+ tags:
8
+ - mistral
9
+ - pytorch
10
+ - inferentia2
11
+ - neuron
12
+ ---
13
+ # Neuronx model for Mistral
14
+
15
+ This repository contains [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
16
+
17
+ This file also includes an example of how to compile other versions of Mistral. As of 1/3/2024, the optimum-neuron framework doesn’t support Mistral yet, so the examples use the transformers-neuronx library directly.
18
+
19
+ These instructions closely follow the [Developer Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/transformers-neuronx-developer-guide.html#grouped-query-attention-gqa-support-beta). Look there for more detailed explanations, especially for the GQA settings.
20
+
21
+ This model has been compiled to run on an inf2.xlarge (the smallest Inferentia2 instance). You can run it on a bigger instance, but it will only use two cores no matter how many are available, unless you change the number of cores (tp_degree) at compilation. Remember that each Neuron processor has two cores.
22
+
23
+
24
+ ## Set up the environment
25
+
26
+ First, use the [DLAMI image from Hugging Face](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2). It has most of the utilities and drivers preinstalled. However, you will need to update transformers-neuronx from source to get Mistral support.
27
+
28
+
29
+ ```
30
+ python -m pip install git+https://github.com/aws-neuron/transformers-neuronx.git
31
+ ```
32
+
33
+ ## Running inference from this repository
34
+
35
+ If you want to run a quick test or if the exact model you want to use is [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), you can run it directly using the steps below. Otherwise, jump to the Compilation of other Mistral versions section.
36
+
37
+ First, you will need a local copy of this repository. The Hugging Face optimum library normally abstracts away the difference between local loads and repository loads, but it doesn't support Mistral inference yet.
38
+
39
+ From Python:
40
+
41
+ ```
42
+ # using python instead of git clone because I know this supports lfs on the DLAMI image
43
+ from huggingface_hub import Repository
44
+ repo = Repository(local_dir="Mistral-neuron", clone_from="aws-neuron/Mistral-neuron")
45
+
46
+ ```
47
+
48
+ This should put a local copy in Mistral-neuron. The process should take 5-10 minutes. If it completes in a few seconds the first time you run it, you are having problems with git-lfs. You can check this by using ls -al to look at the size of the downloaded files. You will also notice it later when you get parsing errors.
49
+
50
+ Next, load the model and neff files from disk into the Neuron processors:
51
+
52
+ ```
53
+ import torch
54
+ from transformers_neuronx import constants
55
+ from transformers_neuronx.mistral.model import MistralForSampling
56
+ from transformers_neuronx.module import save_pretrained_split
57
+ from transformers_neuronx.config import NeuronConfig
58
+ from transformers import AutoModelForCausalLM, AutoTokenizer
59
+
60
+ # Set sharding strategy for GQA to be shard over heads
61
+ neuron_config = NeuronConfig(
62
+ grouped_query_attention=constants.GQA.SHARD_OVER_HEADS
63
+ )
64
+ # define the model. These are the settings used in compilation.
65
+ # If you want to change these settings, skip to "Compilation of other Mistral versions"
66
+ model_neuron = MistralForSampling.from_pretrained("Mistral-neuron", batch_size=1, tp_degree=2, n_positions=256, amp='bf16', neuron_config=neuron_config)
67
+
68
+ # load the neff files from the local directory instead of compiling
69
+ model_neuron.load("Mistral-neuron")
70
+
71
+ # load the neff files into the neuron processors.
72
+ # you can see this process happening if you run neuron-top from the command line in another console.
73
+ # if you didn't do the previous load command, this will also compile the neff files
74
+ model_neuron.to_neuron()
75
+
76
+
77
+ ```
78
+
79
+ ## Inference example
80
+
81
+ This points to the original model for the tokenizer because the tokenizer is the same.
82
+ If you are compiling your own and want to have a single reference for everything, you can copy the special_tokens_map.json and tokenizer* from the original model to your local copy.
83
+
84
+ ```
85
+ # Get a tokenizer and example input. Note that this points to the original model
86
+ tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.1')
87
+ text = "[INST] What is your favourite condiment? [/INST]"
88
+ encoded_input = tokenizer(text, return_tensors='pt')
89
+
90
+ # Run inference
91
+ with torch.inference_mode():
92
+     generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=256, start_ids=None)
93
+ print([tokenizer.decode(tok) for tok in generated_sequence])
94
+
95
+ ```
96
+
97
+
98
+ Example output:
99
+ (Most of the time with amp='bf16', the answer is ketchup. However, when I compiled with amp='f32', the answer was soy sauce. That was a sample size of one, so let me know what you see. @jburtoft)
100
+
101
+ ```
102
+ 2024-Jan-03 15:59:21.0510 1486:2057 [0] nccl_net_ofi_init:1415 CCOM WARN NET/OFI aws-ofi-nccl initialization failed
103
+ 2024-Jan-03 15:59:21.0510 1486:2057 [0] init.cc:138 CCOM WARN OFI plugin initNet() failed is EFA enabled?
104
+ ['<s> [INST] What is your favourite condiment? [/INST] My favorite condiment is probably ketchup. It adds a perfect balance of sweet, tangy, and slightly spicy flavor to dishes, and is versatile enough to go with a wide variety of foods.</s>']
105
+
106
+ ```
107
+
108
+ ## Compilation of other Mistral versions
109
+
110
+ If you want to use a different version of Mistral from Hugging Face, use the slightly modified code below. It essentially removes the “load” command. When the “to_neuron()” command sees that the model object doesn’t include the neff files, it will kick off the recompile. You can save them at the end so you only have to do the compilation process once. After that, you can use the code above to load a model and the neff files from the local directory.
111
+
112
+ ```
113
+ import torch
114
+ from transformers_neuronx import constants
115
+ from transformers_neuronx.mistral.model import MistralForSampling
116
+ from transformers_neuronx.module import save_pretrained_split
117
+ from transformers_neuronx.config import NeuronConfig
118
+ from transformers import AutoModelForCausalLM, AutoTokenizer
119
+
120
+ # Load the CPU model and save it in split format (the bf16 cast is requested later via amp='bf16'). This also gives us a local copy
121
+ # change the Hugging Face model name (mistralai/Mistral-7B-Instruct-v0.1) below to what you want
122
+ # You can update the other model names if you want, but they just reference a directory on the local disk.
123
+ model_cpu = AutoModelForCausalLM.from_pretrained('mistralai/Mistral-7B-Instruct-v0.1')
124
+ save_pretrained_split(model_cpu, 'mistralai/Mistral-7B-Instruct-v0.1-split')
125
+
126
+ # Set sharding strategy for GQA to be shard over heads
127
+ neuron_config = NeuronConfig(
128
+ grouped_query_attention=constants.GQA.SHARD_OVER_HEADS
129
+ )
130
+
131
+ # Create and compile the Neuron model
132
+ model_neuron = MistralForSampling.from_pretrained('mistralai/Mistral-7B-Instruct-v0.1-split', batch_size=1, \
133
+ tp_degree=2, n_positions=256, amp='bf16', neuron_config=neuron_config)
134
+ model_neuron.to_neuron()
135
+
136
+ #save compiled neff files out to the same directory
137
+ model_neuron.save("mistralai/Mistral-7B-Instruct-v0.1-split")
138
+
139
+
140
+ ```
141
+
142
+
143
+
144
+ ## Arguments passed during compilation
145
+
146
+ The settings used in compilation are the same as shown above in the code. If you want to change these, you will need to recompile. If you don’t want to pass them in each time, you could update the config.json file. This is another nice thing the Hugging Face optimum framework does for us. You can see an example of the format by looking at one of the Llama model config.json files, for [example](https://huggingface.co/aws-neuron/Llama-2-7b-hf-neuron-latency/blob/main/config.json).
147
+
148
+ ```
149
+ neuron_config = NeuronConfig(
150
+ grouped_query_attention=constants.GQA.SHARD_OVER_HEADS
151
+ )
152
+ model_neuron = MistralForSampling.from_pretrained("Mistral-neuron", batch_size=1, tp_degree=2, n_positions=256, amp='bf16', neuron_config=neuron_config)
153
+
154
+ ```
155
+
156
+
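The README's git-lfs warning (a clone that finishes in seconds, followed by parsing errors later) can also be checked programmatically rather than by eyeballing `ls -al`. The following is a minimal sketch, not part of the commit: the helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical, and the 1 KiB cutoff is an assumption based on the pointer stubs being only three short text lines.

```python
from pathlib import Path

# First line of every Git LFS pointer file, as seen in the pointer stubs of this commit.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like an LFS pointer stub rather than real data."""
    p = Path(path)
    # Assumption: pointer stubs are tiny text files, so anything over 1 KiB is real content.
    if not p.is_file() or p.stat().st_size > 1024:
        return False
    with p.open("rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_pointers(root):
    """List every file under root that is still an un-downloaded LFS pointer."""
    return sorted(str(p) for p in Path(root).rglob("*") if is_lfs_pointer(p))
```

Calling `find_lfs_pointers("Mistral-neuron")` after the clone should return an empty list if the weight shards were actually downloaded; any paths it returns are files that still need to be fetched through git-lfs.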
config.json ADDED
@@ -0,0 +1,27 @@
1
+ {
2
+ "_name_or_path": "HuggingFaceH4/zephyr-7b-beta",
3
+ "architectures": [
4
+ "MistralForCausalLM"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 1,
8
+ "eos_token_id": 2,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 4096,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 14336,
13
+ "max_position_embeddings": 32768,
14
+ "model_type": "mistral",
15
+ "num_attention_heads": 32,
16
+ "num_hidden_layers": 32,
17
+ "num_key_value_heads": 8,
18
+ "pad_token_id": 2,
19
+ "rms_norm_eps": 1e-05,
20
+ "rope_theta": 10000.0,
21
+ "sliding_window": 4096,
22
+ "tie_word_embeddings": false,
23
+ "torch_dtype": "float32",
24
+ "transformers_version": "4.37.0.dev0",
25
+ "use_cache": true,
26
+ "vocab_size": 32000
27
+ }
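The attention fields in this config.json are what make the README's GQA setting relevant: 32 query heads share 8 key/value heads. As a rough sketch, the arithmetic below derives the head size and the per-core split for `tp_degree=2`. The function name `gqa_layout` is hypothetical, and the even per-core split is an assumption about what sharding over heads implies; the diff itself does not confirm how transformers-neuronx places the heads.

```python
# Values copied from the config.json added in this commit.
config = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
}
TP_DEGREE = 2  # matches tp_degree=2 in the README's from_pretrained call


def gqa_layout(cfg, tp_degree):
    """Derive per-head and per-core figures for grouped-query attention."""
    q_heads = cfg["num_attention_heads"]
    kv_heads = cfg["num_key_value_heads"]
    # Assumption: this simple layout only makes sense when everything divides evenly.
    if q_heads % kv_heads or q_heads % tp_degree or kv_heads % tp_degree:
        raise ValueError("head counts must divide evenly for this simple layout")
    return {
        "head_dim": cfg["hidden_size"] // q_heads,
        "query_heads_per_kv_head": q_heads // kv_heads,
        "query_heads_per_core": q_heads // tp_degree,
        "kv_heads_per_core": kv_heads // tp_degree,
    }


layout = gqa_layout(config, TP_DEGREE)
print(layout)
```

With these numbers each head is 128 wide, four query heads share one key/value head, and under the even-split assumption each of the two cores would hold 16 query heads and 4 key/value heads.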
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.37.0.dev0"
6
+ }
pytorch_model.bin/key_to_filename.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:825d20f4a18183eff3963e805edd13ef7eb35b0aff7a850e8153ca1eeeb37970
3
+ size 26397
pytorch_model.bin/p0.model.embed_tokens.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ee6b750e276c8934e2f08711947208c153642156e989cdaa9ff94964ad59f526
3
+ size 524288789
pytorch_model.bin/p1.model.layers.0.self_attn.q_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:41a25c74acdb6afdf2e44d2e635f8c7c5ffc6940f17bfeab257c211cf38fb6f8
3
+ size 67109756
pytorch_model.bin/p10.model.layers.1.self_attn.q_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:78c512af070bc05461e79ec7d95b7f11da5ae52126fc5a224887f9965e345002
3
+ size 67109759
pytorch_model.bin/p100.model.layers.11.self_attn.q_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1d89c863526008bf58b148964be7b7b3c585a306ec72ce627f3846ea4d401aef
3
+ size 67109765
pytorch_model.bin/p101.model.layers.11.self_attn.k_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f356b5c436733ba41754dc6f60b4309b61803fb61142ed4eb229024eead79fef
3
+ size 16778117
pytorch_model.bin/p102.model.layers.11.self_attn.v_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f4c964a9bb0da31f77eab3d334c0dde9161239992d475931897cdc6635ddecf7
3
+ size 16778117
pytorch_model.bin/p103.model.layers.11.self_attn.o_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bbef525c2c619eda5a766b7b7891f61b1c8683b74ce30ef6aac00ec238fc97f5
3
+ size 67109765
pytorch_model.bin/p104.model.layers.11.mlp.gate_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:71586ff36e63f39e7122bbd8c7102951ff4f73faddc7877fc2fd1ca053b729e3
3
+ size 234881916
pytorch_model.bin/p105.model.layers.11.mlp.up_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:06df80e17523ee97064ebd5f858ff13ac84b84fd7677de7d47df5818a4f5c88f
3
+ size 234881910
pytorch_model.bin/p106.model.layers.11.mlp.down_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1650dbbd50ac01c7e465f0891a65388a2e81b182768dfb8d0e8810bd30307575
3
+ size 234881916
pytorch_model.bin/p107.model.layers.11.input_layernorm.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ea832a0eee9b4984e18926b5d5c2be07727a54e92e8bc3324991b8504c853d7c
3
+ size 17282
pytorch_model.bin/p108.model.layers.11.post_attention_layernorm.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:07a68b709ebeb3443bca8dda593aee53f948413d39b960b49680bb83b69c9356
3
+ size 17309
pytorch_model.bin/p109.model.layers.12.self_attn.q_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:59c8740c830a8a0a863d163bed564b8d1f695edb8ad179aee62bc8de729baaa7
3
+ size 67109765
pytorch_model.bin/p11.model.layers.1.self_attn.k_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dd01403c1f0535566e89242766edcc99299362da472de82bc0cb3b08bf07dbd6
3
+ size 16778111
pytorch_model.bin/p110.model.layers.12.self_attn.k_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:92b62ecbba30bedb8ec6446ac28a7f5e70a0304c339c3a1104b0b96e16f6d941
3
+ size 16778117
pytorch_model.bin/p111.model.layers.12.self_attn.v_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:921e254279ba9f09fc17d90d0fe4633e6b1b240f8e0acb8114fadddf7bca34fb
3
+ size 16778117
pytorch_model.bin/p112.model.layers.12.self_attn.o_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1490ae57c5b90090d320a8dd2f9b292a261063dfaa7c2bf2bb08a4adec64a1be
3
+ size 67109765
pytorch_model.bin/p113.model.layers.12.mlp.gate_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:832844270ec7cca4274055fc0be4fd785c8ca7ed9b04d0584ed2289ff8a50b23
3
+ size 234881916
pytorch_model.bin/p114.model.layers.12.mlp.up_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e463434fb23a60feef91e2f9b341f298d9fe84fb719420b45f1f95fb60cc3bee
3
+ size 234881910
pytorch_model.bin/p115.model.layers.12.mlp.down_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:717b728d1191ef251c40bc77f013441b5b2dc0fe282ebb45327dbaef25fb5ea0
3
+ size 234881916
pytorch_model.bin/p116.model.layers.12.input_layernorm.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e8e81021db5d86694989971d34d201043ac8c77e57cfa7500f7f28ed28aafa9e
3
+ size 17282
pytorch_model.bin/p117.model.layers.12.post_attention_layernorm.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ded63f3a073116fef0510178067d13794fd770888361b3638a0ea662dd193fcb
3
+ size 17309
pytorch_model.bin/p118.model.layers.13.self_attn.q_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:206304189aefe8ba4dcfdb2ef995e9002ed69de92762a2b5bd5c91154947a976
3
+ size 67109765
pytorch_model.bin/p119.model.layers.13.self_attn.k_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:682b47978b5660f906bbcadeb204a594c00f66d0d4b0d695f90b3f509e0792de
3
+ size 16778117
pytorch_model.bin/p12.model.layers.1.self_attn.v_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e7cf8c5c295c62c1152497162ffc5d9b696022bccfcb8d3966cab3eadc0a3f9a
3
+ size 16778111
pytorch_model.bin/p120.model.layers.13.self_attn.v_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5b33b41bd98923da91c24b9fe11835c60cbe3a82cb12bdd35308219ef7ed8b47
3
+ size 16778117
pytorch_model.bin/p121.model.layers.13.self_attn.o_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4a69f347e9327de82dbb0ec5702ecdbbd56825c56a067ffd9d233ef41fae3005
3
+ size 67109765
pytorch_model.bin/p122.model.layers.13.mlp.gate_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9739ff131b9c237f052fd4bf1b72c4e079168cfbebd4c43689f8ef9b5ccc1db7
3
+ size 234881916
pytorch_model.bin/p123.model.layers.13.mlp.up_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dfe204e620a2a529fb4ab29188ba59a80586ecdb3a6b5364192752c34ad28a9f
3
+ size 234881910
pytorch_model.bin/p124.model.layers.13.mlp.down_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fe46f82c58888a72f8179364253223f2c8a76e178037d7a54384ecd308a3cee1
3
+ size 234881916
pytorch_model.bin/p125.model.layers.13.input_layernorm.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bf9bb94f057317606cf2114071bf74059b1d7faaade75029c743b64677f6c6c4
3
+ size 17282
pytorch_model.bin/p126.model.layers.13.post_attention_layernorm.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a44795f56580496eaf5d27b2064d86bd8d3b7eb95ba08bec598f1e27252c6b60
3
+ size 17309
pytorch_model.bin/p127.model.layers.14.self_attn.q_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:49a7320a59d09209d170ed15af64a2fc2ba7294ccc40a6e156c86f353508852e
3
+ size 67109765
pytorch_model.bin/p128.model.layers.14.self_attn.k_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:85a785ccbcf822505865f9f2cb5e25cb8782bb6420d91cc2dc3a36741d4d47ac
3
+ size 16778117
pytorch_model.bin/p129.model.layers.14.self_attn.v_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:422b04cb57a8928fc503b5f1a143fb46133a0a114ad2635d1588b843f6c3fdc2
3
+ size 16778117
pytorch_model.bin/p13.model.layers.1.self_attn.o_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:611fdc301e5bdbc7fbc6ee041b7ff7f63a0df00831e5caf7473692708d34b3be
3
+ size 67109759
pytorch_model.bin/p130.model.layers.14.self_attn.o_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:42e87c6d53e6a24f627ff7ed5dbbede0105cfd8826e897fa53a9a4eaf632c7b4
3
+ size 67109765
pytorch_model.bin/p131.model.layers.14.mlp.gate_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e1024d6ec61910ff5cdb6e1e5821eae709a5360cda88eb32f9df676852fb5bfc
3
+ size 234881916
pytorch_model.bin/p132.model.layers.14.mlp.up_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6a4f4a90146cf0ee870648382d12941483aeda9bde1c5042472f4e09f20d54a9
3
+ size 234881910
pytorch_model.bin/p133.model.layers.14.mlp.down_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e39244410a6eb8f3ec18f21286b93509ae94eb608964df4991e33f8c2587b4e1
3
+ size 234881916
pytorch_model.bin/p134.model.layers.14.input_layernorm.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:784401e68cc10e15f6bc9c14d7e61e6cb1f186526539fd13eb8ba6dcfe4bc574
3
+ size 17282
pytorch_model.bin/p135.model.layers.14.post_attention_layernorm.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0b1727567e2cfc3c87d9d4870408aec62a9e2531e8a6fc61e232efbdb3062d86
3
+ size 17309
pytorch_model.bin/p136.model.layers.15.self_attn.q_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c5d824e4c211ed3ce6a0607963c02be8299f0bb4bfd255f91b7aa56053254541
3
+ size 67109765
pytorch_model.bin/p137.model.layers.15.self_attn.k_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9c47140536e260a37a97e7584d7fdf394a6e4b6217b2baffe3da56285b5fddc5
3
+ size 16778117
pytorch_model.bin/p138.model.layers.15.self_attn.v_proj.weight ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:104255d5f7335a4ed1f5c9e5a8f91bbd8f17f9408875b0ffbd1938879af4d1a9
3
+ size 16778117
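The `size` fields in these LFS pointers can be sanity-checked against the config.json values. The sketch below compares a few stored sizes with the raw payload each tensor would need at 4 bytes per element, matching `"torch_dtype": "float32"`. The tensor shapes are assumptions derived from the config (hidden size 4096, intermediate size 14336, vocabulary 32000, and 8 key/value heads of width 128), not something the diff states, and `serialization_overhead` is a hypothetical helper name.

```python
import math

BYTES_PER_FLOAT32 = 4

# name -> (assumed shape, size in bytes copied from the LFS pointer in this commit)
tensors = {
    "model.embed_tokens.weight": ((32000, 4096), 524288789),
    "model.layers.11.self_attn.q_proj.weight": ((4096, 4096), 67109765),
    "model.layers.11.self_attn.k_proj.weight": ((1024, 4096), 16778117),
    "model.layers.11.mlp.gate_proj.weight": ((14336, 4096), 234881916),
    "model.layers.11.input_layernorm.weight": ((4096,), 17282),
}


def serialization_overhead(shape, stored_bytes, bytes_per_element=BYTES_PER_FLOAT32):
    """Bytes stored beyond the raw tensor payload (file format metadata and the like)."""
    return stored_bytes - math.prod(shape) * bytes_per_element


overheads = {
    name: serialization_overhead(shape, size) for name, (shape, size) in tensors.items()
}
for name, extra in overheads.items():
    print(f"{name}: {extra} bytes beyond the raw float32 payload")
```

Under these shape assumptions every file is within 1 KiB of its raw float32 payload, which suggests the split checkpoint stores full-precision weights and that any bf16 conversion happens when the model is built with `amp='bf16'` rather than at save time.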