Suparious committed on
Commit c355ba2
1 parent: d3f1157

Add model card

Files changed (1): README.md (+143, −0)
---
base_model: mistralai/Mistral-7B-v0.1
tags:
- Mistral
- instruct
- finetune
- chatml
- DPO
- RLHF
- gpt4
- synthetic data
- distillation
- function calling
- json mode
- quantized
- 4-bit
- AWQ
- text-generation
- autotrain_compatible
- endpoints_compatible
model-index:
- name: Hermes-2-Pro-Mistral-7B
  results: []
license: apache-2.0
language:
- en
datasets:
- teknium/OpenHermes-2.5
widget:
- example_title: Hermes 2 Pro
  messages:
  - role: system
    content: You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.
  - role: user
    content: Write a short story about Goku discovering Kirby has teamed up with Majin Buu to destroy the world.
model_type: mistral
pipeline_tag: text-generation
inference: false
prompt_template: '<|im_start|>system

  {system_message}<|im_end|>

  <|im_start|>user

  {prompt}<|im_end|>

  <|im_start|>assistant

  '
quantized_by: Suparious
---
# NousResearch/Hermes-2-Pro-Mistral-7B AWQ

**UPLOAD IN PROGRESS**

- Model creator: [NousResearch](https://huggingface.co/NousResearch)
- Original model: [Hermes-2-Pro-Mistral-7B](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/ggO2sBDJ8Bhc6w-zwTx5j.png)

## Model Summary

Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes!

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, trained on an updated and cleaned version of the OpenHermes 2.5 dataset as well as a newly introduced, in-house Function Calling and JSON Mode dataset.

This new version of Hermes maintains its excellent general task and conversation capabilities, but also excels at Function Calling and JSON Structured Outputs, and has improved on several other metrics as well: it scores 90% on the function calling evaluation built in partnership with Fireworks.AI, and 84% on the structured JSON output evaluation.

Hermes 2 Pro uses a special system prompt and a multi-turn function calling structure with a new ChatML role to make function calling reliable and easy to parse. Learn more about prompting below.

This work was a collaboration between Nous Research, @interstellarninja, and Fireworks.AI.

Learn more about the function calling system for this model in our GitHub repo: https://github.com/NousResearch/Hermes-Function-Calling

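The repo linked above defines the exact tag conventions; as a hedged sketch, assuming the model wraps each tool invocation in `<tool_call>` tags containing a JSON object (the convention described in Hermes-Function-Calling), the calls can be pulled out of raw model output like this:

```python
import json
import re

# Assumption: tool invocations appear as <tool_call>{...JSON...}</tool_call>
# in the assistant's output, per the Hermes-Function-Calling repo.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return the list of JSON tool-call objects found in model output."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

sample = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(sample))
# → [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
```

The function and the `get_weather` tool are illustrative only; in practice you would dispatch each parsed call to your own tool implementation and feed the result back in a tool-response turn.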
## How to use

### Install the necessary packages

```bash
pip install --upgrade autoawq autoawq-kernels
```

### Example Python code

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

model_path = "solidrust/Hermes-2-Pro-Mistral-7B-AWQ"
system_message = "You are Hermes, incarnated as a powerful AI."

# Load the quantized model and its tokenizer
model = AutoAWQForCausalLM.from_quantized(model_path,
                                          fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)
streamer = TextStreamer(tokenizer,
                        skip_prompt=True,
                        skip_special_tokens=True)

# Convert the prompt to tokens using the ChatML template
prompt_template = """\
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""

prompt = "You're standing on the surface of the Earth. "\
         "You walk one mile south, one mile west and one mile north. "\
         "You end up exactly where you started. Where are you?"

tokens = tokenizer(prompt_template.format(system_message=system_message,
                                          prompt=prompt),
                   return_tensors='pt').input_ids.cuda()

# Generate output, streaming tokens to stdout as they are produced
generation_output = model.generate(tokens,
                                   streamer=streamer,
                                   max_new_tokens=512)
```

### About AWQ

AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality than the most commonly used GPTQ settings.

AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users should use GGUF models instead.

It is supported by:

- [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
- [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later for support for all model types
- [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
- [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers
- [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code

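To get a feel for why 4-bit quantization matters, here is a back-of-the-envelope weight-memory estimate (a sketch assuming roughly 7.24B parameters for Mistral-7B, and ignoring activations and quantization metadata such as scales and zero points):

```python
# Rough weight-memory comparison: fp16 (2 bytes/param) vs 4-bit AWQ
# (0.5 bytes/param). The parameter count is an assumption (~7.24B for
# Mistral-7B); real usage also includes scales/zeros and activations.
PARAMS = 7.24e9

fp16_gib = PARAMS * 2 / 1024**3    # 16-bit weights
awq4_gib = PARAMS * 0.5 / 1024**3  # 4-bit packed weights

print(f"fp16: {fp16_gib:.1f} GiB, 4-bit AWQ: {awq4_gib:.1f} GiB")
```

The roughly 4x reduction in weight memory is what lets a quantized 7B model fit comfortably on a single consumer GPU.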
## Prompt template: ChatML

```plaintext
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
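As a minimal sketch, the template above can be filled in with plain string formatting (the helper below is hypothetical, not part of any library):

```python
def build_chatml_prompt(system_message: str, prompt: str) -> str:
    """Render a single-turn ChatML prompt for Hermes 2 Pro."""
    return (
        "<|im_start|>system\n"
        f"{system_message}<|im_end|>\n"
        "<|im_start|>user\n"
        f"{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

text = build_chatml_prompt("You are Hermes, incarnated as a powerful AI.",
                           "What is 2 + 2?")
print(text)
```

If the model's tokenizer ships a ChatML chat template, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` should produce an equivalent string from a list of role/content messages.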