alejandrovil committed
Commit 806948a
1 Parent(s): c1357c7

Update README.md

Files changed (1):
  1. README.md +99 -120
README.md CHANGED
@@ -1,120 +1,99 @@
- ---
- library_name: transformers
- tags:
- - 4-bit
- - AWQ
- - text-generation
- - autotrain_compatible
- - endpoints_compatible
- - Llama-3
- - instruct
- - finetune
- - chatml
- - DPO
- - RLHF
- - gpt4
- - synthetic data
- - distillation
- - function calling
- - json mode
- - axolotl
- model-index:
- - name: Hermes-2-Pro-Llama-3-8B
-   results: []
- license: apache-2.0
- language:
- - en
- datasets:
- - teknium/OpenHermes-2.5
- widget:
- - example_title: Hermes 2 Pro
-   messages:
-   - role: system
-     content: You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.
-   - role: user
-     content: Write a short story about Goku discovering Kirby has teamed up with Majin Buu to destroy the world.
- pipeline_tag: text-generation
- inference: false
- quantized_by: Suparious
- ---
- # NousResearch/Hermes-2-Pro-Llama-3-8B AWQ
-
- - Model creator: [NousResearch](https://huggingface.co/NousResearch)
- - Original model: [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/ggO2sBDJ8Bhc6w-zwTx5j.png)
-
- ## Model Description
-
- Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, trained on an updated and cleaned version of the OpenHermes 2.5 dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.
-
- This new version of Hermes maintains its excellent general task and conversation capabilities while also excelling at Function Calling and JSON Structured Outputs. It has improved on several other metrics as well, scoring 90% on our function calling evaluation built in partnership with Fireworks.AI and 84% on our structured JSON output evaluation.
-
- Hermes Pro takes advantage of a special system prompt and a multi-turn function calling structure with a new ChatML role, making function calling reliable and easy to parse. Learn more about prompting below.
-
- This version of Hermes 2 Pro adds several tokens to assist with agentic capabilities when parsing streamed output: `<tools>`, `<tool_call>`, `<tool_response>` and their closing tags are now single tokens.
-
- This work was a collaboration between Nous Research, @interstellarninja, and Fireworks.AI.
-
- Learn more about the function calling system for this model in our GitHub repo: https://github.com/NousResearch/Hermes-Function-Calling
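
As a rough illustration of that tagged structure, one tool-calling exchange looks like the sketch below. The function name, its schema, and all values are invented for this example; the exact system prompt and role layout are specified in the Hermes-Function-Calling repo linked above.

```python
# Illustrative sketch only: the tool, its JSON schema, and the values here are
# invented. See https://github.com/NousResearch/Hermes-Function-Calling for the
# exact system prompt and message layout.

# The system turn advertises the available tools inside <tools> tags:
system_turn = """<|im_start|>system
You are a function calling AI model. Here are the available tools:
<tools> [{"name": "get_weather", "description": "Get the current weather",
"parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}] </tools><|im_end|>"""

# The model replies with a JSON invocation wrapped in <tool_call> tags:
assistant_turn = """<|im_start|>assistant
<tool_call>
{"name": "get_weather", "arguments": {"city": "Paris"}}
</tool_call><|im_end|>"""

# The caller runs the function and feeds the result back in a tool-role turn,
# wrapped in <tool_response> tags, for the model to compose its final answer:
tool_turn = """<|im_start|>tool
<tool_response>
{"city": "Paris", "temperature_c": 18}
</tool_response><|im_end|>"""
```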
-
- ## How to use
-
- ### Install the necessary packages
-
- ```bash
- pip install --upgrade autoawq autoawq-kernels
- ```
-
- ### Example Python code
-
- ```python
- from awq import AutoAWQForCausalLM
- from transformers import AutoTokenizer, TextStreamer
-
- model_path = "solidrust/Hermes-2-Pro-Llama-3-8B-AWQ"
- system_message = "You are Hermes-2-Pro-Llama-3-8B, incarnated as a powerful AI. You were created by NousResearch."
-
- # Load the AWQ-quantized model; fuse_layers fuses attention/MLP layers for faster inference
- model = AutoAWQForCausalLM.from_quantized(model_path,
-                                           fuse_layers=True)
- tokenizer = AutoTokenizer.from_pretrained(model_path,
-                                           trust_remote_code=True)
- # Stream decoded tokens to stdout as they are generated
- streamer = TextStreamer(tokenizer,
-                         skip_prompt=True,
-                         skip_special_tokens=True)
-
- # ChatML prompt template used by Hermes 2 Pro
- prompt_template = """\
- <|im_start|>system
- {system_message}<|im_end|>
- <|im_start|>user
- {prompt}<|im_end|>
- <|im_start|>assistant"""
-
- prompt = "You're standing on the surface of the Earth. "\
-          "You walk one mile south, one mile west and one mile north. "\
-          "You end up exactly where you started. Where are you?"
-
- # Convert the filled-in prompt to input ids on the GPU
- tokens = tokenizer(prompt_template.format(system_message=system_message, prompt=prompt),
-                    return_tensors='pt').input_ids.cuda()
-
- # Generate up to 512 new tokens, streaming them as they arrive
- generation_output = model.generate(tokens,
-                                    streamer=streamer,
-                                    max_new_tokens=512)
- ```
-
- ### About AWQ
-
- AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with quality equivalent to or better than the most commonly used GPTQ settings.
-
- AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users should use GGUF models instead.
-
- AWQ is supported by:
-
- - [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
- - [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later, with support for all model types
- - [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
- - [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers
- - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
 
+ ---
+ library_name: transformers
+ tags:
+ - 4-bit
+ - AWQ
+ - text-generation
+ - autotrain_compatible
+ - endpoints_compatible
+ - Llama-3
+ - instruct
+ - finetune
+ - chatml
+ - DPO
+ - RLHF
+ - gpt4
+ - synthetic data
+ - distillation
+ - function calling
+ - json mode
+ - axolotl
+ model-index:
+ - name: Hermes-2-Pro-Llama-3-8B
+   results: []
+ license: apache-2.0
+ language:
+ - en
+ datasets:
+ - teknium/OpenHermes-2.5
+ widget:
+ - example_title: Hermes 2 Pro
+   messages:
+   - role: system
+     content: You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.
+   - role: user
+     content: Write a short story about Goku discovering Kirby has teamed up with Majin Buu to destroy the world.
+ pipeline_tag: text-generation
+ inference: false
+ quantized_by: Suparious
+ ---
+ # NousResearch/Hermes-2-Pro-Llama-3-8B AWQ
+
+ - Original model: [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
+
+ ```bash
+ pip install --upgrade autoawq autoawq-kernels
+ ```
+
+ ### Example Python code
+
+ ```python
+ from awq import AutoAWQForCausalLM
+ from transformers import AutoTokenizer, TextStreamer
+
+ model_path = "solidrust/Hermes-2-Pro-Llama-3-8B-AWQ"
+ system_message = "You are Hermes-2-Pro-Llama-3-8B, incarnated as a powerful AI. You were created by NousResearch."
+
+ # Load the AWQ-quantized model; fuse_layers fuses attention/MLP layers for faster inference
+ model = AutoAWQForCausalLM.from_quantized(model_path,
+                                           fuse_layers=True)
+ tokenizer = AutoTokenizer.from_pretrained(model_path,
+                                           trust_remote_code=True)
+ # Stream decoded tokens to stdout as they are generated
+ streamer = TextStreamer(tokenizer,
+                         skip_prompt=True,
+                         skip_special_tokens=True)
+
+ # ChatML prompt template used by Hermes 2 Pro
+ prompt_template = """\
+ <|im_start|>system
+ {system_message}<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant"""
+
+ prompt = "You're standing on the surface of the Earth. "\
+          "You walk one mile south, one mile west and one mile north. "\
+          "You end up exactly where you started. Where are you?"
+
+ # Convert the filled-in prompt to input ids on the GPU
+ tokens = tokenizer(prompt_template.format(system_message=system_message, prompt=prompt),
+                    return_tensors='pt').input_ids.cuda()
+
+ # Generate up to 512 new tokens, streaming them as they arrive
+ generation_output = model.generate(tokens,
+                                    streamer=streamer,
+                                    max_new_tokens=512)
+ ```
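
If the quantized repo ships the original model's chat template in its tokenizer config (an assumption worth verifying), the hand-written template above can be replaced with the tokenizer's built-in one; a minimal sketch:

```python
# Alternative prompt construction, assuming the repo includes Hermes 2 Pro's
# chat_template; apply_chat_template builds the same ChatML layout as above.
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": prompt},
]
tokens = tokenizer.apply_chat_template(messages,
                                       add_generation_prompt=True,
                                       return_tensors='pt').cuda()

generation_output = model.generate(tokens,
                                   streamer=streamer,
                                   max_new_tokens=512)
```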
+
+ ### About AWQ
+
+ AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with quality equivalent to or better than the most commonly used GPTQ settings.
+
+ AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users should use GGUF models instead.
+
+ AWQ is supported by:
+
+ - [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
+ - [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later, with support for all model types
+ - [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
+ - [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers
+ - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
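
As one example from the list above, vLLM can serve this checkpoint with its AWQ kernels; a minimal sketch (the prompt and sampling settings are arbitrary):

```python
from vllm import LLM, SamplingParams

# Load the AWQ checkpoint; quantization="awq" selects vLLM's AWQ kernels.
llm = LLM(model="solidrust/Hermes-2-Pro-Llama-3-8B-AWQ", quantization="awq")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["What is AWQ quantization?"], params)
print(outputs[0].outputs[0].text)
```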