huihui-ai committed
Commit b7facf7 · verified · 1 Parent(s): 462f919

Update README.md

  1. README.md +300 -3
README.md CHANGED
@@ -1,3 +1,300 @@
- ---
- license: apache-2.0
- ---
+ ---
+ library_name: transformers
+ license: apache-2.0
+ license_link: https://huggingface.co/Qwen/Qwen3-14B/blob/main/LICENSE
+ pipeline_tag: text-generation
+ base_model:
+ - Qwen/Qwen3-14B
+ tags:
+ - chat
+ - abliterated
+ - uncensored
+ extra_gated_prompt: >-
+   **Usage Warnings**
+
+
+   **Risk of Sensitive or Controversial Outputs**: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.
+
+   **Not Suitable for All Audiences**: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.
+
+   **Legal and Ethical Responsibilities**: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.
+
+   **Research and Experimental Use**: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.
+
+   **Monitoring and Review Recommendations**: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.
+
+   **No Default Safety Guarantees**: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.
+
+
+ ---
+
+ # huihui-ai/Huihui-Qwen3-14B-abliterated-v2
+
+
+ This is an uncensored version of [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to learn more about it).
+ This is a crude, proof-of-concept implementation that removes refusals from an LLM without using TransformerLens.
+
+ Ablation was performed using a new and faster method, which yields better results.
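For readers new to the term, abliteration is commonly described as directional ablation: a "refusal direction" is estimated in activation space and then projected out of weights that write into the residual stream. The sketch below shows only that generic projection on a toy matrix. It is not the method used for this model, which the README does not detail, and `ablate_direction`, the toy weights `W`, and the direction `r` are all hypothetical.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def ablate_direction(weight, direction):
    """Project `direction` out of the output space of `weight`.

    Rows of `weight` are output dimensions. For unit vector r this computes
    W' = W - r (r^T W), so W' x has no component along r for any input x.
    """
    r = normalize(direction)
    n_rows, n_cols = len(weight), len(weight[0])
    # r^T W: the component of each column of W along r
    proj = [sum(r[i] * weight[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[weight[i][j] - r[i] * proj[j] for j in range(n_cols)]
            for i in range(n_rows)]

def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

# Toy 3x2 "output projection" and a made-up refusal direction (assumptions).
W = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
r = [1.0, 1.0, 0.0]

W_ablated = ablate_direction(W, r)
y = matvec(W_ablated, [0.7, -0.3])

# The ablated layer's output should be orthogonal to r (up to float error).
component_along_r = sum(a * b for a, b in zip(normalize(r), y))
print(abs(component_along_r) < 1e-9)
```

In a real model the same projection would be applied to tensors rather than nested lists, and the direction would be estimated from activations on contrasting prompts; both steps are outside the scope of this sketch.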
+
+ **Important Note**: This version is an improvement over the previous one, [huihui-ai/Qwen3-14B-abliterated](https://huggingface.co/huihui-ai/Qwen3-14B-abliterated).
+
+ ## ollama
+
+ You can use [huihui_ai/qwen3-abliterated:14b-v2](https://ollama.com/huihui_ai/qwen3-abliterated:14b-v2) directly. Toggle thinking with `/set think` and `/set nothink`.
+ ```
+ ollama run huihui_ai/qwen3-abliterated:14b-v2
+ ```
+
+
+ ## Usage
+ You can use this model in your applications by loading it with Hugging Face's `transformers` library:
+
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
+ import torch
+ import os
+ import signal
+ import random
+ import numpy as np
+ import time
+
+ # Use half of the available CPU cores for CPU-side math libraries
+ cpu_count = os.cpu_count() or 1
+ print(f"Number of CPU cores in the system: {cpu_count}")
+ half_cpu_count = max(1, cpu_count // 2)
+ os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
+ os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
+ torch.set_num_threads(half_cpu_count)
+
+ print(f"PyTorch threads: {torch.get_num_threads()}")
+ print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
+ print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")
+
+ # Load the model and tokenizer
+ NEW_MODEL_ID = "huihui-ai/Huihui-Qwen3-14B-abliterated-v2"
+ print(f"Load Model {NEW_MODEL_ID} ... ")
+
+ # Optional 4-bit quantization config (requires bitsandbytes); unused by default
+ quant_config_4 = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.bfloat16,
+     bnb_4bit_use_double_quant=True,
+     llm_int8_enable_fp32_cpu_offload=True,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     NEW_MODEL_ID,
+     device_map="auto",
+     trust_remote_code=True,
+     #quantization_config=quant_config_4,
+     torch_dtype=torch.bfloat16
+ )
+ tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)
+ if tokenizer.pad_token is None:
+     tokenizer.pad_token = tokenizer.eos_token
+     tokenizer.pad_token_id = tokenizer.eos_token_id
+
+ # Chat state and toggles (changed at runtime with the slash commands below)
+ messages = []
+ nothink = False
+ same_seed = False
+ skip_prompt = True
+ skip_special_tokens = True
+ do_sample = True
+
+ def set_random_seed(seed=None):
+     """Set random seed for reproducibility. If seed is None, use int(time.time())."""
+     if seed is None:
+         seed = int(time.time())  # Convert float to int
+     random.seed(seed)
+     np.random.seed(seed)
+     torch.manual_seed(seed)
+     torch.cuda.manual_seed_all(seed)  # If using CUDA
+     torch.backends.cudnn.deterministic = True
+     torch.backends.cudnn.benchmark = False
+     return seed  # Return seed for logging if needed
+
+ class CustomTextStreamer(TextStreamer):
+     """Streams text to stdout while collecting the output and timing metrics."""
+
+     def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
+         super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
+         self.generated_text = ""
+         self.stop_flag = False
+         self.init_time = time.time()  # Record initialization time
+         self.end_time = None  # To store end time
+         self.first_token_time = None  # To store first token generation time
+         self.token_count = 0  # To track total tokens
+
+     def on_finalized_text(self, text: str, stream_end: bool = False):
+         if self.first_token_time is None and text.strip():  # Set first token time on first non-empty text
+             self.first_token_time = time.time()
+         self.generated_text += text
+         # Approximate the token count by re-encoding the decoded text chunk
+         tokens = self.tokenizer.encode(text, add_special_tokens=False)
+         self.token_count += len(tokens)
+         print(text, end="", flush=True)
+         if stream_end:
+             self.end_time = time.time()  # Record end time when streaming ends
+         if self.stop_flag:
+             raise StopIteration
+
+     def stop_generation(self):
+         self.stop_flag = True
+         self.end_time = time.time()  # Record end time when generation is stopped
+
+     def get_metrics(self):
+         """Returns initialization time, first token time, first token latency, end time, total time, total tokens, and tokens per second."""
+         if self.end_time is None:
+             self.end_time = time.time()  # Set end time if not already set
+         total_time = self.end_time - self.init_time  # Total time from init to end
+         tokens_per_second = self.token_count / total_time if total_time > 0 else 0
+         first_token_latency = (self.first_token_time - self.init_time) if self.first_token_time is not None else None
+         metrics = {
+             "init_time": self.init_time,
+             "first_token_time": self.first_token_time,
+             "first_token_latency": first_token_latency,
+             "end_time": self.end_time,
+             "total_time": total_time,  # Total time in seconds
+             "total_tokens": self.token_count,
+             "tokens_per_second": tokens_per_second
+         }
+         return metrics
+
+ def generate_stream(model, tokenizer, messages, nothink, skip_prompt, skip_special_tokens, do_sample, max_length):
+     """Stream one reply; `max_length` caps prompt plus generated tokens."""
+     input_ids = tokenizer.apply_chat_template(
+         messages,
+         tokenize=True,
+         enable_thinking=not nothink,
+         add_generation_prompt=True,
+         return_tensors="pt"
+     )
+     attention_mask = torch.ones_like(input_ids, dtype=torch.long)
+     tokens = input_ids.to(model.device)
+     attention_mask = attention_mask.to(model.device)
+
+     streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
+
+     # Let Ctrl+C stop the current generation instead of killing the script
+     def signal_handler(sig, frame):
+         streamer.stop_generation()
+         print("\n[Generation stopped by user with Ctrl+C]")
+
+     signal.signal(signal.SIGINT, signal_handler)
+
+     generate_kwargs = {}
+     if do_sample:
+         generate_kwargs = {
+             "do_sample": do_sample,
+             "max_length": max_length,
+             "temperature": 0.6,
+             "top_k": 20,
+             "top_p": 0.95,
+             "repetition_penalty": 1.2,
+             "no_repeat_ngram_size": 2
+         }
+     else:
+         generate_kwargs = {
+             "do_sample": do_sample,
+             "max_length": max_length,
+             "repetition_penalty": 1.2,
+             "no_repeat_ngram_size": 2
+         }
+
+     print("Response: ", end="", flush=True)
+     try:
+         generated_ids = model.generate(
+             tokens,
+             attention_mask=attention_mask,
+             #use_cache=False,
+             pad_token_id=tokenizer.pad_token_id,
+             streamer=streamer,
+             **generate_kwargs
+         )
+         del generated_ids
+     except StopIteration:
+         print("\n[Stopped by user]")
+
+     del input_ids, attention_mask
+     torch.cuda.empty_cache()
+     signal.signal(signal.SIGINT, signal.SIG_DFL)  # Restore default Ctrl+C handling
+
+     return streamer.generated_text, streamer.stop_flag, streamer.get_metrics()
+
+ init_seed = set_random_seed()
+
+ while True:
+     if same_seed:
+         set_random_seed(init_seed)
+     else:
+         init_seed = set_random_seed()
+
+     print(f"\nnothink: {nothink}")
+     print(f"skip_prompt: {skip_prompt}")
+     print(f"skip_special_tokens: {skip_special_tokens}")
+     print(f"do_sample: {do_sample}")
+     print(f"same_seed: {same_seed}, {init_seed}\n")
+
+     user_input = input("User: ").strip()
+     if user_input.lower() == "/exit":
+         print("Exiting chat.")
+         break
+     if user_input.lower() == "/clear":
+         messages = []
+         print("Chat history cleared. Starting a new conversation.")
+         continue
+     if user_input.lower() == "/nothink":
+         nothink = not nothink
+         continue
+     if user_input.lower() == "/skip_prompt":
+         skip_prompt = not skip_prompt
+         continue
+     if user_input.lower() == "/skip_special_tokens":
+         skip_special_tokens = not skip_special_tokens
+         continue
+     if user_input.lower().startswith("/same_seed"):
+         parts = user_input.split()
+         if len(parts) == 1:  # /same_seed (no number)
+             same_seed = not same_seed  # Toggle switch
+         elif len(parts) == 2:  # /same_seed <number>
+             try:
+                 init_seed = int(parts[1])  # Extract and convert number to int
+                 same_seed = True
+             except ValueError:
+                 print("Error: Please provide a valid integer after /same_seed")
+         continue
+     if user_input.lower() == "/do_sample":
+         do_sample = not do_sample
+         continue
+     if not user_input:
+         print("Input cannot be empty. Please enter something.")
+         continue
+
+     messages.append({"role": "user", "content": user_input})
+     # 40960 is assumed here to match Qwen3-14B's configured context length
+     response, stop_flag, metrics = generate_stream(model, tokenizer, messages, nothink, skip_prompt, skip_special_tokens, do_sample, 40960)
+     print("\n\nMetrics:")
+     for key, value in metrics.items():
+         print(f"  {key}: {value}")
+
+     print("", flush=True)
+     if stop_flag:
+         continue
+     messages.append({"role": "assistant", "content": response})
+ ```
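The chat loop above appends the full streamed response to `messages`. With thinking enabled, Qwen3 wraps its reasoning in `<think>...</think>` tags, and the Qwen3 model card recommends keeping only the final answer in multi-turn history. The following is a minimal sketch of a helper for that, assuming the tags survive decoding as plain text; `strip_thinking` is a hypothetical name and is not part of the original script.

```python
import re

# Matches a complete reasoning block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_thinking(text: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace."""
    without_blocks = THINK_BLOCK.sub("", text)
    # If generation was cut off mid-thought, drop the unterminated block too.
    without_blocks = re.sub(r"<think>.*\Z", "", without_blocks, flags=re.DOTALL)
    return without_blocks.strip()

response = "<think>\nThe user greets me.\n</think>\n\nHello! How can I help?"
print(strip_thinking(response))
```

In the loop, this could be used as `messages.append({"role": "assistant", "content": strip_thinking(response)})`. The official chat template may already drop earlier reasoning when it renders the history, so the helper mainly matters when the history is stored or displayed elsewhere.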
+
+ ### Donation
+
+ If you like it, please click 'like' and follow us for more updates.
+ You can follow [x.com/support_huihui](https://x.com/support_huihui) to get the latest model information from huihui.ai.
+
+ ##### Your donation helps us continue development and improvement; even the price of a cup of coffee helps.
+ - Bitcoin (BTC):
+ ```
+ bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
+ ```