TheBloke/Qwen-14B-Chat-GPTQ

Sep 28, 2023

Any plans to release this quant?

Sep 28, 2023

My understanding is that TheBloke's scripts automatically create the repo when they start running, so the fact that this repo exists suggests it's somewhere in the processing pipeline. Perhaps it failed and needs to be re-run, or perhaps it's just still baking.

TheBloke

Owner Sep 28, 2023

Yeah it failed yesterday and I've not had a chance to try it again. This one is a real pain to create. My normal processes don't work for various reasons.

I'll try again soon

Beck777

Oct 4, 2023

It seems this is a pretty decent model; it's worth finishing.

huberto

Oct 25, 2023

Can't wait!

sliptech

Oct 26, 2023

also giving this a bump :)

tastypear

Oct 30, 2023

Can't wait for the SUPERHOT version😋

TheBloke

Owner Oct 30, 2023

Sorry this has taken so long. Multiple GPTQs are being made now and will be uploaded in the next 1-2 hours.

Neman

Oct 30, 2023

edited Oct 30, 2023

Can't load it. I enabled trust_remote_code and got the error:
ImportError: This modeling file requires the following packages that were not found in your environment: transformers_stream_generator. Run pip install transformers_stream_generator
I pip-installed it, but I still get errors I don't know how to tackle:
Traceback (most recent call last):
  File "/home/neman/text-generation-webui/modules/ui_model_menu.py", line 206, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/home/neman/text-generation-webui/modules/models.py", line 92, in load_model
    tokenizer = load_tokenizer(model_name, model)
  File "/home/neman/text-generation-webui/modules/models.py", line 111, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/home/neman/text-generation-webui/installer_files/env/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 738, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/neman/text-generation-webui/installer_files/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
    return cls._from_pretrained(
  File "/home/neman/text-generation-webui/installer_files/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/neman/.cache/huggingface/modules/transformers_modules/TheBloke_Qwen-14B-Chat-GPTQ_gptq-8bit-32g-actorder_True/tokenization_qwen.py", line 75, in __init__
    self.mergeable_ranks = _load_tiktoken_bpe(vocab_file) # type: Dict[bytes, int]
  File "/home/neman/.cache/huggingface/modules/transformers_modules/TheBloke_Qwen-14B-Chat-GPTQ_gptq-8bit-32g-actorder_True/tokenization_qwen.py", line 49, in _load_tiktoken_bpe
    with open(tiktoken_bpe_file, "rb") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
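The first error in this post is the remote code's own dependency check. A minimal preflight sketch using only the standard library can list which extra packages are still missing before you try to load. The package list is an assumption drawn from the ImportError above and from the tiktoken-based tokenizer file, not an official requirements list, and missing_packages is a hypothetical helper name:

```python
import importlib.util

# Assumption: import names the Qwen remote code needs, taken from the
# ImportError above (transformers_stream_generator) and the tiktoken-based
# tokenizer file; this is not an official requirements list.
QWEN_EXTRA_PACKAGES = ("transformers_stream_generator", "tiktoken")


def missing_packages(names=QWEN_EXTRA_PACKAGES):
    """Return the top-level module names the current interpreter cannot find."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("all extra packages found")
```

Run it with the same Python interpreter that text-generation-webui uses (its installer_files env in the traceback above); otherwise it checks the wrong environment.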

TheBloke

Owner Oct 30, 2023

Sorry, my bad: I missed a file. Please re-download the branch to get the missing file qwen.tiktoken and try again.
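The TypeError above is what the tokenizer hits when its vocab file resolves to None, which fits qwen.tiktoken being absent from the downloaded branch. A rough sketch for checking a local branch directory and, if needed, forcing a fresh download of one branch with huggingface_hub's snapshot_download. The file list and both helper names are hypothetical, for illustration only:

```python
from pathlib import Path

# Hypothetical list: files the Qwen remote code reads from the model
# directory, based on the files named in this thread.
REQUIRED_FILES = ("config.json", "qwen.tiktoken", "tokenization_qwen.py", "modeling_qwen.py")


def missing_files(model_dir, required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files inside model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


def redownload_branch(repo_id, branch, local_dir):
    """Hypothetical helper: re-fetch every file of one branch, ignoring cached copies."""
    # Imported lazily so missing_files works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        revision=branch,
        local_dir=local_dir,
        force_download=True,
    )
```

For example, if missing_files(path) is non-empty, call redownload_branch("TheBloke/Qwen-14B-Chat-GPTQ", "gptq-8bit-32g-actorder_True", path), where path is wherever you keep that branch locally.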

TheBloke

Owner Oct 30, 2023

Actually, there might be another problem as well; hang on.

TheBloke

Owner Oct 30, 2023

OK, please re-download the branch to get the added qwen.tiktoken and the fixed config.json. It's now working for me locally using Transformers:

*** Generate:
Tell me about AI
AI is a field of computer science that aims to create intelligent machines that work and react like humans. It involves developing algorithms and techniques that can be used to analyze data, make decisions, and perform tasks without explicit instructions.
There are several different approaches to AI, including:

  * Rule-based systems: These systems use pre-defined rules to make decisions based on input data.
  * Machine learning: This approach uses statistical models to learn from data and improve over time.
  * Deep learning: A subfield of machine learning that uses neural networks with many layers to learn from complex data.

AI has numerous applications in various fields such as healthcare, finance, transportation, entertainment, and more. Some examples include image recognition, speech recognition, natural language processing, autonomous vehicles, and medical diagnosis.
However, it's important to note that AI also raises ethical concerns regarding privacy, bias, accountability, and transparency. As AI becomes more prevalent, it is crucial for researchers, policymakers, and the general public to consider these issues and develop appropriate guidelines and regulations to ensure that AI is developed and deployed responsibly.<|endoftext|>
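Since part of this fix was a corrected config.json, one way to sanity-check a local copy is to compare the settings encoded in the branch name with the quantization_config block in config.json. A sketch, assuming the naming pattern seen in this thread (e.g. gptq-8bit-32g-actorder_True) and that config.json stores bits, group_size and desc_act under quantization_config, as Transformers' GPTQ integration does; neither assumption is guaranteed for every repo, and the helper names are hypothetical:

```python
import json
import re
from pathlib import Path

# Assumed naming pattern, e.g. "gptq-8bit-32g-actorder_True"; a group size
# of -1 (no grouping) would appear as "gptq-8bit--1g-actorder_True".
BRANCH_RE = re.compile(r"gptq-(\d+)bit-(-?\d+)g-actorder_(True|False)")


def settings_from_name(name):
    """Parse bits, group_size and desc_act from a branch or directory name, or return None."""
    match = BRANCH_RE.search(name)
    if match is None:
        return None  # e.g. the "main" branch, whose name encodes nothing
    return {
        "bits": int(match.group(1)),
        "group_size": int(match.group(2)),
        "desc_act": match.group(3) == "True",
    }


def config_mismatches(model_dir):
    """Map each differing setting to (value from dir name, value in config.json)."""
    expected = settings_from_name(Path(model_dir).name)
    if expected is None:
        return {}
    with open(Path(model_dir) / "config.json", encoding="utf-8") as f:
        qcfg = json.load(f).get("quantization_config", {})
    return {key: (value, qcfg.get(key)) for key, value in expected.items()
            if qcfg.get(key) != value}
```

This relies on the directory being named after the branch, as in the paths Neman uses below; an empty dict means the two agree (or the name encodes nothing to check).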

Neman

Oct 31, 2023

I downloaded qwen.tiktoken and replaced config.json, but I can't get it to load using Transformers:
ValueError: Trying to set a tensor of shape torch.Size([1280, 15360]) in "qweight" (which has shape torch.Size([640, 15360])), this look incorrect.

It did load with AutoGPTQ, but the output is gibberish. For reference, I'm using the TheBloke_Qwen-14B-Chat-GPTQ_gptq-8bit-32g-actorder_True version.
*** Generate:
Tell me about AI
then The最新 l最新 then i

o the and reform

is An 最 & mi f、“最新 then
k A (

a then c w r最新 right has his
simply &
l � season In then b
then

In any case, thank you for fast response, support and overall contribution to the community.

TheBloke

Owner Oct 31, 2023

Which branch are you testing?

Neman

Oct 31, 2023

TheBloke_Qwen-14B-Chat-GPTQ_gptq-8bit-32g-actorder_True version.

TheBloke

Owner Oct 31, 2023

Works fine for me:

 [pytorch2] tomj@MC:/workspace ᐅ CUDA_VISIBLE_DEVICES=7 python3 test_transgptq.py /workspace/process/qwen_qwen-14b-chat/gptq/gptq-8bit-32g-actorder_True
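Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023. [The log repeats this in Chinese: "Please use the latest model and code; especially if you started using Qwen-7B before September 25, be careful not to use the wrong code and model."]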
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00,  5.82s/it]


*** Generate:
Tell me about AI
AI, or artificial intelligence, refers to the development of computer systems that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI systems are designed to learn from experience, adapt to new inputs, and make decisions based on statistical analysis.

There are several different approaches to building AI systems, including rule-based systems, machine learning, and deep learning. Rule-based systems use pre-defined rules to make decisions, while machine learning involves training a system to recognize patterns in data and make predictions based on those patterns. Deep learning is a type of machine learning that uses neural networks with many layers to analyze complex data sets.

AI has the potential to revolutionize many industries, from healthcare and finance to transportation and entertainment. It has already been used to improve medical diagnosis, develop self-driving cars, and create personalized recommendations for consumers. However, there are also concerns about the impact of AI on jobs and society, as well as ethical considerations around issues like bias and privacy.<|endoftext|>
*** Pipeline:
The model 'QWenLMHeadModel' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MistralForCausalLM', 'MptForCausalLM', 'MusicgenForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PersimmonForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
Tell me about AI
AI stands for Artificial Intelligence, which refers to the ability of a computer or machine to perform tasks that would typically require human intelligence, such as learning, reasoning, problem-solving, and decision-making.

AI can be categorized into two main types: narrow or weak AI and general or strong AI. Narrow AI is designed to perform a specific task or set of tasks within a limited domain, such as facial recognition or language translation. General AI, on the other hand, has the capability to learn and perform any intellectual task that a human can do.

There are various approaches to developing AI systems, including rule-based systems, decision trees, neural networks, and deep learning. These techniques use algorithms and mathematical models to simulate human thought processes and enable machines to learn from data and make decisions based on patterns and trends.

AI has numerous applications across various industries, including healthcare, finance, transportation, and manufacturing. Some examples of AI technologies include chatbots, autonomous vehicles, predictive analytics, and image recognition.

However, there are also concerns about the impact of AI on society, including job displacement, bias in decision-making, and ethical considerations around privacy and security. As AI continues to evolve and become more prevalent, it will be important to address these issues and ensure that AI is developed and used responsibly and ethically.

Code:

import argparse
parser = argparse.ArgumentParser(description='Process and upload quantisations')
parser.add_argument('model_dir', type=str, help='model dir')
args = parser.parse_args()

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = args.model_dir
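# When loading from the Hub, change revision to use a different branch
# (it has no effect when model_name_or_path is a local directory, as here).
# For example: revision="gptq-4bit-32g-actorder_True"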
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=True,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True, trust_remote_code=True)

prompt = "Tell me about AI"
prompt_template=f'''{prompt}
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

Please show the code you're trying.

Neman

Oct 31, 2023

Ah, OK, you are using Transformers directly in Python code, while I tried in text-generation-webui. It must be something with textgen.
I will try in Python and report back here.

Neman

Oct 31, 2023

I tried, and I get an error at model loading:
Exception has occurred: ValueError
Trying to set a tensor of shape torch.Size([1280, 15360]) in "qweight" (which has shape torch.Size([640, 15360])), this look incorrect.
  File "/home/neman/PROGRAMMING/PYTHON/CroTranscribe/qwen_chat_14b_test.py", line 6, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
ValueError: Trying to set a tensor of shape torch.Size([1280, 15360]) in "qweight" (which has shape torch.Size([640, 15360])), this look incorrect.

Here is the code:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = '/mnt/disk2/LLM_MODELS/models/TheBloke_Qwen-14B-Chat-GPTQ_gptq-8bit-32g-actorder_True'
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=True)

TheBloke

Owner Oct 31, 2023

Trigger a re-download of the branch - you might have an incompletely downloaded file.
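For anyone hitting the same ValueError: the two shapes are consistent with a bit-width mismatch rather than random corruption. Assuming AutoGPTQ-style packing (int32 words along the input dimension, 32 / bits values per word) and Qwen-14B's hidden size of 5120 (so the fused c_attn projection has 3 * 5120 = 15360 outputs), an 8-bit checkpoint stores 1280 rows while a model built for 4 bits allocates 640. That would fit a leftover config file describing a different branch. A small sketch of the arithmetic, with hypothetical helper names:

```python
from __future__ import annotations


def gptq_qweight_shape(in_features: int, out_features: int, bits: int) -> tuple[int, int]:
    """Shape of a packed GPTQ qweight, assuming AutoGPTQ-style int32 packing."""
    if 32 % bits != 0:
        raise ValueError("this sketch only handles bit widths that divide 32")
    # Each int32 word holds 32 // bits quantized values along the input dimension.
    return (in_features // 32 * bits, out_features)


def infer_bits(qweight_rows: int, in_features: int) -> int:
    """Invert the packing to recover the bit width from a stored row count."""
    return qweight_rows * 32 // in_features


# Assumed Qwen-14B hidden size; c_attn projects to 3 * hidden outputs.
HIDDEN = 5120
print(gptq_qweight_shape(HIDDEN, 3 * HIDDEN, 8))  # (1280, 15360): tensor in the checkpoint
print(gptq_qweight_shape(HIDDEN, 3 * HIDDEN, 4))  # (640, 15360): buffer the model allocated
```

Reading the error this way, infer_bits(1280, 5120) gives 8 and infer_bits(640, 5120) gives 4, which is why a clean re-download of the whole branch (weights and configs together) is the first thing to try.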

Neman

Oct 31, 2023

Well, I thought that was the problem: after I re-downloaded the branch it loaded the shards, but now it breaks in modeling_qwen.py:
Exception has occurred: RuntimeError
Unrecognized tensor type ID: AutocastCUDA
  File "/home/neman/.cache/huggingface/modules/transformers_modules/TheBloke_Qwen-14B-Chat-GPTQ_gptq-8bit-32g-actorder_True/modeling_qwen.py", line 467, in forward
    mixed_x_layer = self.c_attn(hidden_states)
  File "/home/neman/.cache/huggingface/modules/transformers_modules/TheBloke_Qwen-14B-Chat-GPTQ_gptq-8bit-32g-actorder_True/modeling_qwen.py", line 654, in forward
    attn_outputs = self.attn(
  File "/home/neman/.cache/huggingface/modules/transformers_modules/TheBloke_Qwen-14B-Chat-GPTQ_gptq-8bit-32g-actorder_True/modeling_qwen.py", line 951, in forward
    outputs = block(
  File "/home/neman/.cache/huggingface/modules/transformers_modules/TheBloke_Qwen-14B-Chat-GPTQ_gptq-8bit-32g-actorder_True/modeling_qwen.py", line 1121, in forward
    transformer_outputs = self.transformer(
  File "/home/neman/.cache/huggingface/modules/transformers_modules/TheBloke_Qwen-14B-Chat-GPTQ_gptq-8bit-32g-actorder_True/modeling_qwen.py", line 1337, in generate
    return super().generate(
  File "/home/neman/PROGRAMMING/PYTHON/CroTranscribe/qwen_chat_14b_test.py", line 21, in <module>
    output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
RuntimeError: Unrecognized tensor type ID: AutocastCUDA

So the same branch (gptq-8bit-32g-actorder_True) works for you? Strange...
I'll try fiddling with it some more later and will report back if I succeed, to document it for others who encounter the same issues.
Thanks!

TheBloke

Owner Oct 31, 2023

edited Oct 31, 2023

Oh, AutocastCUDA is a different problem. That happens when you are using PyTorch 2.1, which is not supported by the AutoGPTQ pre-built wheels. So it would almost certainly be working for you now if you had PyTorch 2.0.1.

Steps to fix (any one of these):

  * Downgrade to PyTorch 2.0.1
  * Clone and build AutoGPTQ 0.4.2 from source
  * Clone and build the latest AutoGPTQ 0.5 from source
  * Wait a day or two until the release of AutoGPTQ 0.5, which will have pre-built wheels for PyTorch 2.1
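A small guard along these lines can flag the situation before loading. It assumes, going by the reply above, that the AutoGPTQ 0.4.2 pre-built wheels only match PyTorch 2.0.x; both function names are hypothetical, and the supported version is a parameter because it changes with each AutoGPTQ release:

```python
import re


def parse_major_minor(version):
    """Turn a version string such as '2.1.0+cu121' into (2, 1)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def prebuilt_autogptq_matches(torch_version, supported=(2, 0)):
    """True when torch's major.minor equals the one the pre-built wheels target (assumed 2.0)."""
    return parse_major_minor(torch_version) == supported
```

Pass it torch.__version__ from the environment you load the model in; a False result points to one of the options listed above rather than to a problem with the model files.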

Neman

Oct 31, 2023

edited Oct 31, 2023

You are golden! I already have an env with PyTorch 2.0.1 and yes, it works :)

*** Generate:
Write a praise for Tom AKA TheBloke for being such a helpful guy.
Tom, you are the best! You always take the time to answer my questions and provide me with valuable advice. Your expertise in LLMs is truly impressive, and I am grateful for all the help you have given me.
Thank you for being such a kind and generous person. Your willingness to share your knowledge and support others is truly commendable, and it has made a big difference in my life.
I feel lucky to know you, Tom, and I hope to continue learning from you in the future. Keep up the great work!
Sincerely,
Neman<|endoftext|>
