Unfiltered model?

#1
by TheYuriLover - opened

When you said "filtered", do you mean that your model is trained on a dataset without the moralizing nonsense in it?
If that's the case, can you provide us with the safetensor-quantized model? I want to try it on the oobabooga webui :D

Well yeah, and this model is quantized, so you can use it right away.

It's quantized to 4-bit.

@ShreyasBrill is there a way to convert this to pth format? I am relatively new to this, but I'm trying to do it on my M1 Max MacBook using LLaMA_MPS, as it runs faster on Apple Silicon than the oobabooga webui. Will try with that as well though :)

@ShreyasBrill no, I mean I'd want the model quantized in the safetensor format so that I can use it here: https://github.com/oobabooga/text-generation-webui

@panayao yes, you can convert it back to pth format! You can download the vicuna model, download/clone this repository, and use the convert-ggml-to-pth.py file: https://github.com/ggerganov/llama.cpp/blob/master/convert-ggml-to-pth.py Easy and simple! And it does work with M1 Macs, I guess. This version is currently not very stable, because the official Vicuna was released only about 2 days ago and, you know, it's just starting up. I might update the model when an update is released so that the models give more stable responses.
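In case it helps, a rough setup is sketched below; the script's exact arguments aren't spelled out in this thread, so the last line is an assumption and you should check the script's help/source for its real interface.

# sketch: clone llama.cpp and inspect the conversion script's expected arguments
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
python convert-ggml-to-pth.py --help   # assumption: the script prints usage; otherwise read its argument parsing near the top of the file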

@TheYuriLover Hmmm, I don't know how to do that. You can download the model, convert it to .pth format, and then use some other library to quantize it to the safetensor format!
Again, you can also use https://github.com/ggerganov/llama.cpp/blob/master/convert-ggml-to-pth.py to convert the model to a different format!

@ShreyasBrill nah, you have to use this https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/triton to convert the HF models into safetensors, but it requires a lot of VRAM and unfortunately I don't have a big enough graphics card to do it :'(

I created vicuna using only my CPU. I don't have a GPU either :(

@ShreyasBrill Do you have the original 16-bit HF model though? You should upload it as well so people can quantize it to the safetensor format.

@TheYuriLover I'll check; if I have it, I'll upload it. Let me see.

@ShreyasBrill Thanks man, I appreciate it! :D

@TheYuriLover Wait!!!! I guess I am quantizing the model into a safetensor now. I'll upload it and message you here once it's done. You can use it in the oobabooga webui.

@ShreyasBrill if you quantize it into safetensor, do it with both versions, cuda and triton, and use all the implementations as well (true sequential + act_order + groupsize 128):
https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/triton
https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda

If you can only convert one of them, use triton; it gives the fastest output speed!

@TheYuriLover it's currently being quantized to vicuna-13b-GPTQ-4bit-128g now

@ShreyasBrill Did you use the cuda or the triton conversion? I hope it's triton, because for the moment we don't know how to make the cuda model run on the webui.
Did you add the other implementations? You wrote vicuna-13b-GPTQ-4bit-128g, but does it have true_sequential and act_order?

# GPTQ-for-LLaMa quantization: 4-bit, groupsize 128, with true-sequential and act-order, saved as safetensors
CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-hf/llama-7b c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors llama7b-4bit-128g.safetensors

@TheYuriLover Ah, this took forever, dude. Check the safetensors folder and download the vicuna model from it, then start up the webui with these flags and it will work: "--wbits 4 --groupsize 128 --model_type llama"
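For reference, the full launch command would look something like the sketch below; the folder name vicuna-13b-GPTQ-4bit-128g and server.py as the webui entry point are assumptions, so adjust them to whatever you actually downloaded.

# hypothetical text-generation-webui launch with the flags above
python server.py --model vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type llama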

@ShreyasBrill Thanks, dude!! We really appreciate it :D

But you didn't answer my questions from before, can you please respond to these?

"You used cuda or triton convertion? I hope it's triton because for the moment we don't know how to make the cuda model run on the webui
Did you add the other implementations? you wrote vicuna-13b-GPTQ-4bit-128g , but does it have true_sequential and act_order?

CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-hf/llama-7b c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors llama7b-4bit-128g.safetensors"

@ShreyasBrill Are you trolling us or something? Your safetensors file is exactly the same as anon8231489123's (which is trained on the filtered dataset):
https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g

(screenshot attached)

@TheYuriLover I am not trolling. As I said, I had no GPU to make a safetensor, so I asked my friend to make it. He gave me a file after a long time and I uploaded it here. He also told me to use those flags that I gave you. I didn't know that someone had already made it.

And as I also said, Vicuna isn't really stable yet. For confirmation, watch this guy's video and see how it performs:
https://youtu.be/jb4r1CL2tcc

@ShreyasBrill In your description it says that it used the unfiltered dataset
https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

(screenshot attached)

The safetensor file from anon8231489123 used the filtered dataset

It shouldn't end up being the same safetensor file, so why did you say you used the unfiltered dataset? Why lie like that?

Lol, forgot to remove it, hold on. Actually, at first it was another model that I uploaded. Then I deleted it and re-uploaded a filtered model and forgot to change the tags. Sorry for that.

fixed it :)

@TheYuriLover Please tell me exactly what you need: an unfiltered model with 4-bit quantization that works with oobabooga?

@ShreyasBrill yes, I want a model that is trained on the unfiltered dataset
https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

And that model should be quantized with Triton and with all the GPTQ implementations (true sequential, act order and groupsize 128)
https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/triton
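Concretely, the quantization step would look something like the sketch below, mirroring the llama-7b command earlier in the thread; the ./vicuna-13b-unfiltered-hf folder and the output filename are placeholders for whatever the finetuned HF model ends up being called.

# sketch: same GPTQ-for-LLaMa (triton branch) invocation as above, pointed at a hypothetical unfiltered vicuna-13b HF folder
CUDA_VISIBLE_DEVICES=0 python llama.py ./vicuna-13b-unfiltered-hf c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors vicuna-13b-unfiltered-GPTQ-4bit-128g.safetensors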

Okay, I'll try to make it and upload it.

@TheYuriLover do you know how to convert the 4-bit quantized llama.cpp GGML format (ggml-model-q4_0.bin) to a file format like "consolidated.00.pth", i.e. the original format LLaMA comes in? Or do you have links to anything I need to learn/understand to figure it out? I am trying to use https://github.com/jankais3r/LLaMA_MPS on my M1 Mac, and it converts from the consolidated.00.pth format to pyarrow. I tried to use a script called "convert-ggml-to-pth.py" but it does not work as expected. If I knew the differences between .pth, .bin, and safetensors, and how they all inter-relate, it would be much easier to figure all this out lol. Any references/pointers are much appreciated :)

@panayao yes, you can convert it back to pth format! You can download the vicuna model, download/clone this repository, and use the convert-ggml-to-pth.py file: https://github.com/ggerganov/llama.cpp/blob/master/convert-ggml-to-pth.py Easy and simple! And it does work with M1 Macs, I guess. This version is currently not very stable, because the official Vicuna was released only about 2 days ago and, you know, it's just starting up. I might update the model when an update is released so that the models give more stable responses.

This is the script I was trying lol. It doesn't seem to work. I get this when I try to convert to the pyarrow format (which is used by LLaMA_MPS and is done automatically when running with *.pth files):

Converting checkpoint to pyarrow format
models/13B_Vicuna/consolidated.00.pth
Traceback (most recent call last):
  File "/Users/panayao/Documents/LLaMA_MPS/chat.py", line 146, in <module>
    fire.Fire(main)
  File "/Users/panayao/Documents/LLaMA_MPS/env/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/Users/panayao/Documents/LLaMA_MPS/env/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/Users/panayao/Documents/LLaMA_MPS/env/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/Users/panayao/Documents/LLaMA_MPS/chat.py", line 106, in main
    generator = load(ckpt_dir, tokenizer_path, max_seq_len, max_batch_size)
  File "/Users/panayao/Documents/LLaMA_MPS/chat.py", line 49, in load
    tens = pa.Tensor.from_numpy(v.numpy())
AttributeError: 'dict' object has no attribute 'numpy'
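
The error suggests that at least one value in the loaded checkpoint is a plain dict rather than a tensor, since LLaMA_MPS calls .numpy() on every value. A quick, hypothetical way to see what the converted checkpoint actually contains (assuming PyTorch is installed):

# sketch: print the top-level keys and the type of each value in the converted checkpoint
python -c "import torch; ckpt = torch.load('models/13B_Vicuna/consolidated.00.pth', map_location='cpu'); [print(k, type(v)) for k, v in ckpt.items()]"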

@panayao I guess you have to ask them directly to fix your error: https://github.com/ggerganov/llama.cpp/issues

Thanks @TheYuriLover, lol, I probably should have thought to open a Git issue. @ShreyasBrill, if you oblige @TheYuriLover's request, can you also upload the unfiltered model in the same "ggml-model-q4_0.bin" format, in addition to the format @TheYuriLover requested?

Also, @TheYuriLover @ShreyasBrill, I have crypto mining machines running Windows with proper CUDA drivers that I haven't used in years, but they each have 8 AMD RX470 (4GB) cards. So if you need stuff converted and are willing to point me in the right direction, I can help :)

@panayao well, for the moment the 16-bit unrestricted Vicuna doesn't exist yet, so there's nothing to quantize, but if it does show up, you can help, yeah :p

@panayao yes, I do need your GPUs' help to create the models, but first wait for the next release of Vicuna, because it's not fully stable currently. And as Yuri said, the 16-bit unrestricted Vicuna doesn't exist.

And sorry I didn't reply earlier; because of the different timezones, I was sleeping when you guys messaged me.

I would also love an unfiltered version that is quantised in 4-bit just like this one; it's a great model.
That said, the outputs seem to be identical, as in token for token, to vicuna-13b-q4. I am not sure this is your work?

I edited my first comment after some tests; could you please clarify?

@nucleardiffusion yes, it's the same model. I downloaded it from one of my friends who trained the model; later on he deleted it, which is why I thought of uploading it here. This is not my work.
