How do you run this?

by LaferriereJC - opened Apr 20, 2023

Discussion

LaferriereJC

Apr 20, 2023

I tried text-generation-webui, and it detects llama but fails to run yet I can run

python server.py --model ggml-alpaca-7b-q4 --listen

dchest

Apr 20, 2023

It runs via this cformers fork https://github.com/antimatter15/cformers

Mozzipa

Apr 20, 2023

I tried cformer with M1 mac. But its response is only blank.
If I input "hi" on ">" , nothing appears . And ">" again.

> Hi


>

LaferriereJC

Apr 20, 2023

I tried to modify interface as such
# stablelm
'cakewalk/ggml-q4_0-stablelm-tuned-alpha-7b': ModelUrlMap(
cpp_model_name="gptneox",
int4_fixed_zero="https://huggingface.co/cakewalk/ggml-q4_0-stablelm-tuned-alpha-7b/resolve/main/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin"),

and chat.py
model_map = {'stablelm': 'cakewalk/ggml-q4_0-stablelm-tuned-alpha-7b', 'pythia': 'OpenAssistant/oasst-sft-1-pythia-12b', 'bloom': 'bigscience/bloom-7b1', 'gptj': 'Eleuther$

but when I attempt to run
python chat.py -m stablelm

I get an error
cakewalk/ggml-q4_0-stablelm-tuned-alpha-7b does not appear to have a file named config.json

but none of the other models have config.json

Do you have instructions for how to set this up?

I assumed gptneox from looking at the config.json's for the stablelm's models (i.e. https://huggingface.co/stabilityai/stablelm-tuned-alpha-3b/blob/main/config.json)

donkilluminatti

Apr 20, 2023

•

edited Apr 20, 2023

There is a Windows fork which, when started, will ask which file to select https://github.com/LostRuins/koboldcpp
Support GGML models
!!!Pardon - this model does not start

LaferriereJC

Apr 20, 2023

I see the cformers link you provided had been updated to include an option for this model.

zatochu

Apr 20, 2023

•

edited Apr 20, 2023

It should run with llama.cpp now as of this commit. https://github.com/ggerganov/llama.cpp/commit/12b5900dbc9743dee3ce83513cf5c3a44523a1b6
Edit: Maybe not?

LaferriereJC

Apr 20, 2023

My concern is it's not going to respect the stop tokens identified in
https://huggingface.co/vvsotnikov/stablelm-tuned-alpha-3b-16bit

as a result, I'm looking at the above model with deepspeed and hard coded the class into text-generation_webui's load_preset_values

The code worked with cformers, and it was fast, but it was generating a lot of run-on sentences.

winglian

Apr 24, 2023

will this work with https://github.com/ggerganov/ggml/tree/master/examples/stablelm ? Seems like this model has the incorrect n_ctx and n_embd sizes?

main: seed = 1682298523
stablelm_model_load: loading model from '../../models/cakewalk__ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin' - please wait ...
stablelm_model_load: n_vocab = 50432
stablelm_model_load: n_ctx = 6144
stablelm_model_load: n_embd = 48
stablelm_model_load: n_head = 16
stablelm_model_load: n_layer = 32
stablelm_model_load: n_rot = 1
stablelm_model_load: ftype = 2
stablelm_model_load: ggml ctx size = 75.89 MB
stablelm_model_load: memory_size = 36.00 MB, n_mem = 196608
stablelm_model_load: tensor 'gpt_neox.embed_in.weight' has wrong size in model file
main: failed to load model from '../../models/cakewalk__ggml-q4_0-stablelm-tuned-alpha-7b/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin'
stablelm_model_load: %

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment