If you haven't tried it, do. This model is great

by nacs - opened Jun 4, 2023


nacs

Jun 4, 2023 • edited Jun 4, 2023

I'm really impressed by the quality of the answers and how good it is at instruction following.

I'm finding it to be as good as or better than the Vicuna/Wizard Vicuna/Wizard-uncensored models in almost every case.

For those of you who haven't tried it, do -- it's worth it.

Thanks for training/sharing this @NousResearch

[ Also: any plans for 30B? ]

se4sons

Jun 5, 2023

would be great to see if 30B improves the results

walking-octupus

Jun 5, 2023 • edited Jun 5, 2023

I'm more interested in seeing someone train a 7B model on this dataset, since that size is the most accessible on consumer hardware and can be compared to the original WizardLM.
Then, given we're approaching OpenAI's quality, I think running OpenAI's Evals on these models may be useful too, since I think it evaluates the reasoning skills of these models a little better than other metrics.
I'd be curious to test the model myself on factual QA, document analysis/classification, and reasoning.

karan4d

NousResearch org Jun 5, 2023

7B and 40B are planned. Might do 7B on MPT or Falcon or something, not too sure yet.

walking-octupus

Jun 5, 2023

I wonder how good MPT can get, since their default chat fine-tune was horribly incoherent on factual open QA (e.g. "Did the French Revolution grant women suffrage?"), just barely associating terms with the question and generating some technobabble.

I haven't tried Falcon yet, still waiting for it to be implemented in llama.cpp. I hope it uses similar techniques to LLaMA, since even GPT-2 Large doesn't play well with my laptop, whereas 7B LLaMA GGML Q4_1 works wonderfully.

Also, this doesn't really relate to this model in particular, but can one fine-tune a fine-tune of LLaMA? I'm currently building a closed QA notes app with LLaMA, inspired by Google's Tailwind, so I wonder if it's possible to make WizardLM keep its overall knowledge while always answering in concise bullet points. Going over each instruction in a dataset and asking GPT-3.5 to reformat it might get expensive quickly...

nacs

Jun 5, 2023 • edited Jun 5, 2023

The problem with 40B models is that they won't fit on a GPU with 24GB of VRAM, which excludes all consumer cards and the people using them.

30B/33B models fit in 24GB with enough context to be useful.

digitous

Jun 5, 2023

I agree with the above; 30B models can fit on more affordable consumer hardware (a single RTX 3090, etc.) after quantizing to 4-bit. 40B would also be sweet, at least once 3-bit quantization becomes acceptable to use.

But either way, thank you for your hard work! I look forward to trying this out when I get the chance. Only heard good things.
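
For anyone who wants to check the 24GB argument above, here is a rough back-of-the-envelope sketch. It only counts the weights themselves (parameters times bits per weight); the function name `weight_gib` and the 24 GiB budget are my own illustrative choices, and real quantized formats store extra per-block metadata, so the effective bits per weight end up somewhat above the nominal value. The KV cache and activations for the context also need room on top of this.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Assumption: ignores quantization metadata (scales etc.), the KV cache,
    and activations, so the real footprint is larger than this number.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


VRAM_GIB = 24  # assumed budget: one 24GB consumer card

for params in (33, 40):
    for bits in (16, 4, 3):
        size = weight_gib(params, bits)
        headroom = VRAM_GIB - size  # what is left for context and overhead
        print(f"{params}B @ {bits}-bit: {size:5.1f} GiB weights, "
              f"{headroom:6.1f} GiB headroom")
```

The headroom column is the interesting part: a 40B model at a nominal 4 bits leaves much less room for context than a 33B one, and once format overhead is added the margin gets thinner still, which is why 3-bit quantization comes up for 40B.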

karan4d

NousResearch org Jun 5, 2023

Good to know -- will reconsider and happily take base model suggestions.
