Does this actually work?

#1
by barryd - opened

So Nemotron was supposed to give "more helpful" responses compared to Llama 3.1, and Llama 3.3 is supposed to be smarter than Llama 3.1.

I tried a 2-bit quantization of Nemotron (3.1) and was quite impressed.

Having tried it, does your Nemotron 3.3 actually seem to be smarter than Nemotron 3.1 and more helpful than Llama 3.3?
Do you think NVIDIA will release their own "Nemotron 3.3", applying their original technique to Llama 3.3?

Have you tried using the non-Instruct version of Llama 3.3 as one of the inputs instead of the Instruct version? From what I can tell, you're effectively applying the "Instruct" vector twice: once as part of the Nemotron input, and once as part of the Llama 3.3 input, if that makes sense.
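To make that concrete, here's a toy sketch of the concern, treating each model as a single weight vector. All the vectors and names below are made up for illustration, not real weights:

```python
import torch

# Toy stand-ins for the merge inputs; each "model" is a single tensor.
base31 = torch.zeros(4)                      # Llama 3.1 base
instruct = torch.tensor([1., 0., 0., 0.])    # the "Instruct" tuning direction
nemo_delta = torch.tensor([0., 1., 0., 0.])  # Nemotron's alignment changes
upgrade = torch.tensor([0., 0., 1., 0.])     # base 3.1 -> base 3.3 changes

nemotron = base31 + instruct + nemo_delta    # Nemotron sits on 3.1 Instruct
llama33_instruct = base31 + upgrade + instruct

# Task arithmetic relative to the 3.1 *base*: both inputs carry the
# instruct direction, so it ends up with weight 2 in the result.
merged = base31 + (nemotron - base31) + (llama33_instruct - base31)
print(merged)      # tensor([2., 1., 1., 0.])

# Subtracting 3.1 *Instruct* from the Llama 3.3 side cancels the duplicate.
merged_alt = nemotron + (llama33_instruct - (base31 + instruct))
print(merged_alt)  # tensor([1., 1., 1., 0.])
```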

Oh, I wasn't expecting anyone to pay attention to this. I've never done this before; I was just experimenting based on guidance from The Drummer's Discord. I wasn't able to quant it into a GGUF because it said I was missing a file that I could not find, and Hugging Face kept erroring out, so I gave up.

The Drummer's new versions of Nautilus on the BeaverAI page are made with Llama 3.3. I'd suggest trying those instead.

Oh, you didn't manage to quant it?
Then are you aware of these quants?
https://huggingface.co/mradermacher/Nemotron3.3-GGUF
https://huggingface.co/mradermacher/Nemotron3.3-i1-GGUF
That's your merge. mradermacher quantized it.

Oh, no, I wasn't aware. I didn't think to ask anyone to quant it. But now that it is, I will try it 🙂

Though for the record, people were arguing about whether this merge method would work. I was just copying someone's instructions. It was supposed to "subtract" Llama 3.1 and add 3.3. Per the model card, it appears I actually merged 3.1, 3.3, and Nemotron.
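For anyone landing here later: the operation those instructions described is what mergekit calls task arithmetic. Here is a minimal sketch of what "subtract 3.1, add 3.3" means per tensor, assuming plain in-memory state dicts (real 70B checkpoints would need sharded loading and more care):

```python
import torch

def subtract_and_add(nemotron, llama31, llama33):
    """merged = nemotron - llama31 + llama33, applied tensor by tensor."""
    return {
        name: nemotron[name] - llama31[name] + llama33[name]
        for name in nemotron
    }

# Tiny stand-in "state dicts", just to show the shape of the operation.
sd = lambda v: {"layer.weight": torch.tensor([float(v)])}
print(subtract_and_add(sd(3), sd(1), sd(2)))  # {'layer.weight': tensor([4.])}
```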
