Does this actually work?

#1
by barryd - opened

So Nemotron was supposed to give "more helpful" responses compared to Llama 3.1, and Llama 3.3 is supposed to be smarter than Llama 3.1.

I tried a 2-bit quantization of Nemotron (3.1) and was quite impressed.

Having tried it, does your Nemotron 3.3 actually seem to be smarter than Nemotron 3.1 and more helpful than Llama 3.3?
Do you think NVIDIA will release their own "Nemotron 3.3", applying their original technique to Llama 3.3?

Have you tried using the non-Instruct version of Llama 3.3 as one of the inputs instead of the Instruct version? From what I can tell, you're effectively applying the "Instruct" vector twice: once as part of the Nemotron input, and once as part of the Llama 3.3 input, if that makes sense.
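To make that concrete, here's a toy sketch of the concern, treating each model as a single weight vector. All the vectors and names below are made up for illustration, not real weights:

```python
import torch

# Toy stand-ins for the merge inputs; each "model" is a single tensor.
base31 = torch.zeros(4)                      # Llama 3.1 base
instruct = torch.tensor([1., 0., 0., 0.])    # the "Instruct" tuning direction
nemo_delta = torch.tensor([0., 1., 0., 0.])  # Nemotron's alignment changes
upgrade = torch.tensor([0., 0., 1., 0.])     # base 3.1 -> base 3.3 changes

nemotron = base31 + instruct + nemo_delta    # Nemotron sits on 3.1 Instruct
llama33_instruct = base31 + upgrade + instruct

# Task arithmetic relative to the 3.1 *base*: both inputs carry the
# instruct direction, so it ends up with weight 2 in the result.
merged = base31 + (nemotron - base31) + (llama33_instruct - base31)
print(merged)      # tensor([2., 1., 1., 0.])

# Subtracting 3.1 *Instruct* from the Llama 3.3 side cancels the duplicate.
merged_alt = nemotron + (llama33_instruct - (base31 + instruct))
print(merged_alt)  # tensor([1., 1., 1., 0.])
```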

Oh, I wasn't expecting anyone to pay attention to this. I've never done this before; I was just experimenting based on guidance from The Drummer's Discord. I wasn't able to quant it into a GGUF because it said I was missing a file that I could not find, and Hugging Face kept erroring out, so I gave up.

The Drummer's new versions of Nautilus on the BeaverAI page are made with Llama 3.3. I'd suggest trying those instead.

Oh, you didn't manage to quant it?
Then are you aware of these quants?
https://huggingface.co/mradermacher/Nemotron3.3-GGUF
https://huggingface.co/mradermacher/Nemotron3.3-i1-GGUF
That's your merge. mradermacher quantized it.

Oh, no, I wasn't aware. I didn't think to ask anyone to quant it. But now that it is, I will try it 🙂

Though for the record, people were arguing about whether this merge method would work. I was just copying someone's instructions. It was supposed to "subtract" Llama 3.1 and add 3.3. Per the model card, it appears I actually merged 3.1, 3.3, and Nemotron.
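For anyone landing here later: the operation those instructions described is what mergekit calls task arithmetic. Here is a minimal sketch of what "subtract 3.1, add 3.3" means per tensor, assuming plain in-memory state dicts (real 70B checkpoints would need sharded loading and more care):

```python
import torch

def subtract_and_add(nemotron, llama31, llama33):
    """merged = nemotron - llama31 + llama33, applied tensor by tensor."""
    return {
        name: nemotron[name] - llama31[name] + llama33[name]
        for name in nemotron
    }

# Tiny stand-in "state dicts", just to show the shape of the operation.
sd = lambda v: {"layer.weight": torch.tensor([float(v)])}
print(subtract_and_add(sd(3), sd(1), sd(2)))  # {'layer.weight': tensor([4.])}
```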
