Could you please finetune this on the base model, instead of instruct?
#1, opened by Downtown-Case
Or perhaps use EVA as a base?
https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2/discussions
I ask because Qwen 32B Base is way less slopped than the instruct model, and far better past 32K context.
Sure, I'm going to try some more things on Nemo 12B first, then I'll take what works when I come back to Qwen and train on top of EVA.
Thanks for the suggestion!