Other language ability
Apart from whatever the base model saw during training, was this trained with examples in languages other than English? To me it seems there might be an improvement in Japanese, but I'm not sure.
Actually, after more testing it seems worse in some cases? I asked it to translate a song, and it just output this when I corrected it and told it not to use romaji:
[1]
[2]
[3]
And as a sanity check I tested it on HuggingChat:
Translated Japanese Lyrics:
[Verse 1]
[Verse 2]
[Chorus]
[Bridge]
[Chorus]
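For anyone who wants to check this kind of output without eyeballing it, here is a rough sketch of a script test: it counts how many non-whitespace characters are kana or kanji. The function names and the 0.5 threshold are my own assumptions, not anything from the model or HuggingChat, and the kanji range only covers the common CJK block, so treat it as a heuristic.

```python
import re

# Hiragana, katakana, and the common CJK ideograph block (kanji).
# Assumption: rarer kanji in extension blocks are ignored.
JAPANESE_CHAR = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def japanese_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are kana or kanji."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if JAPANESE_CHAR.match(c))
    return hits / len(chars)


def lacks_japanese_script(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: True when the text is mostly romaji, headers, or empty."""
    return japanese_ratio(text) < threshold
```

An output that is only section headers like the one above would be flagged, since it contains no kana or kanji at all.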
This reminds me of something Llama 3 (not to be confused with Llama 3.1) would annoyingly do, but which 3.1 fixed. I suspected that the Japanese data had literally been find-and-replaced across the whole training dataset for some reason. But when it doesn't do that, it's quite good, maybe? I'm not sure why the heck Meta thought that was a good idea, but that's beside the point: I thought 3.1 didn't do this anymore.
Well, if anyone is okay with a wildly inefficient fix(?), I combined two models in a way that likely just made it dumber... Is it good? I'm not sure; these kinds of merges are really hit and miss. I'll try other methods too, I just think doing it this way is funny and cursed. I have to quantize it to 2 bits, brb, which will likely damage the model even more, because science and also because I am GPU poor. Not sure if NVIDIA is okay with me kinda-ish promoting my own merges; hopefully it's fine. I don't really recommend using it.
https://huggingface.co/nonetrix/llama-3.1-70B-nemotron-agent-ja-120B