Workaround for the corrupted JSONs.
I was running on WSL to try to avoid the encoding issues mentioned by the author, but couldn't get the three config JSONs to XOR correctly. All the other files converted fine and had the correct hashes, but tokenizer_config.json, config.json, and generation_config.json all came out as corrupted gibberish. After a fair bit of trial and error, I took a shot in the dark and just used the original un-XORed copies of those files, then loaded everything into oobabooga. After manually specifying the model type as llama, it loaded up and began producing output. I'm not sure what changes were originally intended for those files, but hopefully this helps someone else get the model running.
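For anyone wondering why a delta that works on the author's machine produces gibberish elsewhere: the release applies a byte-wise XOR of a delta file against the base file. A minimal sketch of that round-trip is below (the function and file names are mine for illustration, not from the actual release script). If the base bytes differ at all, e.g. because a JSON contains a machine-specific path, the XOR output is garbage.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings element-wise."""
    return bytes(x ^ y for x, y in zip(a, b))

def apply_xor(base_path: str, delta_path: str, out_path: str) -> None:
    """Recover the target file by XORing the delta against the base.

    Hypothetical helper: the real release script differs, but the
    core operation is the same byte-wise XOR shown here.
    """
    with open(base_path, "rb") as f:
        base = f.read()
    with open(delta_path, "rb") as f:
        delta = f.read()
    with open(out_path, "wb") as f:
        f.write(xor_bytes(base, delta))
```

Because XOR is its own inverse, `xor_bytes(xor_bytes(target, base), base)` gives back `target` only when `base` is byte-identical to the one used to create the delta.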
I tried downloading on Ubuntu, macOS, and Windows after installing Git LFS, and the files come out corrupt every time.
I don't think it's an OS issue. Please fix this.
Indeed - my bad, everyone. A couple of the JSONs contained local paths from my machine, so they become corrupted when XORed anywhere else - a stupid oversight on my part. I'll try to redo them over the weekend.
As a temporary workaround, you can just copy-paste the JSONs from the base LLaMA HF conversion and the model will work correctly. If you're using Kobold and want proper EOS behavior, edit config.json to add "badwordsids": [[0]] inside the root object.
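If you'd rather script that edit than do it by hand, a small sketch like this works (the path and function name are illustrative; the only substantive change is adding the "badwordsids" key to the root object):

```python
import json

def add_badwordsids(config_path: str) -> None:
    """Insert the Kobold EOS workaround into an existing config.json.

    Hypothetical helper: it just reads the JSON, adds the key
    suggested above, and writes the file back.
    """
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Ban token id 0 so Kobold stops generating at EOS.
    config["badwordsids"] = [[0]]
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
```

All other keys in the file are left untouched, so this is safe to run on the copied base-LLaMA config.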
If I copy-paste the JSONs from the base LLaMA HF conversion, would that impact quantizing in any way? In other words, do I need to requantize when you release the fix?
Thanks!
Got the same issue on macOS.
Can you just release the JSONs as-is? They shouldn't be copyrighted, I assume.
And thank you for your time!
Just pushed a commit that keeps the JSON files raw. They were originally XORed because that's how the OASST script does it, but I doubt Meta will give us any legal trouble over the JSONs.
I'll close this discussion for now, but please don't hesitate to open a new one if something's still broken.