DeepSeek Coder is not based on Llama 2

#2
by Chester111 - opened

Hi, this is a developer of DeepSeek Coder.

Thanks for your great work on quantizing our model and making it more popular!

However, we noticed that the model cards state: "As this model is based on Llama 2, it is also subject to the Meta Llama 2 license terms, and the license files for that are additionally included. It should therefore be considered as being claimed to be licensed under both licenses." This is not true.

Our model is not based on Llama 2. It only shares the model architecture with Llama 2 (with slightly different hyper-parameters) so that it is compatible with toolchains in the Llama ecosystem, but the training data and model parameters are in no way related to Llama 2. We collected the training data ourselves and trained the model from scratch. Thus, the released model is subject to our own license only, not to both our license and the Llama 2 license.

@Chester111 Thanks for clarifying it!

@TheBloke Would you please edit this correction? Thanks!

Hi @Chester111 - sorry about that!

I have added deepseek as a new model type in my code and re-run all the repos to remove the Llama 2 license files and all mentions of Llama 2. All repos should now be fixed.

I was doing these uploads at 3am this morning, having missed the release on the day and being inundated with requests to do this model, so I didn't look at the models closely enough.

Thanks very much for the amazing new models!


This may not be the place and should go in the source project instead, and if so I apologize, but I was going to try to convert those models to GGUF to use on a smaller non-GPU device, to compare Python output with Wizard. However, the tokenizer.model file is missing from the source repositories. Is this a side effect of the architectural difference? Is there a way to generate it myself, or am I out of luck? (Still new at this game, sorry.)

@Nurb432 I'm uploading GGUFs now, check back in 10 - 15 minutes


@TheBloke great. thank you.

Still curious what was needed to get past the error I got, or whether I was doing it wrong (using the normal convert.py from llama.cpp).

You're not doing anything wrong. The normal convert.py can't handle these models, as DeepSeek did not provide a tokenizer.model file, only a Hugging Face tokenizer.json.

In order to convert these models I needed to use a new PR for convert.py which can make GGUFs using the Hugging Face vocab configuration from a tokenizer.json file. Even that didn't work at first, due to bugs in the llama.cpp PR. Those bugs were fixed this morning, enabling me to make them.

The PR is here: https://github.com/ggerganov/llama.cpp/pull/3633


Cool. I figured I was missing a step on how to create that file. I was using yesterday's llama.cpp and didn't even notice there was a newer one (it was in my mail; I do watch the repo but missed it). I'll bump it up when I get home later.

And again, thanks for all you do for us.

The PR I mentioned hasn't been merged yet, so you won't find the changes there. It's still in development. If you want to try making one yourself, you'll need to download the code from the PR I listed, not the main branch.

Hopefully it'll be merged sometime in the coming week, and then it'll be available to everyone as convert.py.
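For anyone wanting to try the unmerged PR themselves, here's a rough sketch of the general pattern for checking out a GitHub PR locally (the local branch name `pr-3633` is just a label I'm choosing, and the exact convert.py flags may change before the PR lands, so check `--help` on the PR's version):

```shell
# Clone llama.cpp and fetch the unmerged PR's commits.
# GitHub exposes every PR's head commit under refs/pull/<number>/head.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git fetch origin pull/3633/head:pr-3633   # local branch name is arbitrary
git checkout pr-3633

# Then run the PR's convert.py against the downloaded model directory.
# (Flags are an assumption here; consult `python convert.py --help`
# on the PR branch for the real options.)
python convert.py /path/to/deepseek-coder-model
```

Once the PR is merged into the main branch, the `fetch`/`checkout` steps become unnecessary and a plain `git pull` will bring in the updated convert.py.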
