llama-compatibility

#11
by ehartford - opened

Hello Yi team:

Thank you for the excellent model.

It has come to our attention that Yi uses exactly the Llama architecture, except that two tensors have been renamed (input_layernorm and post_attention_layernorm).

https://github.com/turboderp/exllamav2/commit/6d24e1ad40d89f64b1bd3ae36e639c74c9f730b2

Because there is a lot of investment and tooling around the llama architecture, there is value in using the same names for the tensors.

The open source community will surely republish Yi with the tensors renamed in order to have a version that conforms to llama architecture.

We hope you will consider adopting this change in your official model now, before it has gained significant adoption, so that it can ultimately enjoy the adoption it deserves.

That sounds rude.

Weird. It's not.

They’ll probably keep turning a blind eye. You gotta get used to such disappointment.

If they utilize the exact Meta LLaMA structure, its codebase, and all related resources, adherence to the licensing agreements set forth by LLaMA is also required. Requesting the official release of Yi models in LLaMA format is problematic as it undermines the enforceability of Yi's licensing terms.

When I say it is rude, I am referring to the lack of respect for the intent of Yi's License. Many actions can be taken by the open-source community, but not by commercial entities.

I didn't say any of that.
I said change the name of the tensors to match.

https://www.diffchecker.com/bJTqkvmQ/

Maybe this helps for your discussion ;-)

So... is this even a llama compatible model?

It seems to work with a simple rename in exllama: https://github.com/turboderp/exllamav2/commit/6d24e1ad40d89f64b1bd3ae36e639c74c9f730b2

But it may be working incorrectly for all I know.

EDIT: Actually, isn't that diff file just a bunch of refactoring? What's the actual difference from llama's architecture?

Purely copying and renaming some of the tensors... what a shame...

There's nothing wrong with llama architecture

The training is everything

Yes please. This will make existing training tools work better. You get more tunes and wider adoption. It's only 2 names.

@brucethemoose There is a little refactoring in the commit, but there's also a change to rename the keys for the RMSNorm module: "input_layernorm" becomes "ln1" and "post_attention_layernorm" becomes "ln2" if the model's config file identifies the model as "YiForCausalLM". Aside from that there are no changes to the architecture as far as I can tell.
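
For anyone converting a checkpoint by hand, here is a minimal sketch of that rename, assuming a single consolidated pytorch_model.bin shard (sharded or safetensors checkpoints would need the same mapping applied per file; the file names here are illustrative):

```python
import torch

# Map Yi's RMSNorm key names to the Llama convention described above.
YI_TO_LLAMA = {
    "ln1": "input_layernorm",
    "ln2": "post_attention_layernorm",
}

def rename_key(key: str) -> str:
    # Keys look like "model.layers.0.ln1.weight"; swap only the matching segment.
    return ".".join(YI_TO_LLAMA.get(part, part) for part in key.split("."))

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
state_dict = {rename_key(k): v for k, v in state_dict.items()}
torch.save(state_dict, "pytorch_model.llama-names.bin")
```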

The tokenizer model is new, but that is also the case for e.g. OpenLlama. Here the vocabulary is also twice as large, but in any case it's a SentencePiece model and it loads without issue.
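
A quick sketch of checking that with the sentencepiece package (the vocabulary sizes below are from the respective model configs):

```python
from sentencepiece import SentencePieceProcessor

# Yi's tokenizer.model, a standard SentencePiece model.
sp = SentencePieceProcessor(model_file="tokenizer.model")
print(sp.vocab_size())            # 64000 for Yi, vs. 32000 for Llama 2
print(sp.encode("Hello, world"))  # token ids, as with any SentencePiece model
```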

Whether there's actually more going on is hard to say. The architecture is laid out in modeling_yi.py and it looks like Llama aside from the changes above, but I may still be overlooking something.

richardllin changed discussion status to closed
richardllin changed discussion status to open

Hello @ehartford

Thanks a lot for pointing this out in the discussion. Really appreciate your sharp eye and patience while waiting for our response.

You're right about the tensor names. We're going to rename them from Yi to Llama, just like you suggested. It’s important to us that we’re accurate and transparent about this stuff.

You mentioned in your post above, “The open source community will surely republish Yi with the tensors renamed in order to have a version that conforms to llama architecture.” So, we'd like to know: would you like to submit a pull request with these changes? Alternatively, if you'd prefer us to handle the update, we can do that and release a new version in the same repo – this might be quicker.

This naming issue was an oversight on our part. During extensive training experiments, we made several renamings in the code to meet experimental requirements. But, we kinda dropped the ball and didn’t switch them back before pushing out the release. Our bad on that, and we're sorry for the confusion.

We're on it to tighten up our process, so this kind of slip-up doesn’t happen again. Your feedback’s been a huge help. We'll also be reviewing all our code again to make sure everything else is in order. Any extra eyes from you and the community would be greatly appreciated.

Once again, thank you for your input, and we look forward to your continued support and suggestions.

Sincerely,
Richard Lin
Open Source Director, Yi Team

Awesome thank you for the response.
@chargoddard would you like to?

We're on it to tighten up our process, so this kind of slip-up doesn’t happen again.

Hey Richard, don't beat yourself up about it -- it's a minor thing and easily fixed, and your response here is first rate! Thanks so much for engaging with the open source community. :D

I am concerned about the mixed use of licenses (for the code as well as the model) caused by adopting exactly the same format as LLaMA and then using code from Meta for model inference, and about whether your binding terms remain valid.

If I understand correctly, you may need to release a license update before you can officially use Meta's code directly for model inference, so many details would need to be reconsidered.
This seems to mean you would need to cede more rights to the community.

The llama license applies to the trained llama weights. Not to the architecture which is openly published.

Think of it like an API design such as OpenAI's API - many applications expose an API compatible with OpenAI's API. That doesn't make them subject to OpenAI ToS or license.

Same with the topology of a neural network: it's not copyrightable, and @ylecun would never do that anyway.

The llama license applies to the trained llama weights. Not to the architecture which is openly published.

No, if you read the license, its code is also part of the LLaMA Materials. Once Yi modifies its architecture to match LLaMA's exactly, that inevitably means using LLaMA's code for model inference. This appears to be entirely different from the situation you have subjectively speculated about.

And for Yi:

If my understanding of the LLaMA license is correct, and Yi officially uses LLaMA's model structure, meaning the same inference code (originating from LLaMA), then the LLaMA license contains this clause: "You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof)." If Yi's model is not considered a derivative work of LLaMA, then this clearly constitutes a license conflict, as the inference code and the model definition are part of the "Llama Materials."

This means that if you release Yi in the exact format and inference method as LLaMA:

  1. Your model would be considered a derivative work of LLaMA, necessitating a revision of your terms regarding commercial usage to include the stipulation about applying to Meta for permission if the monthly active users exceed 700 million.
  2. As a derivative work of LLaMA, any derivative works or modifications made by third parties, according to LLaMA's license, would mean the ownership rights lie with the modifying party. Your clause stating "The ownership of the Yi Series Models and their related intellectual property rights is solely held by the Licensor" would no longer be valid, as the intellectual property rights of the model would be co-owned by Meta, Yi, and any third-party modifiers.

In simple terms, after resolving the license conflict, the community could gain a model with dual commercial usage restrictions, but with intellectual property owned by the modifier of the model. This would be beneficial for the community.

Please reconsider this aspect. Of course, personally, I would appreciate and support your decision to transfer more intellectual property rights to the community.

I would like to reiterate that how the community chooses to proceed, in the absence of intellectual property and commercial constraints, is at the discretion of community enthusiasts. However, if Yi were to proceed in this manner, based on the current licensing terms, it would be in conflict with the LLaMA license - unless Yi explicitly states that Yi's model is a derivative work of the LLaMA model. This would imply a transfer of certain rights.

I believe that any educated individual, upon reading both the LLaMA and Yi licenses, would not find it difficult to reach this conclusion.

Who said anything about code?

You are hallucinating

Who said anything about code?

You are hallucinating

The file modeling_llama.py from transformers, which defines the structure of LLaMA models, is also part of the LLaMA model, just like the current modeling_yi.py of Yi.
This specific file, which you use for inferencing Llama2 and all its derivative models, is part of the LLaMA Materials according to the license of LLaMA.

If you meant that you use no code from Meta to run LLaMAs, and that the structure of the models is not part of the "Llama Materials" defined in the LLaMA2 license, then good for you.

I understand that many people in the community do not care about the terms of licenses, but you cannot put the Yi team, who has given us such a great model, in a position of injustice and potential controversy.
You think you are speaking for the community, but in my view, this is just a disregard for the rules of the game. The reason why open-source software can remain prosperous today is largely due to the respect for open-source licenses.

I speak only for myself.

That a model can be interacted with using code that has the word "llama" in it, does not make it subject to llama's license.

That a model can be interacted with using code that has the word "llama" in it, does not make it subject to llama's license.

Your statement was selfish and arrogant.

If you have reading problems, I will try to read it to you word for word here.

"Llama 2" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-libraries/llama-downloads/. "Llama Materials" means, collectively, Meta's proprietary Llama 2 and Documentation (and any portion thereof) made available under this Agreement.

This is why llama's license is applicable to all models inferenced as a LLaMA model.

  1. License Rights and Redistribution. v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).

This is why any model that is inferenced as a LLaMA (to be specific, using from transformers import LlamaForCausalLM) should be its derivative work, or it is in violation of its license.

Based on the terms of the license, I agree with @JosephusCheung's view that the model's source code is also covered by the Llama2 license and must comply with the same license terms. Of course, I also believe that changing one or two variable names or altering the code's simple structure is not sufficient for Yi to avoid conflicts with the license. This is precisely why Yi's team has faced widespread criticism in the Chinese media this week. Many believe their attempt to circumvent the Llama2 license restrictions in this way is unethical.
I think @ehartford's reminder was technically well-intentioned, but it has also sparked the current storm of public opinion.
The Yi model is an outstanding open-source model, containing a large amount of original training techniques and datasets, which are the intellectual property of the Yi company. However, resolving the license conflict is indeed a challenge.
I hope that the development of the open-source community and the demands of commercial companies can find a win-win combination in the future.

🤗 Hey all 🤗

🚨 the transformers code IS NOT subject to the Llama licence 🚨

We made sure of this when porting the model (see this PR). Thus changing the weights to the Hugging Face format (which involves renaming 2 layers of the checkpoint) is not going to make it subject to the licence.

This means that if you release Yi in the exact format and inference method as LLaMA:

The transformers code is not the exact format and inference code of LLaMA, so this does not apply.
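
Once a checkpoint uses the Llama tensor names, it loads with the stock Llama classes. A minimal sketch, assuming the renamed checkpoint is what's published under the repo id below:

```python
from transformers import AutoTokenizer, LlamaForCausalLM

repo = "01-ai/Yi-34B"  # assumes this repo carries Llama-style tensor names
tokenizer = AutoTokenizer.from_pretrained(repo)
model = LlamaForCausalLM.from_pretrained(repo)

inputs = tokenizer("The capital of France is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```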

I appreciate your concern for the community’s benefit and suggest open communication between relevant parties to address any potential conflicts or ambiguities in licensing. Clear and transparent discussions can pave the way for mutually agreeable solutions.

Let me know if you need any more assistance in navigating this situation.

This is probably a silly question but can the license change when someone quantizes a model? That technically changes some things about the model right? So what would happen to like a yi 34b quantized model? Will it still be the same license?

This is why any model that is inferenced as a LLaMA (to be specific, using from transformers import LlamaForCausalLM) should be its derivative work, or it is in violation of its license.

You are entitled to your incorrect opinion.

I can't wait for the future in which our world will be run by engineers and not by politicians and lawyers.
It will be utopia.

The LLaMA 2 license applies to the model's weights rather than its architecture, as far as I understand. Yi has commendably made its model weights, worth millions, available to the public. While there could be improvements in transparency, this gesture is nonetheless quite generous.

I can't wait for the future in which our world will be run by engineers and not by politicians and lawyers.
It will be utopia.

Um, then the engineers will be politicians.

Um, then the engineers will be politicians.

many already are...

Um, then the engineers will be politicians.

many already are...

You missed the point. What I'm saying is that politician is a role. Whoever plays the role is a politician. It doesn't matter who they are, whether we're in your future utopia or not, etc. If there are people who professionally engage in political activity, they are politicians. You said, "I can't wait for the future in which our world will be run by engineers and not by politicians and lawyers."
If an engineer takes on the role of "running the world", they are acting as politicians. That is, they are practicing the "art or science of government" (a politician being "a person who is professionally involved in politics").

It is politic + ics. If the engineers are the ones practicing the "art and science of governance" or whatever, they are politicians. The point of my saying that was to imply that it is the role itself that makes a politician, not any particular sort of person. That if engineers ran the world, they would become like politicians because THAT IS POLITICS. It is human nature, not engineer or politician nature. That it doesn't matter who you choose, if they are the ones who are charged with the affairs of running the state (polis), they are the politicians.

Got a state? Yup.
Got citizens who run the thing? (as opposed to a king, etc.) Yup.
You got yourself some politicians!

https://www.etymonline.com/word/politics#etymonline_v_17576
politics (n.)
1520s, "science and art of government," from politic (n.) "the political state of a country or government (early 15c.), from Old French politique and Medieval Latin politica; see politic (adj.). The plural form probably was modeled on Aristotle's ta politika "affairs of state" (plural), the name of his book on governing and governments, which was in English mid-15c. as (The Book of) Polettiques or Polytykys. Also see -ics.

politic (adj.)
early 15c., politike, "pertaining to public affairs, concerning the governance of a country or people," from Old French politique "political" (14c.) and directly from Latin politicus "of citizens or the state, civil, civic," from Greek politikos "of citizens, pertaining to the state and its administration; pertaining to public life," from polites "citizen," from polis "city" (see polis).

-ics
in the names of sciences or disciplines (acoustics, aerobics, economics, etc.), a 16c. revival of the classical custom of using the neuter plural of adjectives with Greek -ikos "pertaining to" (see -ic) to mean "matters relevant to" and also as the titles of treatises about them.

ehartford changed discussion status to closed
