Orca may violate OpenAI's Terms of Use

#9
by icoxfog417 - opened

According to the Orca papers (Orca, Orca 2), Orca is trained on responses from ChatGPT 3.5 and GPT-4. However, OpenAI's Terms of Use explicitly prohibit users from doing the following: "Use Output to develop models that compete with OpenAI." I am curious how the Microsoft Research team interpreted OpenAI's policy and obtained approval to release competing models. I understand that these models are for research purposes only, but I think the authors should mention this concern for anyone applying the method in actual use.

icoxfog417 changed discussion title from Conflict to OpenAI Terms of use to Orca may violate OpenAI's Terms of Use

A 7B model can't compete with OpenAI.

My understanding is that OpenAI forbids using GPT-4/ChatGPT responses to train an LLM. Model size does not matter.

It is not competing, since they are partners.

Microsoft and OpenAI have established a partnership, but it relates to infrastructure and does not include model licensing terms.

https://openai.com/blog/openai-and-microsoft-extend-partnership

In addition, most companies and individuals that use this open-source model have no partnership with OpenAI. This means that using the model risks violating OpenAI's terms of service, yet this is not mentioned anywhere.

"Use Output to develop models that compete with OpenAI." is highly generic wording that can be interpreted several ways. I used GPT-4 to help me develop a Python script that merges two LLMs through spherical linear interpolation while preserving the majority of their features. Was that a violation, because I used the code to combine several LLMs into ones potentially more formidable than the originals? Was Stanford's Alpaca dataset, which was created with ChatGPT to assemble a fine-tuning dataset for instruction-tuning Llama, a violation? Who knows? And everyone does it. So much so that there's a guide on the goofiest site to see this kind of guide, wikiHow. And if that isn't wild enough, x.ai's Grok clearly used OpenAI output to generate synthetic data, since its error message when prompted to produce malware code is the default "as an llm by OpenAI, I can't do that" dialogue. Which is darkly hilarious, because Elon himself hates OpenAI. OpenAI even noticed this trend with Grok and thought it was funny. So the answer is: who knows.

https://arstechnica.com/information-technology/2023/12/elon-musks-ai-bot-grok-speaks-as-if-made-by-openai-in-some-tests-causing-a-stir/

In this thread we are only concerned with microsoft/Orca-2-7b. The main question is: "Is it allowed to publish an open-source model that was explicitly trained on OpenAI's answers?"
Open-source models are free, whereas the OpenAI API is paid, so I think Orca, which has almost the same accuracy as ChatGPT 3.5, explicitly "competes" with OpenAI.
