How can the model be commercial if it is using OpenAI CLIP?

#1 opened by dchichkov

I've looked at the repository at https://huggingface.co/fireworks-ai/FireLLaVA-13b and it's using LlavaForConditionalGeneration. I understand that the CLIP encoder you've used is "clip_vision_model" as per your config, which translates to "openai/clip-vit-base-patch32". And as per the model card at https://huggingface.co/openai/clip-vit-base-patch32, this is a research/non-commercial model.
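(For reference, one way to check which CLIP variant a LLaVA-style checkpoint's config points to; a minimal sketch, assuming the standard transformers AutoConfig / LlavaConfig API:)

```python
# Minimal sketch: read the vision-tower settings recorded in the checkpoint's config.
# Assumes the standard transformers AutoConfig / LlavaConfig API.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("fireworks-ai/FireLLaVA-13b")
vision = config.vision_config  # model_type should be "clip_vision_model"

# Rough fingerprints of the two OpenAI checkpoints being discussed:
#   openai/clip-vit-base-patch32      -> patch_size=32, image_size=224, hidden_size=768
#   openai/clip-vit-large-patch14-336 -> patch_size=14, image_size=336, hidden_size=1024
print(vision.model_type, vision.patch_size, vision.image_size, vision.hidden_size)
```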

I understand that, as per convention, any checkpoint that used research/non-commercial models in its pipeline is also considered non-commercial, and that a combination of models that includes non-commercial parts is also non-commercial.

Please, can you explain how your model checkpoint can be commercial, or deployed for commercial/API use, while complying with the CLIP license? Or correct the claim that the model is commercial?

Also, in general, it is good practice to include the rough composition of the datasets that went into a model advertised for commercial use. Otherwise it is not clear whether that data is "clean enough" for the model to be considered commercial. A quick query to your model reveals Vicuna data, for example.
[Attached screenshot: Screenshot from 2024-01-19 09-50-32.png]

websterbei (Fireworks AI org)

Hi Dmitry, thank you for testing out the model and raising the concern!
The underlying vision encoder being used is from https://huggingface.co/openai/clip-vit-large-patch14-336. While that page itself does not contain any license information, we believe CLIP itself is under the MIT license (https://github.com/openai/CLIP/blob/main/LICENSE). Models such as SDXL similarly use the text encoder from CLIP, if I understand correctly.

As for the composition of the data, we briefly mentioned it in our separate blog post, but here is a more thorough list: https://github.com/haotian-liu/LLaVA#train
The only difference we made is swapping out the GPT-generated portion with our own data.

Many of the underlying images in COCO used in the visual instruction finetuning stage are non-commercial.

It's unclear from "mixed from the permissive portion of the original LLaVA training data and Fireworks.ai generated training data" whether these were in fact removed or not. Do you have an explicit list of underlying images used?

Thanks

@websterbei The MIT license is for the code, no? The model's (weights) license is in the model card, as far as I understand:
https://github.com/openai/CLIP/blob/main/model-card.md

And it says: "any deployed use case of the model - whether commercial or not - is currently out of scope". Not legally binding?

Any chance of a CPU-only version?
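(Not an official answer, just a sketch of what I'd expect to work for CPU-only inference with the stock transformers LLaVA classes, assuming the 13B weights fit in RAM:)

```python
# Hypothetical CPU-only loading sketch; assumes the standard transformers LLaVA API
# and enough RAM to hold the 13B weights in float32.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "fireworks-ai/FireLLaVA-13b"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float32,   # keep everything on CPU, no quantization
    low_cpu_mem_usage=True,
).to("cpu")
```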
