---
license: apache-2.0
---

## Model Details

This model is an int4 model with group_size 128 and symmetric quantization of [HuggingFaceTB/SmolVLM-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct) generated by [intel/auto-round](https://github.com/intel/auto-round). Load the model with `revision="e289950"` to use AutoGPTQ format.

## How To Use

### INT4 Inference

```python
from auto_round import AutoRoundConfig  ## must import for auto-round format
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
quantized_model_path = "OPEA/SmolVLM-Instruct-int4-sym-inc"

# Load images
image_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
content = "Describe this image."

# Initialize processor and model
processor = AutoProcessor.from_pretrained(quantized_model_path)
model = AutoModelForVision2Seq.from_pretrained(
    quantized_model_path,
    torch_dtype="auto",
    device_map=DEVICE,
    _attn_implementation="flash_attention_2" if DEVICE == "cuda" else "eager",
    ##revision="e289950" ##AutoGPTQ format
)

# Create input messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": content}
        ]
    },
]

# Prepare inputs
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[load_image(image_url)], return_tensors="pt")
inputs = inputs.to(DEVICE)

# Generate outputs
generated_ids = model.generate(**inputs, max_new_tokens=500)
generated_texts = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
)
print(generated_texts[0])

##INT4:
## User:Describe this image.
## Assistant: A woman is sitting on the beach with a dog. The woman is wearing a plaid shirt and has her hair down. She is smiling and holding the dog's paw. The dog is a golden retriever and is wearing a collar. The dog is sitting on the sand.
## The sun is setting in the background.

##BF16:
## User:Describe this image.
## Assistant: The image depicts a sandy beach scene with a young woman and a dog sitting side by side on the sand. The woman is on the right side of the image, wearing a plaid shirt and dark pants. She has long, dark hair and is smiling. She is holding the dog's paw in her right hand. The dog is a golden retriever, and it is wearing a blue collar with a tag. The dog is sitting on its hind legs, facing the woman. The dog's fur is light brown and it has a black nose. The dog's tail is wagging, indicating a happy and friendly demeanor.
## The background of the image shows the ocean, with waves gently crashing against the shore. The sky is clear, with a gradient of light blue at the top and a darker blue at the bottom, indicating either sunrise or sunset. The sand on the beach is light brown and appears to be wet, with some footprints visible.
## The overall mood of the image is peaceful and happy, as the woman and the dog appear to be enjoying each other's company. The setting is a typical beach scene, with the natural elements of the ocean and the sand providing a serene and calming atmosphere.

image_url = "http://images.cocodataset.org/train2017/000000411975.jpg"
content = "How many people are there on the baseball field in the image?"
##INT4:
## User:How many people are there on the baseball field in the image?
## Assistant: There are four people on the baseball field in the image.

##BF16:
## User:How many people are there on the baseball field in the image?
## Assistant: There are four people on the baseball field in the image.

image_url = "https://intelcorp.scene7.com/is/image/intelcorp/processor-overview-framed-badge:1920-1080?wid=480&hei=270"
content = "This image represents which company?"
##INT4:
## User:This image represents which company?
## Assistant: Intel.

##BF16:
## User:This image represents which company?
## Assistant: Intel.
```

### Generate the model

Here is the sample command to reproduce the model.

```bash
pip install auto-round
auto-round-mllm \
--model HuggingFaceTB/SmolVLM-Instruct \
--device 0 \
--group_size 128 \
--bits 4 \
--iters 1000 \
--nsample 512 \
--seqlen 2048 \
--format 'auto_gptq,auto_round' \
--output_dir "./tmp_autoround"
```

## Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

## Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

- Intel Neural Compressor [link](https://github.com/intel/neural-compressor)

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

## Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
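
As a rough illustration of what "int4, group_size 128, symmetric" in Model Details refers to, the sketch below quantizes a flat list of weights in groups of 128, with one scale per group and no zero point. It is a minimal sketch under stated assumptions, not the auto-round implementation: it uses plain round-to-nearest, whereas AutoRound tunes the rounding via signed gradient descent (see the cited paper). The function names and the `max_abs / 7` scale convention are hypothetical choices made for this example.

```python
GROUP_SIZE = 128               # mirrors --group_size 128
BITS = 4                       # mirrors --bits 4
QMAX = 2 ** (BITS - 1) - 1     # largest signed 4-bit code: 7
QMIN = -(2 ** (BITS - 1))      # smallest signed 4-bit code: -8


def quantize_group(weights):
    """Symmetric round-to-nearest quantization of one group.

    Assumption: the scale maps the largest magnitude in the group to QMAX.
    Symmetric means there is no zero point, so 0.0 always maps to code 0.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    codes = [max(QMIN, min(QMAX, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def quantize(weights, group_size=GROUP_SIZE):
    """Split a flat weight list into groups and quantize each one."""
    return [
        quantize_group(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


if __name__ == "__main__":
    # Deterministic toy weights in roughly [-0.05, 0.05].
    weights = [((i * 37) % 101 - 50) / 1000 for i in range(256)]
    groups = quantize(weights)
    restored = [v for codes, scale in groups for v in dequantize_group(codes, scale)]
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(groups)} groups, max abs reconstruction error {max_err:.6f}")
```

Each group stores 128 four-bit codes plus a single scale, which is where the memory saving over BF16 comes from. The reconstruction error per weight is at most half of that group's scale under this rounding rule.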