--- license: apache-2.0 library_name: transformers pipeline_tag: visual-question-answering --- # CogVLM # Qiuckstart ```python import torch import requests from PIL import Image from transformers import AutoModelForCausalLM, LlamaTokenizer model_path = 'Model/folder/path/here' tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5') model = AutoModelForCausalLM.from_pretrained( model_path, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True, trust_remote_code=True ).eval() # chat example query = 'Can you provide a description of the image and include the coordinates [[x0,y0,x1,y1]] for each mentioned object?' image = Image.open("your/image/path/here").convert('RGB') inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image]) # chat mode inputs = { 'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'), 'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'), 'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'), 'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]], } gen_kwargs = {"max_length": 2048, "do_sample": False} with torch.no_grad(): outputs = model.generate(**inputs, **gen_kwargs) outputs = outputs[:, inputs['input_ids'].shape[1]:] print(tokenizer.decode(outputs[0])) ``` # 许可(License) 此存储库中的代码是根据 [Apache-2.0 许可](https://github.com/THUDM/CogVLM/raw/main/LICENSE) 开放源码,而使用 CogVLM 模型权重必须遵循 [模型许可](https://github.com/THUDM/CogVLM/raw/main/MODEL_LICENSE)。 The code in this repository is open source under the [Apache-2.0 license](https://github.com/THUDM/CogVLM/raw/main/LICENSE), while the use of the CogVLM model weights must comply with the [Model License](https://github.com/THUDM/CogVLM/raw/main/MODEL_LICENSE). # 引用(Citation) If you find our work helpful, please consider citing the following papers ``` @article{wang2023cogvlm, title={CogVLM: Visual Expert for Pretrained Language Models}, author={Weihan Wang and Qingsong Lv and Wenmeng Yu and Wenyi Hong and Ji Qi and Yan Wang and Junhui Ji and Zhuoyi Yang and Lei Zhao and Xixuan Song and Jiazheng Xu and Bin Xu and Juanzi Li and Yuxiao Dong and Ming Ding and Jie Tang}, year={2023}, eprint={2311.03079}, archivePrefix={arXiv}, primaryClass={cs.CV} } ```