---
license: other
pipeline_tag: text-generation
---

# InternLM-XComposer2

[💻GitHub Repo](https://github.com/InternLM/InternLM-XComposer) | [Paper](https://arxiv.org/abs/2401.16420)
**InternLM-XComposer2** is a vision-language large model (VLLM) based on [InternLM2](https://github.com/InternLM/InternLM) for advanced text-image comprehension and composition.

We release the InternLM-XComposer2 series in two versions:

- InternLM-XComposer2-VL: the pretrained VLLM, with InternLM2 as the initialization of the LLM, achieving strong performance on various multimodal benchmarks.
- InternLM-XComposer2: the finetuned VLLM for *Free-form Interleaved Text-Image Composition*.

This is the 4-bit version of InternLM-XComposer2-VL.

## Quickstart

We provide a simple example to show how to use the 4-bit InternLM-XComposer2-VL with 🤗 Transformers and AutoGPTQ.

```python
import torch
import auto_gptq
from transformers import AutoTokenizer
from auto_gptq.modeling import BaseGPTQForCausalLM

# Let AutoGPTQ accept the "internlm" model type used by this checkpoint
auto_gptq.modeling._base.SUPPORTED_MODELS = ["internlm"]
# Inference only: disable gradient tracking globally
torch.set_grad_enabled(False)

class InternLMXComposer2QForCausalLM(BaseGPTQForCausalLM):
    # Decoder blocks that hold the quantized layers
    layers_block_name = "model.layers"
    # Modules kept outside the quantized decoder blocks (vision encoder, embeddings, head)
    outside_layer_modules = [
        'vit', 'vision_proj', 'model.tok_embeddings', 'model.norm', 'output',
    ]
    # Quantized linear layers inside each decoder block
    inside_layer_modules = [
        ["attention.wqkv.linear"],
        ["attention.wo.linear"],
        ["feed_forward.w1.linear", "feed_forward.w3.linear"],
        ["feed_forward.w2.linear"],
    ]

# init model and tokenizer
model = InternLMXComposer2QForCausalLM.from_quantized(
    'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True, device="cuda:0").eval()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True)

text = 'Please describe this image in detail.'
image = 'examples/image1.webp'  # path to a local image file
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
print(response)
# The image features a quote by Oscar Wilde, "Live life with no excuses, travel with no regrets."
# The quote is displayed in white text against a dark background. In the foreground, there are two silhouettes of people standing on a hill at sunset.
# They appear to be hiking or climbing, as one of them is holding a walking stick.
# The sky behind them is painted with hues of orange and purple, creating a beautiful contrast with the dark figures.
```

### Import from Transformers

To load the non-quantized InternLM-XComposer2-VL-7B model using Transformers, use the following code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

ckpt_path = "internlm/internlm-xcomposer2-vl-7b"
# Tokenizers run on the CPU, so no .cuda() call is needed (or available) here
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load the model in float16; otherwise it is loaded in float32, which may cause an out-of-memory (OOM) error.
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
```

### Open Source License

The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (available in English and Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.
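The float16 advice in the Transformers loading snippet is mostly a matter of weight memory. The sketch below gives a rough, weights-only estimate per dtype. It assumes a nominal 7 billion parameters, taken from the `7b` in the checkpoint name, and ignores activations, the KV cache, quantization scales, and framework overhead, so real GPU usage will be higher. The helper `weight_memory_gb` and the `BYTES_PER_PARAM` table are hypothetical illustrations, not part of the InternLM-XComposer2 code.

```python
# Rough weights-only memory estimate; NOT a measurement of the real checkpoint.
# Assumption: ~7e9 parameters, taken from the "7b" in the model name.

BYTES_PER_PARAM = {
    "float32": 4.0,  # default dtype when torch_dtype is not set
    "float16": 2.0,  # torch_dtype=torch.float16
    "int4": 0.5,     # 4-bit weights, ignoring scale/zero-point overhead
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 7e9  # hypothetical nominal parameter count
    for dtype in ("float32", "float16", "int4"):
        print(f"{dtype:>8}: ~{weight_memory_gb(n, dtype):.1f} GB")
```

Under these assumptions the weights alone take about 28 GB in float32, 14 GB in float16, and 3.5 GB at 4 bits, which is consistent with the OOM warning for float32 and with offering a separate 4-bit checkpoint.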