---
license: other
pipeline_tag: visual-question-answering
---
|
|
|
<p align="center">
    <img src="logo_en.png" width="600"/>
</p>
|
|
|
<p align="center">
    <b><font size="6">InternLM-XComposer-2.5-OL</font></b>
</p>
|
|
|
<div align="center"> |
|
|
|
[💻Github Repo](https://github.com/InternLM/InternLM-XComposer) |
|
|
|
</div> |
|
|
|
|
|
**InternLM-XComposer2.5-OL** is a specialized generalist multimodal system for streaming video and audio interactions.
|
|
|
### Import from Transformers |
|
To load the base LLM with Transformers, use the following code:
|
```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    trust_remote_code=True,
)
model.tokenizer = tokenizer
```
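Once loaded, the base model answers queries through its `chat` method, the same interface used in the Quickstart examples below. A minimal text-only sketch, assuming `chat` accepts an empty image list (the official examples only show image inputs):

```python
# Text-only sketch; assumes model.chat accepts an empty image list.
query = 'Introduce yourself in one sentence.'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, [], do_sample=False, num_beams=3, use_meta=True)
print(response)
```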
|
|
|
To load the base audio model using MS-Swift, use the following code: |
|
```python
import os
os.environ['USE_HF'] = 'True'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_id_or_path=model_id_or_path, model_dir='audio',
                                       model_kwargs={'device_map': 'cuda:0'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
```
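With the model, template, and seed prepared, audio queries run through Swift's `inference` helper, exactly as in the Audio Understanding example below; a minimal sketch where the audio path is a placeholder for your own file:

```python
# Sketch: 'path/to/audio.wav' is a placeholder, not a file shipped with the repo.
query = '<audio>Detect the language and recognize the speech.'
response, _ = inference(model, template, query, audios='path/to/audio.wav')
print(f'response: {response}')
```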
|
|
|
|
|
## Quickstart |
|
|
|
We provide simple examples below to show how to use InternLM-XComposer-2.5-OL with 🤗 Transformers and MS-Swift. For the complete guide, please refer to [examples/README.md](examples/README.md).
|
|
|
|
|
<details> |
|
<summary> |
|
<b>Audio Understanding</b> |
|
</summary> |
|
|
|
```python
import os
os.environ['USE_HF'] = 'True'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_id_or_path=model_id_or_path, model_dir='audio',
                                       model_kwargs={'device_map': 'cuda:0'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# Chinese ASR
query = '<audio>Detect the language and recognize the speech.'
response, _ = inference(model, template, query, audios='examples/audios/chinese.mp3')
print(f'query: {query}')
print(f'response: {response}')
```
|
|
|
</details> |
|
|
|
|
|
<details> |
|
<summary> |
|
<b>Image Understanding</b> |
|
</summary> |
|
|
|
```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    trust_remote_code=True,
)
model.tokenizer = tokenizer

query = 'Analyze the given image in a detailed manner'
image = ['examples/images/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
```
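Since `image` is a list of file paths, the same call should extend to several images at once; this is an assumption not shown in the official examples, and the second path below is a placeholder:

```python
# Assumption: multi-image input follows the same list-of-paths convention.
query = 'Compare the two images and describe their differences.'
image = ['examples/images/dubai.png', 'path/to/second_image.png']  # second path is a placeholder
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
```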
|
|
|
</details> |
|
|
|
### Open Source License |
|
The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial use. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.
|
|