---
license: other
pipeline_tag: visual-question-answering
---
|
|
|
<p align="center">
    <img src="logo_en.png" width="600"/>
</p>
|
|
|
<p align="center">
    <b><font size="6">InternLM-XComposer-2.5-OL</font></b>
</p>
|
|
|
<div align="center"> |
|
|
|
[💻Github Repo](https://github.com/InternLM/InternLM-XComposer) |
|
|
|
</div> |
|
|
|
|
|
**InternLM-XComposer2.5-OL** is a specialized generalist multimodal system for streaming video and audio interactions.
|
|
|
### Import from Transformers |
|
To load the base LLM with Transformers, use the following code:
|
```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    trust_remote_code=True,
)
model.tokenizer = tokenizer
```
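Once loaded, the base model answers queries through its `chat` method, the same interface used in the Quickstart examples below. A minimal text-only sketch, assuming `chat` accepts an empty image list (the official examples only show image inputs):

```python
# Text-only sketch; assumes model.chat accepts an empty image list.
query = 'Introduce yourself in one sentence.'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, [], do_sample=False, num_beams=3, use_meta=True)
print(response)
```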
|
|
|
To load the base audio model using MS-Swift, use the following code: |
|
```python
import os
os.environ['USE_HF'] = 'True'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_id_or_path=model_id_or_path, model_dir='audio',
                                       model_kwargs={'device_map': 'cuda:0'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
```
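With the model, template, and seed prepared, audio queries run through Swift's `inference` helper, exactly as in the Audio Understanding example below; a minimal sketch where the audio path is a placeholder for your own file:

```python
# Sketch: 'path/to/audio.wav' is a placeholder, not a file shipped with the repo.
query = '<audio>Detect the language and recognize the speech.'
response, _ = inference(model, template, query, audios='path/to/audio.wav')
print(f'response: {response}')
```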
|
|
|
|
|
## Quickstart |
|
|
|
We provide simple examples below to show how to use InternLM-XComposer-2.5-OL with 🤗 Transformers and MS-Swift. For the complete guide, please refer to [examples/README.md](examples/README.md).
|
|
|
|
|
<details> |
|
<summary> |
|
<b>Audio Understanding</b> |
|
</summary> |
|
|
|
```python
import os
os.environ['USE_HF'] = 'True'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, ModelType,
    get_default_template_type, inference
)
from swift.utils import seed_everything

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_id_or_path=model_id_or_path, model_dir='audio',
                                       model_kwargs={'device_map': 'cuda:0'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# Chinese ASR
query = '<audio>Detect the language and recognize the speech.'
response, _ = inference(model, template, query, audios='examples/audios/chinese.mp3')
print(f'query: {query}')
print(f'response: {response}')
```
|
|
|
</details> |
|
|
|
|
|
<details> |
|
<summary> |
|
<b>Image Understanding</b> |
|
</summary> |
|
|
|
```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval().half()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2d5-ol-7b',
    model_dir='base',
    trust_remote_code=True,
)
model.tokenizer = tokenizer

query = 'Analyze the given image in a detailed manner'
image = ['examples/images/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
```
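Since `image` is a list of file paths, the same call should extend to several images at once; this is an assumption not shown in the official examples, and the second path below is a placeholder:

```python
# Assumption: multi-image input follows the same list-of-paths convention.
query = 'Compare the two images and describe their differences.'
image = ['examples/images/dubai.png', 'path/to/second_image.png']  # second path is a placeholder
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
```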
|
|
|
</details> |
|
|
|
### Open Source License |
|
The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial use. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.
|
|