InternVL2_5-1B-Int8

This version of InternVL2_5-1B has been converted to run on the Axera NPU using w8a16 quantization.

This model has been optimized with the following LoRA:

Compatible with Pulsar2 version: 3.3

Convert tools links:

If you are interested in model conversion, you can export the axmodel yourself, starting from the original repo: https://huggingface.co/OpenGVLab/InternVL2_5-1B

Pulsar2 Link, How to Convert LLM from Huggingface to axmodel

AXera NPU HOST LLM Runtime

AXera NPU AXCL LLM Runtime

Supported Platforms

| Chips | Image encoder (448) | TTFT   | w8a16         |
|-------|---------------------|--------|---------------|
| AX650 | 350 ms              | 420 ms | 32 tokens/sec |
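As a rough guide to what these figures mean for end-to-end latency, the sketch below estimates the response time for a reply of a given length. It assumes that image encoding and TTFT add up sequentially and that decoding runs at the steady rate listed above. The function name `estimate_response_ms` and this timing model are illustrative assumptions, not something the runtime documents.

```python
def estimate_response_ms(n_tokens, encode_ms=350.0, ttft_ms=420.0, tokens_per_sec=32.0):
    """Rough latency estimate for one image + prompt on AX650.

    Default values are the AX650 figures listed above.
    """
    if n_tokens < 1:
        raise ValueError("n_tokens must be at least 1")
    # Assumption: the first token arrives after image encoding plus TTFT,
    # and every remaining token takes 1 / tokens_per_sec seconds.
    decode_ms = (n_tokens - 1) * 1000.0 / tokens_per_sec
    return encode_ms + ttft_ms + decode_ms


# Example: estimated time for a 100-token answer.
print(f"{estimate_response_ms(100):.0f} ms")
```

If TTFT as reported by the runtime already includes image encoding, pass `encode_ms=0` to avoid counting it twice.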

How to use

Download all files from this repository to the device

root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B# tree -L 1
.
|-- config.json
|-- internvl2_5_1b_448_ax650
|-- internvl2_5_tokenizer
|-- internvl2_5_tokenizer_448.py
|-- main_internvl2_5_448_prefill
|-- run_internvl2_5_448_ax650.sh
`-- ssd_car.jpg

Install transformers

pip install transformers==4.41.1

Start the Tokenizer service

root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B# python3 internvl2_5_tokenizer_448.py --port 12345
None None 151645 <|im_end|>
[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287,
42140, 53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623,
48692, 100168, 110498, 1773, 151645, 151644, 872, 198, 151665, 151667, 151667, 151667, 151667, 151667, 151667, 151667,
151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667,
......
151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667,

 198, 5501, 7512, 279, 2168, 19620, 13, 151645, 151644, 77091, 198]
310
[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287,
42140, 53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623,
48692, 100168, 110498, 1773, 151645, 151644, 872, 198, 14990, 1879, 151645, 151644, 77091, 198]
47
http://localhost:12345
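Before starting the runtime in a second terminal, it can help to confirm that the tokenizer service is actually listening. The sketch below only checks that a TCP connection to the port can be opened. It makes no assumption about the service's HTTP endpoints, which are not described here. The helper name `tokenizer_service_is_up` is hypothetical, and the defaults mirror the `--port 12345` used above.

```python
import socket


def tokenizer_service_is_up(host="localhost", port=12345, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("tokenizer service reachable:", tokenizer_service_is_up())
```

A `False` result usually means the tokenizer script has not finished starting or was launched on a different port.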

Inference with an AX650 host, such as M4N-Dock (爱芯派Pro) or the AX650N DEMO Board

  • input text
Describe the picture
  • input image

Open another terminal and run ./run_internvl2_5_448_ax650.sh

root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B# ./run_internvl2_5_448_ax650.sh
[I][                            Init][ 127]: LLM init start
bos_id: -1, eos_id: 151645
  3% | ██                              |   1 /  28 [0.01s<0.14s, 200.00 count/s] tokenizer init ok
[I][                            Init][  26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ |  28 /  28 [1.42s<1.42s, 19.66 count/s] init vpm axmodel ok,remain_cmm(2859 MB)B)
[I][                            Init][ 275]: max_token_len : 1023
[I][                            Init][ 280]: kv_cache_size : 128, kv_cache_num: 1023
[I][                            Init][ 288]: prefill_token_num : 320
[I][                            Init][ 290]: vpm_height : 448,vpm_width : 448
[I][                            Init][ 299]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> Describe the picture
image >> ssd_car.jpg
[I][                          Encode][ 358]: image encode time : 362.987000 ms, size : 229376
[I][                             Run][ 569]: ttft: 426.75 ms

The image depicts a scene on a city street with a prominent red double-decker bus in the background.
The bus is adorned with an advertisement that reads, "THINGS GET MORE EXCITING WHEN YOU SAY YES."
The bus is traveling on a road with a white bicycle lane marked on it. The street is lined with buildings,
and there is a black car parked on the side of the road. A woman is standing in the foreground, smiling at the camera.
She is wearing a black jacket and a scarf. The overall atmosphere suggests a typical urban setting,
possibly in a city known for its iconic double-decker buses.

[N][                             Run][ 708]: hit eos,avg 31.90 token/s

prompt >> q
root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B# 
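To compare runs against the figures in the platform table, the timing values can be pulled out of the runtime log. The following is a minimal sketch that matches the three log lines shown above with regular expressions. The function name `parse_run_metrics` and the dictionary keys are illustrative, and the patterns assume the log format stays exactly as printed in this transcript.

```python
import re


def parse_run_metrics(log_text):
    """Extract image encode time, TTFT, and decode speed from a runtime log."""
    patterns = {
        "image_encode_ms": r"image encode time : ([\d.]+) ms",
        "ttft_ms": r"ttft: ([\d.]+) ms",
        "tokens_per_sec": r"avg ([\d.]+) token/s",
    }
    metrics = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, log_text)
        if match:  # Skip metrics that do not appear in the log.
            metrics[name] = float(match.group(1))
    return metrics


# Log lines copied from the session above.
sample = """\
[I][                          Encode][ 358]: image encode time : 362.987000 ms, size : 229376
[I][                             Run][ 569]: ttft: 426.75 ms
[N][                             Run][ 708]: hit eos,avg 31.90 token/s
"""
print(parse_run_metrics(sample))
```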

Inference with M.2 Accelerator card

What is the M.2 Accelerator card? This demo is based on a Raspberry Pi 5.

TODO
