|
--- |
|
license: mit |
|
tags: |
|
- rockchip |
|
- rk3588 |
|
- rkllm |
|
- text-generation-inference |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# ezrkllm-collection |
|
A collection of LLMs converted with Rockchip's rkllm-toolkit to run on Rockchip NPUs.
|
This repo contains the converted models for running on the RK3588 NPU found in SBCs like the Orange Pi 5, NanoPi R6 and Radxa Rock 5.
|
|
|
Check the main repo on GitHub for installation and usage instructions: https://github.com/Pelochus/ezrknpu
|
|
|
## Available LLMs |
|
Before running any LLM, take into account that the required RAM is roughly 1.5-3 times the model size (a rough estimate; extensive testing hasn't been done yet). For example, a 7B model quantized to w8a8 takes about 7 GB on disk, so expect it to need somewhere around 10-21 GB of RAM.
|
|
|
Right now, only the following models have been converted:
|
| LLM                   | Parameters | Link                                                      |
| --------------------- | ---------- | --------------------------------------------------------- |
| Qwen Chat             | 1.8B       | https://huggingface.co/Pelochus/qwen-1_8B-rk3588          |
| Microsoft Phi-2       | 2.7B       | https://huggingface.co/Pelochus/phi-2-rk3588              |
| Llama 2 7B            | 7B         | https://huggingface.co/Pelochus/llama2-chat-7b-hf-rk3588  |
| Llama 2 13B           | 13B        | https://huggingface.co/Pelochus/llama2-chat-13b-hf-rk3588 |
| Qwen 1.5 Chat         | 4B         | https://huggingface.co/Pelochus/qwen1.5-chat-4B-rk3588    |
| TinyLlama v1 (broken) | 1.1B       | https://huggingface.co/Pelochus/tinyllama-v1-rk3588       |
|
|
|
However, RKLLM also supposedly supports Qwen 2. Llama 2 was converted using Azure servers.
|
For reference, converting Phi-2 peaked at about 15 GB of RAM plus 25 GB of swap (including the OS, which was using about 2 GB at most).
|
Converting Llama 2 7B peaked at about 32 GB of RAM plus 50 GB of swap.
|
|
|
## Downloading a model |
|
Use: |
|
|
|
`git clone LINK_FROM_PREVIOUS_TABLE_HERE` |
|
|
|
And then, inside the cloned folder (this may not be necessary):
|
|
|
`git lfs pull` |
|
|
|
If the first clone gives you problems (for example, it takes too long), you can instead skip the large files during cloning:
|
|
|
`GIT_LFS_SKIP_SMUDGE=1 git clone LINK_FROM_PREVIOUS_TABLE_HERE` |
|
|
|
And then run `git lfs pull` inside the cloned folder to download the full model.
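
As an alternative to git (a sketch, not part of the original instructions), the `huggingface_hub` Python library can also download one of these repos directly; the `repo_id` below is just one of the models from the table above:

```python
# Sketch: download a converted model with huggingface_hub instead of git.
# Requires `pip install huggingface_hub`; the repo_id is one of the models
# listed in the table above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Pelochus/phi-2-rk3588")
print(f"Model files downloaded to: {local_dir}")
```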
|
|
|
## RKLLM parameters used |
|
The RK3588 **only supports w8a8 quantization**, so that is the quantization used for ALL models.
|
Aside from that, the RKLLM toolkit allows choosing between no optimization (0) and optimization (1).

All models here were converted with optimization enabled.
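
For reference, a conversion script with these parameters looks roughly like the following (a minimal sketch modeled on the examples in the rknn-llm repo; the model path and output filename are placeholders, and the exact API may differ between toolkit versions):

```python
# Minimal sketch of an rkllm-toolkit conversion script.
# Paths are placeholders; verify the API against your toolkit version.
from rkllm.api import RKLLM

llm = RKLLM()

# Load the original Hugging Face model from a local directory.
ret = llm.load_huggingface(model="./Llama-2-7b-chat-hf")
assert ret == 0, "Failed to load the Hugging Face model"

# Build with w8a8 quantization (the only scheme the RK3588 supports)
# and optimization enabled (1), as used for all models in this collection.
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype="w8a8")
assert ret == 0, "Failed to quantize/build the model"

# Export the converted model for the NPU runtime.
ret = llm.export_rkllm("./llama2-chat-7b-hf-rk3588.rkllm")
assert ret == 0, "Failed to export the .rkllm file"
```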
|
|
|
## Future additions |
|
- [x] Converting Llama 2 (70B currently in conversion, but that won't run even with 32 GB of RAM)
|
- [x] Converting Qwen 1.5 (from 0.5B to 7B, except 4B, which was already converted)
|
- [ ] Adding support for other compatible Rockchip SoCs
|
|
|
## More info |
|
- My fork of rknn-llm: https://github.com/Pelochus/ezrknn-llm
|
- Rockchip's original rknn-llm repo: https://github.com/airockchip/rknn-llm