punchnox/Tinnyllama-1.1B-rk3588-rkllm-1.1.4

English

llama

punchnox committed 19 days ago

Commit

ccf92fa

1 Parent(s): b7f995d

Update README.md

README.md +60 -67

@@ -17,80 +17,73 @@ widget:
       Write me a function to calculate the first 10 digits of the fibonacci
       sequence in Python and print it out to the CLI.
 ---
 # TinyLlama-1.1B-Chat-v1.0-RK3588-1.1.4
-This version of TinyLlama-1.1B-Chat-v1.0 has been converted to run on the RK3588 NPU using w8a8 quantization.
-This model has been optimized with the following LoRA:
-Compatible with RKLLM version: 1.1.4
-Converted using https://github.com/c0zaut/ez-er-rkllm-toolkit
-# Original Model Card for base model, TinyLlama-1.1B-Chat-v1.0, below:
----
-license: apache-2.0
-datasets:
-- cerebras/SlimPajama-627B
-- bigcode/starcoderdata
-- HuggingFaceH4/ultrachat_200k
-- HuggingFaceH4/ultrafeedback_binarized
-language:
-- en
-widget:
-  - example_title: Fibonacci (Python)
-    messages:
-    - role: system
-      content: You are a chatbot who can help code!
-    - role: user
-      content: Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI.
 ---
-<div align="center">
-# TinyLlama-1.1B
-</div>
-https://github.com/jzhang38/TinyLlama
-The TinyLlama project aims to **pretrain** a **1.1B Llama model on 3 trillion tokens**. With some proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs 🚀🚀. The training has started on 2023-09-01.
-We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint.
-#### This Model
-This is the chat model finetuned on top of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T). **We follow [HF's Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)'s training recipe.** The model was " initially fine-tuned on a variant of the [`UltraChat`](https://huggingface.co/datasets/stingning/ultrachat) dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT.
-We then further aligned the model with [🤗 TRL's](https://github.com/huggingface/trl) `DPOTrainer` on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, which contain 64k prompts and model completions that are ranked by GPT-4."
-#### How to use
-You will need the transformers>=4.34
-Do check the [TinyLlama](https://github.com/jzhang38/TinyLlama) github page for more information.
-```python
-# Install transformers from source - only needed for versions <= v4.34
-# pip install git+https://github.com/huggingface/transformers.git
-# pip install accelerate
-import torch
-from transformers import pipeline
-pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")
-# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
-messages = [
-    {
-        "role": "system",
-        "content": "You are a friendly chatbot who always responds in the style of a pirate",
-    },
-    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
-]
-prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
-print(outputs[0]["generated_text"])
-# <|system|>
-# You are a friendly chatbot who always responds in the style of a pirate.</s>
-# <|user|>
-# How many helicopters can a human eat in one sitting?</s>
-# <|assistant|>
-# ...
-```

+This is **TinyLlama-1.1B-Chat-v1.0**, a lightweight chat model converted to run on the **RK3588 NPU** with **w8a8 quantization** (8-bit weights and 8-bit activations). It targets inference on edge devices through **RKLLM** version 1.1.4.
+### Key Features
+- Optimized for **RK3588 NPU** using w8a8 quantization.
+- Compatible with **RKLLM version 1.1.4**.
+- Converted using the [ez-er-rkllm-toolkit](https://github.com/c0zaut/ez-er-rkllm-toolkit).
+### Training Datasets (Base Model)
+- **SlimPajama-627B** (Cerebras)
+- **Starcoder Data** (BigCode)
+- **Ultrachat_200k** (HuggingFaceH4)
+- **Ultrafeedback_binarized** (HuggingFaceH4)
+### License
+This model is released under the **Apache-2.0** license.
 ---
+## Getting Started with RKLLAMA
+Follow these steps to use **TinyLlama-1.1B-Chat-v1.0** with RKLLAMA:
+### 1. Clone the RKLLAMA Repository
+```bash
+git clone https://github.com/notpunchnox/rkllama
+cd rkllama
+```
+### 2. Install Dependencies
+Run the setup script to install all required dependencies:
+```bash
+chmod +x setup.sh
+sudo ./setup.sh
+```
+### 3. Add the Model
+Download the model and place it in the `models/` directory:
+```bash
+cd ~/RKLLAMA/models/
+curl -L -O https://huggingface.co/punchnox/TinyLlama-1.1B-Chat-v1.0-rk3588-1.1.4/resolve/main/TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm
+```
+### 4. Launch the RKLLAMA Server
+Start the server to enable model usage:
+```bash
+rkllama serve
+```
+### 5. Interact with the Model
+#### List Available Models
+To view all models installed in RKLLAMA:
+```bash
+rkllama list
+```
+![Image](https://github.com/NotPunchnox/rkllama/raw/main/documentation/ressources/list.png)
+#### Run the Model
+Load the model on the RK3588 NPU:
+```bash
+rkllama run TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm
+```
+![Image](https://github.com/NotPunchnox/rkllama/raw/main/documentation/ressources/chat.png)
+---
+# Base model: [TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
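
Editorial note on step 3 of the new README: Hugging Face serves an HTML viewer page at `/blob/` URLs and the raw file at `/resolve/` URLs, so it can be worth confirming that the downloaded `.rkllm` file is a binary and not a saved web page. The following is a minimal sketch of such a check. The helper names `looks_like_html` and `check_model_file` are hypothetical and are not part of RKLLAMA or RKLLM, and the check rests on the assumption that a model file never begins with an HTML tag.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a downloaded .rkllm file before loading it.
# Assumption: a mistaken download of a web page starts with an HTML marker,
# while a real model file is non-empty binary data.
# Both function names below are hypothetical helpers, not RKLLAMA commands.

looks_like_html() {
    # Inspect only the first 512 bytes, lowercased, for an HTML marker.
    head -c 512 "$1" | LC_ALL=C tr 'A-Z' 'a-z' \
        | LC_ALL=C grep -q -e '<!doctype html' -e '<html'
}

check_model_file() {
    local file="$1"
    # -s is true only if the file exists and is not empty.
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file"
        return 1
    fi
    if looks_like_html "$file"; then
        echo "HTML page, not a model file: $file"
        return 1
    fi
    echo "looks like a binary file: $file"
}
```

After sourcing the snippet, it could be called as `check_model_file ~/RKLLAMA/models/TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm` before running `rkllama list`. A non-zero exit status suggests the file should be downloaded again.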