punchnox committed
Commit ccf92fa (verified) · Parent: b7f995d

Update README.md

Files changed (1): README.md (+60, -67)

README.md CHANGED
@@ -17,80 +17,73 @@ widget:
      Write me a function to calculate the first 10 digits of the fibonacci
      sequence in Python and print it out to the CLI.
  ---
 
  # TinyLlama-1.1B-Chat-v1.0-RK3588-1.1.4

- This version of TinyLlama-1.1B-Chat-v1.0 has been converted to run on the RK3588 NPU using w8a8 quantization.

- This model has been optimized with the following LoRA:

- Compatible with RKLLM version: 1.1.4

- Converted using https://github.com/c0zaut/ez-er-rkllm-toolkit

- # Original Model Card for base model, TinyLlama-1.1B-Chat-v1.0, below:
- ---
- license: apache-2.0
- datasets:
- - cerebras/SlimPajama-627B
- - bigcode/starcoderdata
- - HuggingFaceH4/ultrachat_200k
- - HuggingFaceH4/ultrafeedback_binarized
- language:
- - en
- widget:
- - example_title: Fibonacci (Python)
-   messages:
-   - role: system
-     content: You are a chatbot who can help code!
-   - role: user
-     content: Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI.
  ---
- <div align="center">

- # TinyLlama-1.1B
- </div>

- https://github.com/jzhang38/TinyLlama
-
- The TinyLlama project aims to **pretrain** a **1.1B Llama model on 3 trillion tokens**. With some proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs 🚀🚀. Training started on 2023-09-01.
-
- We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged into many open-source projects built upon Llama. Besides, TinyLlama is compact, with only 1.1B parameters, which allows it to serve applications with a restricted computation and memory footprint.
-
- #### This Model
- This is the chat model fine-tuned on top of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T). **We follow [HF's Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)'s training recipe.** The model was initially fine-tuned on a variant of the [`UltraChat`](https://huggingface.co/datasets/stingning/ultrachat) dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT.
- We then further aligned the model with [🤗 TRL's](https://github.com/huggingface/trl) `DPOTrainer` on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, which contains 64k prompts and model completions ranked by GPT-4.
-
- #### How to use
- You will need transformers>=4.34.
- Check the [TinyLlama](https://github.com/jzhang38/TinyLlama) GitHub page for more information.
-
- ```python
- # Install transformers from source - only needed for versions <= v4.34
- # pip install git+https://github.com/huggingface/transformers.git
- # pip install accelerate
-
- import torch
- from transformers import pipeline
-
- pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

- # We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
- messages = [
-     {
-         "role": "system",
-         "content": "You are a friendly chatbot who always responds in the style of a pirate",
-     },
-     {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
- ]
- prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
- print(outputs[0]["generated_text"])
- # <|system|>
- # You are a friendly chatbot who always responds in the style of a pirate.</s>
- # <|user|>
- # How many helicopters can a human eat in one sitting?</s>
- # <|assistant|>
- # ...
- ```
 
+ This is **TinyLlama-1.1B-Chat-v1.0**, a lightweight chat model converted to run on the **RK3588 NPU** with **w8a8 quantization**, targeting efficient inference on edge devices via **RKLLM** (version 1.1.4).

+ ### Key Features
+ - Optimized for the **RK3588 NPU** using w8a8 quantization.
+ - Compatible with **RKLLM version 1.1.4**.
+ - Converted using the [ez-er-rkllm-toolkit](https://github.com/c0zaut/ez-er-rkllm-toolkit).
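As background on what "w8a8" means here: both weights and activations are stored as 8-bit integers alongside a floating-point scale. The sketch below illustrates symmetric per-tensor int8 quantization in plain Python; it is illustrative only, and the function names are hypothetical, not part of the RKLLM toolchain.

```python
# Illustrative sketch of symmetric 8-bit (w8a8-style) quantization.
# Not the RKLLM implementation: just the core idea of mapping floats
# to int8 codes in [-127, 127] with a shared per-tensor scale.

def quantize_int8(values):
    """Return (int8 codes, scale) for a list of floats."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from int8 codes and the scale."""
    return [c * scale for c in codes]

weights = [0.5, -1.2, 0.03, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
print(codes)     # int8 codes
print(restored)  # close to the original weights
```

The payoff on hardware like the RK3588 is that int8 weights quarter the memory footprint relative to fp32 and map onto the NPU's integer math units, at the cost of small rounding error visible in the round trip above.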

+ ### Training Datasets (base model)
+ - **SlimPajama-627B** (Cerebras)
+ - **StarCoder data** (BigCode)
+ - **UltraChat_200k** (HuggingFaceH4)
+ - **UltraFeedback_binarized** (HuggingFaceH4)

+ ### License
+ This model is released under the **Apache-2.0** license.

  ---

+ ## Getting Started with RKLLAMA
+
+ Follow these steps to use **TinyLlama-1.1B-Chat-v1.0** with RKLLAMA:
+
+ ### 1. Clone the RKLLAMA Repository
+ ```bash
+ git clone https://github.com/notpunchnox/rkllama
+ cd rkllama
+ ```
+
+ ### 2. Install Dependencies
+ Run the setup script to install all required dependencies:
+ ```bash
+ chmod +x setup.sh
+ sudo ./setup.sh
+ ```
+
+ ### 3. Add the Model
+ Download the model file and place it in the `models/` directory. Note the `resolve/main` path segment: it serves the raw file, whereas a `blob/main` URL returns an HTML page instead of the model:
+ ```bash
+ cd ~/RKLLAMA/models/
+ curl -L -O https://huggingface.co/punchnox/TinyLlama-1.1B-Chat-v1.0-rk3588-1.1.4/resolve/main/TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm
+ ```
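A quick way to verify the download produced the model binary rather than an HTML error page (a common symptom of fetching a Hugging Face `blob/` URL instead of a `resolve/` one) is to inspect the file header. This sketch is not part of the rkllama tooling; it only assumes the file name used in the curl command above:

```python
# Sanity-check the downloaded .rkllm file: a bad hf.co URL often yields
# an HTML page saved under the model's file name.
import os

MODEL = "TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm"

def looks_like_html(path, probe=512):
    """Heuristic: does the file start like an HTML document?"""
    with open(path, "rb") as f:
        head = f.read(probe).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")

if os.path.exists(MODEL):
    size_mb = os.path.getsize(MODEL) / 1e6
    verdict = "HTML error page?" if looks_like_html(MODEL) else "binary file"
    print(f"{MODEL}: {size_mb:.0f} MB, {verdict}")
else:
    print(f"{MODEL} not found; run the curl command first")
```

A quantized 1.1B-parameter model should weigh in around a gigabyte, so a file of a few kilobytes is another red flag.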
+
+ ### 4. Launch the RKLLAMA Server
+ Start the server to enable model usage:
+ ```bash
+ rkllama serve
+ ```
+
+ ### 5. Interact with the Model
+
+ #### List Available Models
+ To view all models installed in RKLLAMA:
+ ```bash
+ rkllama list
+ ```
+ ![Image](https://github.com/NotPunchnox/rkllama/raw/main/documentation/ressources/list.png)
+
+ #### Run the Model
+ Load the model on the RK3588 NPU:
+ ```bash
+ rkllama run TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm
+ ```
+ ![Image](https://github.com/NotPunchnox/rkllama/raw/main/documentation/ressources/chat.png)
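Beyond the interactive chat, a server started as in step 4 could also be queried over HTTP. The sketch below is assumption-heavy: the port (8080) and endpoint path (`/api/chat`) are guesses modeled on Ollama-style APIs and must be checked against the rkllama documentation before use.

```python
# Hypothetical HTTP client for a running rkllama server.
# Port and endpoint are assumptions (Ollama-style), not confirmed
# rkllama API; consult the rkllama docs for the real interface.
import json
import urllib.request

SERVER = "http://localhost:8080"

def build_chat_request(model, prompt):
    """Assemble an Ollama-style chat request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(model, prompt):
    """POST the chat request and return the decoded JSON response."""
    body = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER + "/api/chat",  # assumed endpoint
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires the server from step 4 to be running):
# print(ask("TinyLlama-1.1B-Chat-v1.0", "Hello!"))
```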

+ ---

+ # Base model: [TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)