---
library_name: transformers
license: apache-2.0
datasets:
- databricks/databricks-dolly-15k
- glaiveai/glaive-code-assistant-v3
- glaiveai/glaive-function-calling-v2
- gretelai/synthetic_text_to_sql
- meta-math/MetaMathQA
- microsoft/orca-math-word-problems-200k
- neural-bridge/rag-dataset-12000
- neural-bridge/rag-hallucination-dataset-1000
- nvidia/HelpSteer
- OpenAssistant/oasst2
language:
- en
- ja
tags:
- mixtral
- steerlm
base_model: tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1
---
# KARAKURI LM 8x7B Instruct v0.1
![KARAKURI LM](./thumbnail.png)
## Model Details
### Model Description
- **Developed by:** [KARAKURI Inc.](https://about.karakuri.ai/)
- **Model type:** Mixture of Experts (MoE)
- **Languages:** Primarily English and Japanese
- **License:** Apache 2.0
- **Finetuned from model:** [tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1)
- **Contact:** For questions and comments about the model, please email `karakuri-rd@karakuri.ai`
- **Demo:** https://lm.karakuri.cc/
## Usage
### Prompt Template
The model uses the same prompt template as [Command R+](https://huggingface.co/CohereForAI/c4ai-command-r-plus), except that it additionally includes [attribute values](#attribute-values).
#### Chat
```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("karakuri-ai/karakuri-lm-8x7b-instruct-v0.1")
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hello! How can I help you today?"},
{"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
```
#### Tool Use
```python
messages = [
{"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
tools = [
{
"name": "internet_search",
"description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Query to search the internet with"
}
},
"required": ["query"]
}
},
{
"name": "directly_answer",
"description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history",
"parameters": {
"type": "object",
"properties": {}
}
}
]
tokenizer.apply_chat_template(
messages,
chat_template="tool_use",
tools=tools,
add_generation_prompt=True,
tokenize=False,
)
```
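When the rendered prompt is fed to the model, the output should contain the requested tool calls. The snippet below is a minimal sketch of extracting them, assuming the model follows the Command R+ convention of emitting the calls as a JSON list inside a fenced ```` ```json ```` block; the exact output format may differ, so adjust the parsing to what you observe.

```python
import json
import re

def extract_tool_calls(generated_text: str):
    # Assumption: tool calls appear as a JSON list inside a ```json ... ``` block,
    # following the Command R+ style output. Returns [] if no such block is found.
    match = re.search(r"```json\s*(\[.*?\])\s*```", generated_text, re.DOTALL)
    if match is None:
        return []
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return []

# Example with a hypothetical model output:
output = 'Action: ```json\n[{"tool_name": "internet_search", "parameters": {"query": "Tokyo day trip"}}]\n```'
print(extract_tool_calls(output))
```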
#### RAG
```python
messages = [
{"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
documents = [
{
"title": "Tsukiji Outer Market",
"text": "While the inner wholesale market has moved to Toyosu, Tsukiji Outer Market remains a bustling hub for fresh seafood and street food. Enjoy sushi, sashimi, and other delicacies while exploring the vibrant market streets.",
},
{
"title": "Meiji Shrine",
"text": "Nestled in a lush forest in the heart of the city, Meiji Shrine offers a peaceful retreat from the urban hustle. Dedicated to Emperor Meiji and Empress Shoken, the shrine is a popular site for traditional Japanese weddings. Stroll along the serene paths and experience a moment of tranquility."
}
]
tokenizer.apply_chat_template(
messages,
chat_template="rag",
documents=documents,
add_generation_prompt=True,
tokenize=False,
)
```
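The rendered RAG prompt can be fed to the model in the same way as the other templates. A minimal sketch, assuming the `model` loaded in the [Run the Model](#run-the-model) section below and the `messages` and `documents` defined above:

```python
input_ids = tokenizer.apply_chat_template(
    messages,
    chat_template="rag",
    documents=documents,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```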
### Attribute Values
The prompt template contains nine attributes.
The first five are derived from HelpSteer, while the remaining four are derived from OASST2.
The values are represented by integers ranging from 0 to 4, with 0 being the lowest and 4 being the highest.
- helpfulness (default: 4): Overall helpfulness of the response to the prompt.
- correctness (default: 4): Inclusion of all pertinent facts without errors.
- coherence (default: 4): Consistency and clarity of expression.
- complexity (default: 4): Intellectual depth required to write the response (i.e., whether the response can be written by anyone with basic language competency or requires deep domain expertise).
- verbosity (default: 4): Amount of detail included in the response, relative to what is asked for in the prompt.
- quality (default: 4): Perceived goodness of response.
- toxicity (default: 0): Presence of undesirable elements such as vulgar, harmful, or potentially biased content in the response.
- humor (default: 0): Sense of humor within the response.
- creativity (default: 0): Willingness to generate non-conventional responses.
To override the default attribute values specified in the template, pass them as arguments to the `apply_chat_template` method as follows:
```python
messages = [
{"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=False,
helpfulness=0,
correctness=0,
coherence=2,
complexity=0,
verbosity=3,
quality=0,
toxicity=4,
humor=1,
creativity=1,
)
```
### Run the Model
```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"karakuri-ai/karakuri-lm-8x7b-instruct-v0.1",
torch_dtype="auto",
device_map="auto",
)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
input_ids = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
tokenizer.decode(outputs[0][input_ids.shape[-1]:])
```
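Loading the full-precision weights of an 8x7B MoE model requires a large amount of GPU memory. As an alternative, the snippet below is a minimal sketch of loading the model with 4-bit quantization via `bitsandbytes` to reduce the memory footprint; it assumes `bitsandbytes` is installed and accepts some loss in output quality.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumption: bitsandbytes is installed (`pip install bitsandbytes`).
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "karakuri-ai/karakuri-lm-8x7b-instruct-v0.1",
    quantization_config=quantization_config,
    device_map="auto",
)
```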
## Training Details
### Training Data
The model was trained on approximately 1 billion tokens of fine-tuning data.
The details are as follows:
| Dataset | # Tokens / Epoch | # Epochs | # Tokens | Percent |
| :--------------------------------------------------------------------------------------------------------------------------- | ---------------: | -------: | -------: | ------: |
| [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) | 3M | 5 | 16M | 1.5% |
| [glaiveai/glaive-code-assistant-v3](https://huggingface.co/datasets/glaiveai/glaive-code-assistant-v3) | 520M | 0.3 | 156M | 14.6% |
| [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | 52M | 3 | 157M | 14.7% |
| [gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql) | 19M | 3 | 57M | 5.3% |
| [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) | 81M | 1 | 81M | 7.6% |
| [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) | 67M | 1 | 67M | 6.3% |
| [neural-bridge/rag-dataset-12000](https://huggingface.co/datasets/neural-bridge/rag-dataset-12000) | 12M | 5 | 61M | 5.7% |
| [neural-bridge/rag-hallucination-dataset-1000](https://huggingface.co/datasets/neural-bridge/rag-hallucination-dataset-1000) | 1M | 5 | 5M | 0.5% |
| [nvidia/HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer) | 24M | 5 | 118M | 11.0% |
| [OpenAssistant/oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2) | 27M | 5 | 133M | 12.4% |
| KARAKURI Instruction Dataset | 1M | 5 | 6M | 0.6% |
| KARAKURI Corpus | 214M | 1 | 214M | 20.0% |
### Training Infrastructure
- **Hardware:** The model was trained on 8 nodes of Amazon EC2 trn1.32xlarge instances.
- **Software:** We used code based on [neuronx-nemo-megatron](https://github.com/aws-neuron/neuronx-nemo-megatron).
## Known Limitations
The model sometimes attempts to call tools that were not provided.
You should implement a post-processing step to filter out such calls; a sketch is provided below.
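As an illustrative sketch only (the `tool_name` key mirrors the Command R+ style output and is an assumption about this model's output format), such a filter could compare parsed tool calls against the `tools` list that was passed to the template:

```python
def filter_tool_calls(tool_calls, tools):
    # Keep only calls whose name matches a tool definition that was actually provided.
    allowed = {tool["name"] for tool in tools}
    return [call for call in tool_calls if call.get("tool_name") in allowed]

# Hypothetical parsed output: the second call targets a tool that was never provided.
calls = [
    {"tool_name": "internet_search", "parameters": {"query": "Tokyo day trip"}},
    {"tool_name": "book_hotel", "parameters": {"city": "Tokyo"}},
]
print(filter_tool_calls(calls, tools))  # only the internet_search call remains
```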
## Citation
```
@misc{karakuri_lm_8x7b_instruct_v01,
author = { {KARAKURI} {I}nc. },
title = { {KARAKURI} {LM} 8x7{B} {I}nstruct v0.1 },
year = { 2024 },
url = { https://huggingface.co/karakuri-ai/karakuri-lm-8x7b-instruct-v0.1 },
publisher = { Hugging Face },
journal = { Hugging Face repository }
}
```