---
library_name: transformers
tags: []
---
# Model Overview
This model was trained as a small-scale experiment to determine how easy it is to fine-tune [ai21labs/Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1) to work as a chatbot.
The aim of this experiment was to find out how intelligently and reliably Jamba can chat in both English and other languages after only a few hours of QLoRA fine-tuning.
Initial subjective testing has shown that this model can chat reasonably well in both English and Japanese, so feel free to give it a try!
## Model Details
- **Model type:** Joint Attention and Mamba (Jamba)
- **License:** Apache 2.0
- **Context length:** 256K
- **Knowledge cutoff date:** March 5, 2024
## How to use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("kinokokoro/jamba_airoboros3.2_sharegpt4",
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("kinokokoro/jamba_airoboros3.2_sharegpt4")

# ChatML prompt. The user turn asks, in casual Japanese: "Lately I start sweating
# like crazy as soon as I exercise. What should I do?"
input_text = """<|im_start|>system
You are GPT-4, a helpful assistant.
<|im_end|>
<|im_start|>user
最近、運動すれば、すぐにめっちゃくっちゃ汗かいちゃうんだけど、どうしたらいいですか？
<|im_end|>
<|im_start|>assistant
"""

input_ids = tokenizer(input_text, return_tensors='pt').to(model.device)["input_ids"]
# Greedy decoding (do_sample=False) gives a deterministic reply
outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.batch_decode([outputs[0][len(input_ids[0]):]]))

# The reply is in Japanese and is cut off at the 256-token limit. In English, roughly:
# "Sweating is a natural mechanism for lowering body temperature during exercise and
# releasing heat from inside the body. Sweating a lot generally means that the body's
# temperature regulation is working. However, sweating too much can cause problems
# such as discomfort. Below are some measures for when you sweat a lot.
#
# 1. Choose appropriate clothing: if you sweat a lot, it is important to choose
# lightweight, highly moisture-permeable clothes. This lets sweat [move] from the
# body to the outside ..."
```
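The prompt above follows a ChatML-style layout, with each turn closed by `<|im_end|>` on its own line and the assistant turn left open. As a minimal sketch, a helper like the one below can build such prompts from a list of messages. The function name `build_chatml_prompt` and the `role`/`content` keys are illustrative assumptions, not part of this model's tokenizer or of `transformers`.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render {"role", "content"} dicts in the ChatML-style layout of the example above.

    Hypothetical helper for illustration only.
    """
    parts = []
    for message in messages:
        # Each turn: opening tag with the role, the text, then the closing tag on its own line
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}\n<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are GPT-4, a helpful assistant."},
    {"role": "user", "content": "What is the capital of Japan?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of `input_text`.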
# Initial testing results
# Training details
The model was trained on two open-source datasets (one multilingual) for one epoch on 4 x A100 (80GB) GPUs for 3 hours.
## Training data
* [jondurbin/airoboros-3.2](https://huggingface.co/datasets/jondurbin/airoboros-3.2)
A ~59K example dataset of curated LLM tasks in English, primarily generated with GPT-4. This dataset has been used by some of the best-performing open-source LLMs (e.g. [jondurbin/bagel-7b-v0.4](https://huggingface.co/jondurbin/bagel-7b-v0.4), [NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO)) and contains a wide variety of tasks, so we hypothesized that it would lead to a multi-talented, accurate model. For this reason, we chose this dataset for the bulk of our training data.
Note: Each element in jondurbin/airoboros-3.2 already contains a system message.
* [openchat/openchat_sharegpt4_dataset](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset) (GPT-4 responses only)
A ~6K example dataset of multilingual multi-turn chats between users and GPT-4. While jondurbin/airoboros-3.2 has delivered good results for previous models, it contains no (or seemingly very little) multilingual data. We are a Japanese AI company, so we require an LLM that can also output Japanese. Hence we also selected a small, seemingly high-quality dataset of GPT-4 responses in many languages from the ShareGPT dataset. We selected only the GPT-4 responses because we wanted to keep our dataset as small and high-quality as possible to maximise the efficiency of our training.
Note: openchat/openchat_sharegpt4_dataset does not contain system messages, so we added 'You are GPT-4, a helpful assistant.' as our system message.
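To illustrate the two notes above, here is a minimal sketch of adding the default system message only to conversations that lack one. It assumes ShareGPT-style turns with `from` and `value` keys; the helper name `ensure_system_message` is hypothetical and is not the code used for training.

```python
DEFAULT_SYSTEM = {"from": "system", "value": "You are GPT-4, a helpful assistant."}


def ensure_system_message(conversation, system=DEFAULT_SYSTEM):
    """Return a copy of the conversation that starts with a system turn.

    Hypothetical helper: conversations that already begin with a system
    message (as in airoboros-3.2) are returned unchanged.
    """
    if conversation and conversation[0].get("from") == "system":
        return list(conversation)
    # Copy the default so later edits to one conversation do not affect others
    return [dict(system)] + list(conversation)


sharegpt_chat = [
    {"from": "human", "value": "こんにちは"},
    {"from": "gpt", "value": "こんにちは！ご用件は何でしょうか？"},
]
airoboros_chat = [
    {"from": "system", "value": "You are an unbiased assistant."},
    {"from": "human", "value": "Hello"},
]

with_system = ensure_system_message(sharegpt_chat)
unchanged = ensure_system_message(airoboros_chat)
print(with_system[0]["value"], "|", unchanged[0]["value"])
```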
<details>
<summary>Data preparation code</summary>
```python
import os

# Set the Hugging Face environment variables before importing `datasets`,
# because they are read at import time.
os.environ['HF_HOME'] = "/workspace/hf_home"
os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = "1"

import pandas as pd
from datasets import load_dataset, Dataset, concatenate_datasets

# English task data; each conversation already starts with a system message
boros_dataset = load_dataset("jondurbin/airoboros-3.2", split='train')

# Multilingual GPT-4 chats; prepend a system message to every conversation
gpt4_df = pd.read_json("https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/resolve/main/sharegpt_gpt4.json?download=true")
gpt4_df["conversations"] = gpt4_df["items"].apply(lambda x: [{'from': 'system', 'value': 'You are GPT-4, a helpful assistant.'}] + x)
gpt4_dataset = Dataset.from_pandas(gpt4_df[["conversations"]])

# Merge, shuffle, and export only the conversations column for Axolotl
dataset = concatenate_datasets([gpt4_dataset, boros_dataset]).shuffle()
dataset.select_columns(["conversations"]).to_json("/workspace/airoboros-3.2_plus_openchat_sharegpt4.json")
```
</details>
## Training
The Jamba-v0.1 base model was trained for roughly 3 hours on 4 x A100 (80GB) GPUs on the Azure cloud (Standard_NC96ads_A100_v4).
Our training harness was Axolotl, with the following config as our training parameters:
<details>
<summary>Training config</summary>
```yaml
base_model: ai21labs/Jamba-v0.1
trust_remote_code: true
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: /workspace/airoboros-3.2_plus_openchat_sharegpt4.json
    ds_type: json
    type: sharegpt
    conversation: chatml
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./airoboros-3.2_plus_openchat_sharegpt4_one_epoch
sequence_len: 6000
sample_packing: true
pad_to_sequence_len: false
eval_sample_packing: true
use_wandb: true
wandb_project: axolotl
wandb_entity: peterd
wandb_name: airoboros-3.2_plus_openchat_sharegpt4
adapter: qlora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
low_cpu_mem_usage: true
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 5
saves_per_epoch: 5
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero2.json
weight_decay: 0.0
special_tokens:
```
</details>
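As a rough sanity check on the config, the effective batch size can be worked out from `micro_batch_size`, `gradient_accumulation_steps`, and the GPU count. The sketch below assumes plain data parallelism across all four GPUs (which is how DeepSpeed ZeRO-2 is normally run) and treats each packed sample as at most `sequence_len` tokens, so the token figure is an upper bound rather than a measured value.

```python
# Values taken from the training config and hardware description
micro_batch_size = 1
gradient_accumulation_steps = 4
sequence_len = 6000
num_gpus = 4  # assumption: every GPU acts as one data-parallel worker

# Packed sequences contributing to a single optimizer step
sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per optimizer step, since packed samples may be shorter
max_tokens_per_step = sequences_per_step * sequence_len

print(f"sequences per optimizer step: {sequences_per_step}")
print(f"max tokens per optimizer step: {max_tokens_per_step}")
```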
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details>
<summary>Training graphs</summary>
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/umxTIsNRHUtKS_kL81Uyf.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/mpuCoL99rxX8RCgXH1CJo.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/5FvwYNdte-bgzEvcvFO8I.png)
</details>
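When reading a learning-rate curve, it can help to know the shape the config implies: `warmup_steps: 10` followed by `lr_scheduler: cosine` from a peak of `learning_rate: 0.0002`. The sketch below uses the common linear-warmup plus half-cosine-decay formula. The total step count is a made-up placeholder, since the real number is not stated here, and the function `lr_at` is illustrative rather than Axolotl's own code.

```python
import math


def lr_at(step, total_steps, base_lr=2e-4, warmup_steps=10):
    """Learning rate at a given optimizer step (linear warmup, then cosine decay)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, from 0 to 1
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half cosine from the peak down to 0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the actual number of steps is not given
for step in (0, 5, 10, 505, 1000):
    print(step, f"{lr_at(step, total_steps):.2e}")
```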
<br/>
# Developers
Lead developer - Peter Devine [ptrdvn](https://huggingface.co/ptrdvn)
Administrative supervisor - Shunichi Taniguchi [ptrdvn](https://huggingface.co/ptrdvn)