ptrdvn committed
Commit 2ba18e8
1 Parent(s): 49554ce

Update README.md

Files changed (1): README.md (+62, -14)

README.md CHANGED
@@ -1,10 +1,13 @@
 ---
 license: other
 base_model: meta-llama/Meta-Llama-3-8B-Instruct
 tags:
 - generated_from_trainer
 model-index:
- - name: workspace/llm_training/axolotl/llama3-multilingual/output_tagengo_openchat_megagon_8B_llama3
 results: []
 ---
@@ -38,9 +41,9 @@ sample_packing: true
 pad_to_sequence_len: true

 use_wandb: true
- wandb_project: axolotl
- wandb_entity: peterd
- wandb_name: tagengo_openchat_megagon_8B_instruct

 gradient_accumulation_steps: 2
 micro_batch_size: 2
@@ -77,23 +80,68 @@ special_tokens:

 </details><br>

- # workspace/llm_training/axolotl/llama3-multilingual/output_tagengo_openchat_megagon_8B_llama3

- This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.6595

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

 ## Training procedure

@@ -1,10 +1,13 @@
 ---
 license: other
+ license_name: llama-3
+ license_link: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/raw/main/LICENSE
+
 base_model: meta-llama/Meta-Llama-3-8B-Instruct
 tags:
 - generated_from_trainer
 model-index:
+ - name: lightblue/suzume-llama-3-8B-multilingual
 results: []
 ---
@@ -38,9 +41,9 @@ sample_packing: true
 pad_to_sequence_len: true

 use_wandb: true
+ wandb_project: wandb_project
+ wandb_entity: wandb_entity
+ wandb_name: wandb_name

 gradient_accumulation_steps: 2
 micro_batch_size: 2
@@ -77,23 +80,68 @@ special_tokens:

 </details><br>

+ <p align="center">
+ <img width=400 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/kg3QjQOde0X743csGJT-f.png" alt="Suzume - a Japanese tree sparrow"/>
+ </p>

+ # Suzume
+
+ This is Suzume 8B, a multilingual fine-tune of Llama 3.
+
+ Llama 3 has exhibited excellent performance on many English-language benchmarks.
+ However, it also seems to have been fine-tuned mostly on English data, meaning that it will respond in English even if prompted in other languages.

+ We have fine-tuned Llama 3 on almost 90,000 multilingual conversations, meaning that this model has the smarts of Llama 3 but adds the ability to chat in more languages.

+ Please feel free to comment on this model and give us feedback in the Community tab!

+ # How to use

+ The easiest way to use this model on your own computer is to use the [GGUF version of this model (lightblue/suzume-llama-3-8B-multilingual-gguf)](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-gguf) with a program such as [jan.ai](https://jan.ai/) or [LM Studio](https://lmstudio.ai/).
+
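One way to script against the GGUF weights instead of using a GUI app is llama-cpp-python. The snippet below is an illustrative sketch only, not part of the model card: the file name is a placeholder for whichever quantization you download, and it assumes your llama-cpp-python build picks up the Llama 3 chat template from the GGUF metadata.

```python
# Rough, illustrative sketch: chat with a locally downloaded GGUF file via llama-cpp-python.
# The model_path below is a placeholder for whichever quantization you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./suzume-llama-3-8B-multilingual.Q4_K_M.gguf",  # placeholder file name
    n_ctx=8192,  # adjust to available memory
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Bonjour!"}],
    temperature=0.0,
    max_tokens=100,
)
print(response["choices"][0]["message"]["content"])
```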
+ If you want to use this model directly in Python, we recommend using vLLM for the fastest inference speeds.
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ sampling_params = SamplingParams(temperature=0.0, max_tokens=100)
+ llm = LLM(model="lightblue/suzume-llama-3-8B-multilingual")
+
+ messages = []
+ messages.append({"role": "user", "content": "Bonjour!"})
+ prompt = llm.llm_engine.tokenizer.tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
+ prompts = [prompt]
+
+ outputs = llm.generate(prompts, sampling_params)
+ for output in outputs:
+     prompt = output.prompt
+     generated_text = output.outputs[0].text
+     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+ ```
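If vLLM is not available in your environment, the same conversation can also be run with plain Hugging Face transformers. The following is a minimal, illustrative sketch rather than an official recipe; it assumes a GPU with enough memory for the 8B model in bfloat16, the accelerate package for `device_map="auto"`, and that the tokenizer ships the Llama 3 chat template.

```python
# Minimal transformers sketch (illustrative only, not the card's official example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lightblue/suzume-llama-3-8B-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bfloat16 support
    device_map="auto",           # requires the accelerate package
)

messages = [{"role": "user", "content": "Bonjour!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=100, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```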

+ # Evaluation scores
+
+ We achieve the following MT-Bench scores across 6 languages:
+
+ # Training data
+
+ We train on three sources of data to create this model:
+
+ * [lightblue/tagengo-gpt4](https://huggingface.co/datasets/lightblue/tagengo-gpt4) - 76,338 conversations
+   * A diverse dataset of initial inputs sampled from [lmsys/lmsys-chat-1m](https://huggingface.co/datasets/lmsys/lmsys-chat-1m) and then used to prompt `gpt-4-0125-preview`
+ * [megagonlabs/instruction_ja](https://github.com/megagonlabs/instruction_ja) - 669 conversations
+   * A hand-edited dataset of nearly 700 Japanese conversations taken originally from translations of the [kunishou/hh-rlhf-49k-ja](https://huggingface.co/datasets/kunishou/hh-rlhf-49k-ja) dataset.
+ * [openchat/openchat_sharegpt4_dataset](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/resolve/main/sharegpt_gpt4.json) - 6,206 conversations
+   * Conversations taken from humans talking to GPT-4
+
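For reference, a quick way to peek at the largest of these sources is the Hugging Face `datasets` library. This is an illustrative sketch only; it assumes `lightblue/tagengo-gpt4` loads as a standard dataset with a `train` split (the other two sources are distributed as a GitHub repository and a raw JSON file, so they would need their own loading code).

```python
# Illustrative only: inspect one of the training sources listed above.
# Assumes lightblue/tagengo-gpt4 is loadable as a standard HF dataset with a "train" split.
from datasets import load_dataset

tagengo = load_dataset("lightblue/tagengo-gpt4", split="train")
print(len(tagengo))   # number of conversations
print(tagengo[0])     # one multilingual conversation record
```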
+ # workspace/llm_training/axolotl/llama3-multilingual/output_tagengo_openchat_megagon_8B_llama3
+
+ This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the dataset described above.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6595

 ## Training procedure
147