ptrdvn committed on
Commit
02601cf
1 Parent(s): 2ba18e8

Update README.md

Files changed (1)
  1. README.md +65 -64
README.md CHANGED
@@ -11,6 +11,71 @@ model-index:
 results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
@@ -80,70 +145,6 @@ special_tokens:
 </details><br>
 
- <p align="center">
- <img width=400 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/kg3QjQOde0X743csGJT-f.png" alt="Suzume - a Japanese tree sparrow"/>
- </p>
-
- # Suzume
-
- This is Suzume 8B, a multilingual finetune of Llama 3.
-
- Llama 3 has exhibited excellent performance on many English language benchmarks.
- However, it also appears to have been finetuned mostly on English data, meaning that it will respond in English even when prompted in other languages.
-
- We have fine-tuned Llama 3 on almost 90,000 multilingual conversations, meaning that this model has the smarts of Llama 3 but also the added ability to chat in more languages.
-
- Please feel free to comment on this model and give us feedback in the Community tab!
-
- # How to use
-
- The easiest way to use this model on your own computer is to use the [GGUF version of this model (lightblue/suzume-llama-3-8B-multilingual-gguf)](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-gguf) with a program such as [jan.ai](https://jan.ai/) or [LM Studio](https://lmstudio.ai/).
-
- If you want to use this model directly in Python, we recommend using vLLM for the fastest inference speeds.
-
- ```python
- from vllm import LLM, SamplingParams
-
- sampling_params = SamplingParams(temperature=0.0, max_tokens=100)
- llm = LLM(model="lightblue/suzume-llama-3-8B-multilingual")
-
- messages = []
- messages.append({"role": "user", "content": "Bonjour!"})
- prompt = llm.llm_engine.tokenizer.tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
- prompts = [prompt]
-
- outputs = llm.generate(prompts, sampling_params)
- for output in outputs:
-     prompt = output.prompt
-     generated_text = output.outputs[0].text
-     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
- ```
-
- # Evaluation scores
-
- We achieve the following MT-Bench scores across 6 languages:
-
- # Training data
-
- We train on three sources of data to create this model:
-
- * [lightblue/tagengo-gpt4](https://huggingface.co/datasets/lightblue/tagengo-gpt4) - 76,338 conversations
-   * A diverse dataset of initial inputs sampled from [lmsys/lmsys-chat-1m](https://huggingface.co/datasets/lmsys/lmsys-chat-1m) and then used to prompt `gpt-4-0125-preview`
- * [megagonlabs/instruction_ja](https://github.com/megagonlabs/instruction_ja) - 669 conversations
-   * A hand-edited dataset of nearly 700 Japanese conversations originally taken from translations of the [kunishou/hh-rlhf-49k-ja](https://huggingface.co/datasets/kunishou/hh-rlhf-49k-ja) dataset.
- * [openchat/openchat_sharegpt4_dataset](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/resolve/main/sharegpt_gpt4.json) - 6,206 conversations
-   * Conversations taken from humans talking to GPT-4
-
- # workspace/llm_training/axolotl/llama3-multilingual/output_tagengo_openchat_megagon_8B_llama3
-
- This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the dataset described above.
- It achieves the following results on the evaluation set:
- - Loss: 0.6595
-
- ## Training procedure
 
 ### Training hyperparameters
 
 
 results: []
 ---
+ <p align="center">
+ <img width=400 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/kg3QjQOde0X743csGJT-f.png" alt="Suzume - a Japanese tree sparrow"/>
+ </p>
+
+ # Suzume
+
+ This is Suzume 8B, a multilingual finetune of Llama 3.
+
+ Llama 3 has exhibited excellent performance on many English language benchmarks.
+ However, it also appears to have been finetuned mostly on English data, meaning that it will respond in English even when prompted in other languages.
+
+ We have fine-tuned Llama 3 on almost 90,000 multilingual conversations, meaning that this model has the smarts of Llama 3 but also the added ability to chat in more languages.
+
+ Please feel free to comment on this model and give us feedback in the Community tab!
+
+ # How to use
+
+ The easiest way to use this model on your own computer is to use the [GGUF version of this model (lightblue/suzume-llama-3-8B-multilingual-gguf)](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-gguf) with a program such as [jan.ai](https://jan.ai/) or [LM Studio](https://lmstudio.ai/).
+
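+ If you would rather script against the GGUF files than use a GUI, one option is [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). The snippet below is a rough sketch added for illustration, not part of the original card: the quantization filename pattern, context length, and generation settings are assumptions, so check the GGUF repository for the file names it actually contains.
+
+ ```python
+ from llama_cpp import Llama
+
+ # Download a quantized GGUF file from the Hugging Face Hub and load it.
+ # The "*Q4_K_M.gguf" pattern is a guess; pick a filename that actually
+ # exists in lightblue/suzume-llama-3-8B-multilingual-gguf.
+ llm = Llama.from_pretrained(
+     repo_id="lightblue/suzume-llama-3-8B-multilingual-gguf",
+     filename="*Q4_K_M.gguf",
+     n_ctx=4096,
+ )
+
+ # Recent llama-cpp-python versions pick up the chat template stored in the
+ # GGUF metadata, so a plain chat completion call is enough.
+ response = llm.create_chat_completion(
+     messages=[{"role": "user", "content": "Bonjour!"}],
+     max_tokens=100,
+     temperature=0.0,
+ )
+ print(response["choices"][0]["message"]["content"])
+ ```
+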
+ If you want to use this model directly in Python, we recommend using vLLM for the fastest inference speeds.
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # Greedy decoding, capped at 100 new tokens
+ sampling_params = SamplingParams(temperature=0.0, max_tokens=100)
+ llm = LLM(model="lightblue/suzume-llama-3-8B-multilingual")
+
+ # Build a single-turn chat and render it with the model's chat template
+ messages = []
+ messages.append({"role": "user", "content": "Bonjour!"})
+ prompt = llm.llm_engine.tokenizer.tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
+ prompts = [prompt]
+
+ outputs = llm.generate(prompts, sampling_params)
+ for output in outputs:
+     prompt = output.prompt
+     generated_text = output.outputs[0].text
+     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+ ```
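+
+ vLLM is not required, though: the model also loads through the standard Hugging Face `transformers` API. The snippet below is a minimal sketch added for illustration, not part of the original card; the dtype, device placement, and generation settings are assumptions chosen to mirror the vLLM example above and may need adjusting for your hardware.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "lightblue/suzume-llama-3-8B-multilingual"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ # Format the conversation with the model's chat template.
+ messages = [{"role": "user", "content": "Bonjour!"}]
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ # Greedy decoding, mirroring temperature=0.0 in the vLLM example.
+ output_ids = model.generate(input_ids, max_new_tokens=100, do_sample=False)
+ print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```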
+
+ # Evaluation scores
+
+ We achieve the following MT-Bench scores across 6 languages:
+
+ # Training data
+
+ We train on three sources of data to create this model (see the loading sketch after this list):
+
+ * [lightblue/tagengo-gpt4](https://huggingface.co/datasets/lightblue/tagengo-gpt4) - 76,338 conversations
+   * A diverse dataset of initial inputs sampled from [lmsys/lmsys-chat-1m](https://huggingface.co/datasets/lmsys/lmsys-chat-1m) and then used to prompt `gpt-4-0125-preview`
+ * [megagonlabs/instruction_ja](https://github.com/megagonlabs/instruction_ja) - 669 conversations
+   * A hand-edited dataset of nearly 700 Japanese conversations originally taken from translations of the [kunishou/hh-rlhf-49k-ja](https://huggingface.co/datasets/kunishou/hh-rlhf-49k-ja) dataset.
+ * [openchat/openchat_sharegpt4_dataset](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/resolve/main/sharegpt_gpt4.json) - 6,206 conversations
+   * Multilingual conversations of humans talking to GPT-4.
+
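+ All three data sources above are public, so they can be inspected directly. The snippet below is a small sketch added for reference, not part of the original card; it assumes the `datasets` library is installed and that the dataset exposes a `train` split.
+
+ ```python
+ from datasets import load_dataset
+
+ # Peek at the largest training source, lightblue/tagengo-gpt4.
+ tagengo = load_dataset("lightblue/tagengo-gpt4", split="train")
+ print(tagengo)     # number of rows and column names
+ print(tagengo[0])  # one multilingual conversation record
+ ```
+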
+ # workspace/llm_training/axolotl/llama3-multilingual/output_tagengo_openchat_megagon_8B_llama3
+
+ This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the dataset described above.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6595
+
+ ## Training procedure
+
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
 
 </details><br>
 
 ### Training hyperparameters