keitokei1994 committed
Commit: 7455bfa
1 Parent(s): 36c66a0

Update README.md

Files changed (1)
  1. README.md +70 -3
README.md CHANGED
@@ -1,3 +1,70 @@
- ---
- license: llama3
- ---
+ ---
+ license: llama3
+ language:
+ - ja
+ - en
+ ---
+
+ ### Model Description (English explanation is below.)
+
+ This model is a Mixture of Experts (MoE) language model created with the MergeKit tool.
+
+ The gguf version (more to be added later) is available [here](https://huggingface.co/keitokei1994/Llama-3-youko-chatvector-2x8B_v0.1-gguf).
+
+ By combining rinna/llama-3-youko-8b, which was continually pre-trained on Japanese datasets, with the original meta-llama/Meta-Llama-3-8B-Instruct, this MoE model aims to improve Japanese language ability while preserving the capabilities of the original Meta-Llama-3-8B-Instruct.
+
+ Following the approach of [Sdff-Ltba/LightChatAssistant-2x7B](https://huggingface.co/Sdff-Ltba/LightChatAssistant-2x7B), the llama-3-youko-8b used in this MoE had the following ChatVector addition applied:
+
+ > rinna/llama-3-youko-8b + 0.8*(meta-llama/Meta-Llama-3-8B-Instruct - meta-llama/Meta-Llama-3-8B)
+
+ A llama-3-youko-8b model with this ChatVector addition already applied had been uploaded as [aixsatoshi/Llama-3-youko-8b-instruct-chatvector](https://huggingface.co/aixsatoshi/Llama-3-youko-8b-instruct-chatvector), so that model is used here.
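+
+ For reference, the snippet below is a minimal sketch of how such a ChatVector addition could be reproduced with transformers. It is not the author's published script: the 0.8 ratio comes from the formula above, and it assumes enough CPU memory to hold three 8B checkpoints at once, access to the gated meta-llama repositories, and a hypothetical output path.
+
+ ```python
+ # Hedged sketch: add 0.8 * (Instruct - Base) onto llama-3-youko-8b, per the formula above.
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16)
+ inst = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16)
+ youko = AutoModelForCausalLM.from_pretrained("rinna/llama-3-youko-8b", torch_dtype=torch.bfloat16)
+
+ sd_base, sd_inst = base.state_dict(), inst.state_dict()
+ with torch.no_grad():
+     for name, param in youko.named_parameters():
+         # Assumes the fine-tune keeps the original Llama 3 vocabulary (as llama-3-youko-8b does),
+         # so all tensor shapes line up; a changed vocabulary would require skipping embeddings/lm_head.
+         param.add_(0.8 * (sd_inst[name] - sd_base[name]))
+
+ youko.save_pretrained("./llama-3-youko-8b-chatvector")  # hypothetical output path
+ ```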
+
+ ### Model Details
+
+ - **Model Name**: Llama-3-youko-chatvector-2x8B_v0.1
+ - **Model Architecture**: Mixture of Experts (MoE)
+ - **Base Models**: meta-llama/Meta-Llama-3-8B-Instruct, rinna/llama-3-youko-8b
+ - **Merging Tool**: MergeKit (a configuration sketch follows below)
+
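+ Since the exact merge recipe is not published in this card, the mergekit-moe configuration below is only a hypothetical illustration of how a 2x8B MoE of this kind can be assembled; every value (base model, gate mode, expert order, dtype) is an assumption, not the author's actual config.
+
+ ```python
+ # Hypothetical sketch: write a mergekit-moe config and note the CLI call that would build the MoE.
+ import yaml
+
+ moe_config = {
+     "base_model": "aixsatoshi/Llama-3-youko-8b-instruct-chatvector",  # assumed base
+     "gate_mode": "random",  # "hidden" or "cheap_embed" would additionally need positive_prompts
+     "dtype": "bfloat16",
+     "experts": [
+         {"source_model": "meta-llama/Meta-Llama-3-8B-Instruct"},
+         {"source_model": "aixsatoshi/Llama-3-youko-8b-instruct-chatvector"},
+     ],
+ }
+
+ with open("moe_config.yml", "w") as f:
+     yaml.safe_dump(moe_config, f, sort_keys=False)
+
+ # With mergekit installed, a command along these lines builds the merged model:
+ #   mergekit-moe moe_config.yml ./Llama-3-youko-chatvector-2x8B_v0.1
+ ```
+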
+ #### Required Specifications
+ With the Q4_K_M quantized model, the model can be fully loaded on an RTX 3060 12GB (a loading example follows the hardware list below).
+
+ The author built the model on WSL2 and Google Colaboratory Pro, and then verified it with Llama.cpp and LM Studio on the following setup:
+
+ - CPU: Ryzen 5 3600
+ - GPU: GeForce RTX 3060 12GB
+ - RAM: DDR4-3200 96GB
+ - OS: Windows 10
+
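+ As a loading example, the sketch below uses llama-cpp-python to offload every layer of a Q4_K_M gguf build to the GPU, which is the "full load on 12 GB" setup described above. The gguf filename is an assumption; check the gguf repository linked above for the actual file names.
+
+ ```python
+ # Hedged sketch: run the Q4_K_M gguf fully on the GPU with llama-cpp-python.
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="Llama-3-youko-chatvector-2x8B_v0.1-Q4_K_M.gguf",  # assumed filename
+     n_gpu_layers=-1,  # offload all layers to the GPU
+     n_ctx=4096,
+ )
+
+ out = llm.create_chat_completion(
+     messages=[{"role": "user", "content": "日本語で自己紹介してください。"}],
+     max_tokens=256,
+ )
+ print(out["choices"][0]["message"]["content"])
+ ```
+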
+ ---
+
+ ### Model Description
+
+ This model is a Mixture of Experts (MoE) language model created using the MergeKit tool.
+
+ The gguf version can be found [here](https://huggingface.co/keitokei1994/Llama-3-youko-chatvector-2x8B_v0.1-gguf).
+
+ By combining rinna/llama-3-youko-8b, which has been further pre-trained on Japanese datasets, with the original meta-llama/Meta-Llama-3-8B-Instruct, this MoE model aims to improve Japanese language capabilities while maintaining the abilities of the original Meta-Llama-3-8B-Instruct.
+
+ Referring to [Sdff-Ltba/LightChatAssistant-2x7B](https://huggingface.co/Sdff-Ltba/LightChatAssistant-2x7B), the llama-3-youko-8b used in the MoE has undergone the following ChatVector addition:
+
+ > rinna/llama-3-youko-8b + 0.8*(meta-llama/Meta-Llama-3-8B-Instruct - meta-llama/Meta-Llama-3-8B)
+
+ A llama-3-youko-8b model with this ChatVector addition already applied has been uploaded at [aixsatoshi/Llama-3-youko-8b-instruct-chatvector](https://huggingface.co/aixsatoshi/Llama-3-youko-8b-instruct-chatvector), and that model is used here.
+
+ ### Model Details
+
+ - **Model Name**: Llama-3-youko-chatvector-2x8B_v0.1
+ - **Model Architecture**: Mixture of Experts (MoE)
+ - **Base Models**: meta-llama/Meta-Llama-3-8B-Instruct, rinna/llama-3-youko-8b
+ - **Merging Tool**: MergeKit
+
+ #### Required Specifications
+ The Q4_K_M quantized model can be fully loaded on an RTX 3060 12GB.
+
+ The author built the model on WSL2 and Google Colaboratory Pro, and verified it with Llama.cpp and LM Studio on the setup below; a minimal loading sketch follows the list.
+
+ - CPU: Ryzen 5 3600
+ - GPU: GeForce RTX 3060 12GB
+ - RAM: DDR4-3200 96GB
+ - OS: Windows 10
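+
+ As a loading example for the unquantized weights, the sketch below uses transformers with the Llama 3 chat template. The repository id is inferred from the model name and the gguf link above and may differ from the actual hosted name; full-precision 2x8B weights need far more memory than the Q4_K_M build (roughly 25 to 30 GB in bf16), so this path targets larger GPUs or CPU offloading rather than the RTX 3060 setup above.
+
+ ```python
+ # Hedged sketch: load the merged MoE with transformers and generate via the chat template.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ repo = "keitokei1994/Llama-3-youko-chatvector-2x8B_v0.1"  # assumed repository id
+ tok = AutoTokenizer.from_pretrained(repo)
+ model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")
+
+ messages = [{"role": "user", "content": "日本語で自己紹介してください。"}]
+ input_ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
+ output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
+ print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```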