oshizo committed
Commit 976b6a2
1 Parent(s): 75121dd

Update README.md

Files changed (1): README.md +55 -0
README.md CHANGED
---
license: mit
language:
- ja
pipeline_tag: sentence-similarity
---

This model was created by merging [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) and [stabilityai/japanese-stablelm-base-gamma-7b](https://huggingface.co/stabilityai/japanese-stablelm-base-gamma-7b).
See the [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) page for model usage.
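
For convenience, a minimal usage sketch is shown below. It assumes this merged model keeps the instruction prompt format and last-token pooling described on the e5-mistral-7b-instruct model card; the repository id, the max length, and the query/passage pair are placeholders for illustration only.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "path-or-repo-id-of-this-merged-model"  # placeholder; point this at the merged model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)  # add torch_dtype / device_map options as needed

def last_token_pool(last_hidden_states, attention_mask):
    # Use the hidden state of each sequence's last non-padding token as its embedding,
    # handling both left- and right-padded batches.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

task = "Given a web search query, retrieve relevant passages that answer the query"
texts = [
    f"Instruct: {task}\nQuery: 美味しいラーメン屋を教えてください",  # query with instruction prefix
    "駅前に行列のできる人気のラーメン店があります",  # passage without prefix
]

# Tokenize, then append the EOS token id to each sequence before padding,
# following the usage example on the e5-mistral-7b-instruct page.
batch = tokenizer(texts, max_length=511, truncation=True, padding=False, return_attention_mask=False)
batch["input_ids"] = [ids + [tokenizer.eos_token_id] for ids in batch["input_ids"]]
batch = tokenizer.pad(batch, padding=True, return_attention_mask=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)
embeddings = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)
print((embeddings[0] @ embeddings[1]).item())  # cosine similarity between query and passage
```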

The steps to merge are as follows.

1. Load intfloat/e5-mistral-7b-instruct as a "MistralForCausalLM" class and save_pretrained it as is.

Because e5-mistral-7b-instruct is published as a "MistralModel" class, it could not be merged with "MistralForCausalLM" models as is.
In my environment, I had to load the model onto the CPU rather than the GPU, or I would get an error.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "intfloat/e5-mistral-7b-instruct"
# Load on the CPU; in my environment device_map="auto" raised an error here.
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Re-save the weights in the MistralForCausalLM format so they can be merged
# with the causal LM checkpoint of japanese-stablelm-base-gamma-7b.
model.save_pretrained("./e5-mistral-7b-instruct_with_lm_head")
```

2. Merge using [mergekit](https://github.com/cg123/mergekit) with the following yaml configuration

merge_config.yaml
```yaml
models:
  - model: stabilityai/japanese-stablelm-base-gamma-7b
  - model: ./e5-mistral-7b-instruct_with_lm_head
base_model: stabilityai/japanese-stablelm-base-gamma-7b
parameters:
  t:
    - filter: self_attn
      value: [0.75, 0.25]
    - filter: mlp
      value: [0.75, 0.25]
    - value: 0.5 # fallback for rest of tensors

merge_method: slerp
dtype: float16
```
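
Depending on your mergekit version, the merge itself can then be run with a command along the lines of `mergekit-yaml merge_config.yaml ./merged_model`; the exact entry point and available flags may differ between releases.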

I tried the "linear", "slerp", and "task_arithmetic" merge methods, and this setting seemed to work best.
The "t" parameters were chosen so that layers closer to the input rely more on japanese-stablelm-base-gamma-7b, taking advantage of its Japanese word understanding, while layers closer to the output rely more on e5-mistral-7b-instruct, to produce good embeddings.
As for the "ties" method, I could not find any density and weight parameters that worked properly.

3. Copy the settings related to pad_token from the following files in the e5-mistral-7b-instruct repository (see the sketch after this list).

* config.json
* tokenizer.json
* tokenizer.model
* tokenizer_config.json
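
This copying step can be scripted; below is a minimal sketch using `huggingface_hub`. The output directory name is an assumption (whatever folder mergekit wrote the merged model to), and it assumes that for config.json only the pad_token-related entry needs to be carried over rather than replacing the merged model's whole config.

```python
# Minimal sketch of step 3 (assumed paths): copy the tokenizer files from
# intfloat/e5-mistral-7b-instruct into the mergekit output directory and carry
# over the pad_token setting into the merged model's config.json.
import json
import shutil
from huggingface_hub import hf_hub_download

source_repo = "intfloat/e5-mistral-7b-instruct"
merged_dir = "./merged_model"  # hypothetical mergekit output directory

# The tokenizer files can be copied wholesale.
for filename in ["tokenizer.json", "tokenizer.model", "tokenizer_config.json"]:
    cached_path = hf_hub_download(source_repo, filename)
    shutil.copy(cached_path, f"{merged_dir}/{filename}")

# For config.json, copy only the pad_token-related setting (assumed to be
# pad_token_id) instead of overwriting the merged model's config.
source_config_path = hf_hub_download(source_repo, "config.json")
with open(source_config_path) as f:
    pad_token_id = json.load(f).get("pad_token_id")
config_path = f"{merged_dir}/config.json"
with open(config_path) as f:
    merged_config = json.load(f)
merged_config["pad_token_id"] = pad_token_id
with open(config_path, "w") as f:
    json.dump(merged_config, f, indent=2)
```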