---
license: mit
language:
- ja
pipeline_tag: sentence-similarity
---
This model was created by merging [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) and [stabilityai/japanese-stablelm-base-gamma-7b](https://huggingface.co/stabilityai/japanese-stablelm-base-gamma-7b).
See the [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) page for model usage.
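For reference, here is a minimal usage sketch following the pattern documented on the e5-mistral-7b-instruct page (last-token pooling with an instruction-prefixed query). The Japanese example texts, the 512-token limit, and the model id placeholder are illustrative and not taken from that card.

```
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def last_token_pool(last_hidden_states, attention_mask):
    # Embedding = hidden state of the final non-padding token of each sequence.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

model_id = "..."  # replace with this model's repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, torch_dtype=torch.float16)

task = "Given a web search query, retrieve relevant passages that answer the query"
texts = [
    f"Instruct: {task}\nQuery: 日本で一番高い山は?",  # query with instruction prefix
    "富士山は日本で最も高い山です。",                  # passage, no prefix
]

# Tokenize, append the EOS token, then pad (same recipe as the e5-mistral card).
batch = tokenizer(texts, max_length=511, truncation=True, padding=False, return_attention_mask=False)
batch["input_ids"] = [ids + [tokenizer.eos_token_id] for ids in batch["input_ids"]]
batch = tokenizer.pad(batch, padding=True, return_attention_mask=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)
embeddings = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings @ embeddings.T)  # cosine similarity matrix
```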
The steps to merge are as follows.
1. Load intfloat/e5-mistral-7b-instruct as a "MistralForCausalLM" and save it with save_pretrained.
Because e5-mistral-7b-instruct was published as a "MistralModel", it could not be merged as-is with a "MistralForCausalLM" checkpoint.
In my environment, I had to load the model on the CPU rather than the GPU, or I would get an error.
```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "intfloat/e5-mistral-7b-instruct"

# Load on the CPU (no device_map="auto"); loading onto the GPU raised an error
# in my environment.
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the weights with an lm_head so the checkpoint can be merged as MistralForCausalLM.
model.save_pretrained("./e5-mistral-7b-instruct_with_lm_head")
```
2. Merge using [mergekit](https://github.com/cg123/mergekit) with the following YAML configuration.
merge_config.yaml
```
models:
  - model: stabilityai/japanese-stablelm-base-gamma-7b
  - model: ./e5-mistral-7b-instruct_with_lm_head
base_model: stabilityai/japanese-stablelm-base-gamma-7b
parameters:
  t:
    - filter: self_attn
      value: [0.75, 0.25]
    - filter: mlp
      value: [0.75, 0.25]
    - value: 0.5 # fallback for rest of tensors
merge_method: slerp
dtype: float16
```
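If you are reproducing the merge, the config can be applied with mergekit's `mergekit-yaml` command-line entry point; the output directory name below is just a placeholder.

```
mergekit-yaml merge_config.yaml ./merged-model
```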
I tried the "linear", "slerp", and "task_arithmetic" merge methods, and this configuration seemed to work best.
The "t" values were chosen so that layers closer to the input rely more on japanese-stablelm-base-gamma-7b, to take advantage of its understanding of Japanese,
while layers closer to the output rely more on e5-mistral-7b-instruct, to produce good embeddings.
As for the "ties" method, I could not find density and weight parameters that worked properly.
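As I understand mergekit's gradient syntax, a two-element list such as [0.75, 0.25] is expanded into a linear ramp of interpolation factors across the layers; the sketch below (assuming the 32 hidden layers of a Mistral-7B model) just illustrates that expansion.

```
# Illustration only: expand a [0.75, 0.25] gradient into per-layer t values,
# assuming mergekit interpolates linearly over the 32 layers of Mistral-7B.
n_layers = 32
t = [0.75 + (0.25 - 0.75) * i / (n_layers - 1) for i in range(n_layers)]
print(round(t[0], 3), round(t[15], 3), round(t[-1], 3))  # 0.75 near the input, ~0.5 mid, 0.25 near the output
```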
3. Copy the settings related to pad_token from the e5-mistral-7b-instruct repository (see the sketch after this list for one way to do it).
* config.json
* tokenizer.json
* tokenizer.model
* tokenizer_config.json
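Below is a sketch of one way to carry out this step, not necessarily the author's exact procedure: fetch the tokenizer files from the source repository as-is, and carry over only the pad_token-related field into the merged config rather than overwriting the whole config.json. The "./merged-model" path matches the placeholder used above.

```
import json
import shutil
from huggingface_hub import hf_hub_download

src_repo = "intfloat/e5-mistral-7b-instruct"
out_dir = "./merged-model"  # mergekit output directory (placeholder)

# The tokenizer files can be taken from the source repository as-is.
for name in ["tokenizer.json", "tokenizer.model", "tokenizer_config.json"]:
    shutil.copy(hf_hub_download(src_repo, name), f"{out_dir}/{name}")

# For config.json, copy only the pad_token-related field so the rest of the
# merged model's config stays exactly as mergekit wrote it.
with open(hf_hub_download(src_repo, "config.json")) as f:
    src_cfg = json.load(f)
with open(f"{out_dir}/config.json") as f:
    merged_cfg = json.load(f)
if "pad_token_id" in src_cfg:
    merged_cfg["pad_token_id"] = src_cfg["pad_token_id"]
with open(f"{out_dir}/config.json", "w") as f:
    json.dump(merged_cfg, f, indent=2)
```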