license: mit
language:
- ja
pipeline_tag: sentence-similarity
This model was created by merging intfloat/e5-mistral-7b-instruct and stabilityai/japanese-stablelm-base-gamma-7b.
For model usage, see the intfloat/e5-mistral-7b-instruct model page or the evaluation notebook in oshizo/JapaneseEmbeddingEval.
The steps to merge are as follows.
- Load intfloat/e5-mistral-7b-instruct with the "MistralForCausalLM" class and write it back out with save_pretrained, unchanged.
Because e5-mistral-7b-instruct was saved with the "MistralModel" class, it could not be merged with a "MistralForCausalLM" model as is.
In my environment, the model had to be loaded on the CPU rather than the GPU; otherwise I got an error.
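from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "intfloat/e5-mistral-7b-instruct"
# Load on the CPU (the default); adding device_map="auto" raised an error in my environment.
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Re-save the weights under the causal-LM class so they can be merged with the other model.
model.save_pretrained("./e5-mistral-7b-instruct_with_lm_head")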
- Merge using mergekit with the following yaml configuration
merge_config.yaml
models:
  - model: stabilityai/japanese-stablelm-base-gamma-7b
  - model: ./e5-mistral-7b-instruct_with_lm_head
base_model: stabilityai/japanese-stablelm-base-gamma-7b
parameters:
  t:
    - value: [0.5, 0.9]
merge_method: slerp
dtype: float16
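As a rough sketch of how this configuration might be run: mergekit ships a `mergekit-yaml` command that takes a config file and an output directory. The wrapper below only builds and launches that command. The helper names and the output directory `./merged_model` are hypothetical and do not come from the original steps.

```python
import subprocess


def build_merge_command(config_path, output_dir):
    """Return the argument list for mergekit's `mergekit-yaml` command."""
    return ["mergekit-yaml", str(config_path), str(output_dir)]


def run_merge(config_path, output_dir):
    """Run the merge; raises CalledProcessError if mergekit exits with an error."""
    subprocess.run(build_merge_command(config_path, output_dir), check=True)


# "./merged_model" is a placeholder output directory (an assumption).
command = build_merge_command("merge_config.yaml", "./merged_model")
print(" ".join(command))
```

Calling `run_merge("merge_config.yaml", "./merged_model")` would start the actual merge, which requires mergekit to be installed and enough memory for two 7B models.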
I tried the "linear", "slerp", and "task_arithmetic" merge methods, and the slerp configuration above seemed to give the best results.
The "t" values were chosen so that layers closer to the input keep more of japanese-stablelm-base-gamma-7b, to take advantage of its understanding of Japanese words,
and layers closer to the output take more of e5-mistral-7b-instruct, to produce good embeddings.
I also tried the "ties" method, but could not find density and weight values that worked properly.
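To make the effect of the "t" range concrete, here is a minimal sketch in plain Python. It assumes the two-element list [0.5, 0.9] is spread linearly over the 32 decoder layers of a Mistral-7B model, and that t = 0 returns the base model's weights while t = 1 returns the other model's. The function names are illustrative, and the slerp here is a generic textbook version on small vectors, not mergekit's own implementation.

```python
import math


def layer_t_values(endpoints, num_layers):
    """Linearly interpolate the two endpoint values across the layer stack."""
    start, end = endpoints
    if num_layers == 1:
        return [start]
    step = (end - start) / (num_layers - 1)
    return [start + step * i for i in range(num_layers)]


def slerp(t, v0, v1, eps=1e-8):
    """Spherical interpolation between v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 * norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    theta = math.acos(max(-1.0, min(1.0, dot)))
    if math.sin(theta) < eps:
        # Nearly parallel vectors: linear interpolation is numerically safer.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / math.sin(theta)
    w1 = math.sin(t * theta) / math.sin(theta)
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# One t per layer: 0.5 at the input side, rising to 0.9 at the output side.
ts = layer_t_values([0.5, 0.9], num_layers=32)
print(round(ts[0], 3), round(ts[-1], 3))
```

Under these assumptions the first layer is an even blend of the two models, and the last layer is weighted heavily toward e5-mistral-7b-instruct.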
- Copy settings related to pad_token from the e5-mistral-7b-instruct repository.
- config.json
- tokenizer.json
- tokenizer.model
- tokenizer_config.json
- special_tokens_map.json
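One possible reading of this last step is that the five files listed above are copied whole from a local copy of the e5-mistral-7b-instruct repository into the merged model directory. The sketch below follows that reading. The function name and both directory arguments are hypothetical, and if only individual pad_token entries were meant to be copied, the files would need to be edited by hand instead.

```python
import shutil
from pathlib import Path

# The five files named in the list above.
FILES_TO_COPY = [
    "config.json",
    "tokenizer.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "special_tokens_map.json",
]


def copy_pad_token_files(source_dir, merged_dir, filenames=FILES_TO_COPY):
    """Copy each listed file from source_dir into merged_dir, overwriting.

    Returns the names that were copied; files missing from source_dir are skipped.
    """
    source_dir, merged_dir = Path(source_dir), Path(merged_dir)
    copied = []
    for name in filenames:
        src = source_dir / name
        if src.is_file():
            shutil.copy2(src, merged_dir / name)
            copied.append(name)
    return copied
```

A call such as `copy_pad_token_files("./e5-mistral-7b-instruct", "./merged_model")` would overwrite the merged model's copies, so it is worth keeping a backup of the merged `config.json` first.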