---
license: mit
language:
- ja
pipeline_tag: sentence-similarity
---

This model was created by merging [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) and [stabilityai/japanese-stablelm-base-gamma-7b](https://huggingface.co/stabilityai/japanese-stablelm-base-gamma-7b).

See the [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) page or the [evaluation notebook of oshizo/JapaneseEmbeddingEval](https://github.com/oshizo/JapaneseEmbeddingEval/blob/main/21_oshizo_japanese-e5-mistral-7b_slerp.ipynb) for model usage; a minimal usage sketch is also included at the end of this card.

The steps to merge are as follows.

1. Load intfloat/e5-mistral-7b-instruct with the "MistralForCausalLM" class and save it with save_pretrained.

Because e5-mistral-7b-instruct is published with the "MistralModel" class, it could not be merged as "MistralForCausalLM" as-is. In my environment, I had to load the model onto the CPU rather than the GPU, or I would get an error.

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "intfloat/e5-mistral-7b-instruct"

# Load on the CPU; passing device_map="auto" caused an error in my environment.
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained("./e5-mistral-7b-instruct_with_lm_head")
```

2. Merge with [mergekit](https://github.com/cg123/mergekit) using the following configuration.

merge_config.yaml
```
models:
  - model: stabilityai/japanese-stablelm-base-gamma-7b
  - model: ./e5-mistral-7b-instruct_with_lm_head
base_model: stabilityai/japanese-stablelm-base-gamma-7b
parameters:
  t:
    - value: [0.5, 0.9]
merge_method: slerp
dtype: float16
```

I tried the "linear", "slerp", and "task_arithmetic" merge methods, and this setting seemed to work best. The "t" values are chosen so that layers closer to the input rely more on japanese-stablelm-base-gamma-7b, to take advantage of its Japanese language understanding, while layers closer to the output rely more on e5-mistral-7b-instruct, to produce good embeddings. As for the "ties" method, I could not find density and weight parameters that worked properly.

3. Copy the pad_token-related settings from the e5-mistral-7b-instruct repository:

* config.json
* tokenizer.json
* tokenizer.model
* tokenizer_config.json
* special_tokens_map.json
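
For step 3, one possible approach is to download the listed files from the original repository with huggingface_hub and place them in the merge output directory. This is only a sketch: the output directory name is an assumption, and it copies the files wholesale, so compare them against the merged model's own files before overwriting anything you want to keep.

```
import shutil
from huggingface_hub import hf_hub_download

# Hypothetical mergekit output directory; adjust to your own path.
merged_dir = "./merged_model"

# Files whose pad_token-related settings should match e5-mistral-7b-instruct.
files = [
    "config.json",
    "tokenizer.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "special_tokens_map.json",
]

for filename in files:
    # Download each file from the original repository and copy it into the merge output.
    path = hf_hub_download(repo_id="intfloat/e5-mistral-7b-instruct", filename=filename)
    shutil.copy(path, f"{merged_dir}/{filename}")
```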
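
Finally, here is a minimal sketch of how the merged model might be used to compute embeddings, following the last-token pooling and query-instruction conventions of e5-mistral-7b-instruct. The repository id, task instruction, and example texts are placeholders for illustration; see the linked model page and evaluation notebook for the canonical usage.

```
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Assumed repository id; replace with the actual id of this model.
model_id = "oshizo/japanese-e5-mistral-7b_slerp"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package.
model = AutoModel.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
model.eval()


def last_token_pool(last_hidden_states, attention_mask):
    # e5-mistral-7b-instruct pools the hidden state of the last non-padding token.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


# Queries use the e5-mistral instruction format; passages are encoded as-is.
task = "Given a question, retrieve passages that answer the question"
texts = [
    f"Instruct: {task}\nQuery: 日本で一番高い山は?",
    "富士山は日本で最も高い山で、標高は3,776メートルです。",
]

# Tokenize, append the EOS token to each input, then pad, as in the e5-mistral usage example.
batch = tokenizer(texts, max_length=511, truncation=True, padding=False, return_attention_mask=False)
batch["input_ids"] = [ids + [tokenizer.eos_token_id] for ids in batch["input_ids"]]
batch = tokenizer.pad(batch, padding=True, return_attention_mask=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**batch)

embeddings = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings[0] @ embeddings[1])  # cosine similarity between query and passage
```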