---
license: other
license_name: other
license_link: LICENSE
---

# Model Mixed by the Reborn Merge Method

Keep in mind that answer accuracy may vary with this merge depending on what you ask.

Could this merge serve as a base for my future merge work?

I hope this merged model balances knowledge and grammar well enough that it does not just give strange, nonsensical answers. Then I can cook up something new and interesting with the next merge...

P.S.: The remarks above are not meant to suggest that any of the source models is strange; they mean that I could be doing the merge wrong. I hope there is no misunderstanding.

I am open to collaboration and other inquiries if you are interested.

## Reborn Merge Information

```
[models info]
reference_model_name = "MLP-KTLim/llama-3-Korean-Bllossom-8B"
base_model_name = "NousResearch/Meta-Llama-3-8B-Instruct"
target_model_name = "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1"

[interpolating mismatch part vocab]
Interpolating tensor 'model.embed_tokens.weight' to match the shape: torch.Size([145088, 4096]) vs torch.Size([128256, 4096])
Interpolating tensor 'lm_head.weight' to match the shape: torch.Size([145088, 4096]) vs torch.Size([128256, 4096])
Interpolating tensor 'model.embed_tokens.weight' to match the shape: torch.Size([128256, 4096]) vs torch.Size([128257, 4096])
Interpolating tensor 'lm_head.weight' to match the shape: torch.Size([128256, 4096]) vs torch.Size([128257, 4096])
```
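The merge script itself is not included here, but as a rough illustration, a vocabulary-size mismatch like the one logged above could be reconciled by linearly interpolating the embedding matrix along its vocab axis. This is only a minimal PyTorch sketch of that idea; `interpolate_vocab` is a hypothetical helper, not the actual Reborn implementation:

```python
import torch
import torch.nn.functional as F

def interpolate_vocab(weight: torch.Tensor, target_rows: int) -> torch.Tensor:
    """Resize an embedding-like tensor of shape [vocab, hidden] to
    [target_rows, hidden] by linear interpolation along the vocab axis."""
    # F.interpolate expects [batch, channels, length]; treat the hidden
    # dimension as channels and the vocabulary dimension as length.
    t = weight.T.unsqueeze(0)           # [1, hidden, vocab]
    t = F.interpolate(t, size=target_rows, mode="linear", align_corners=False)
    return t.squeeze(0).T.contiguous()  # [target_rows, hidden]

# Small-scale demo; the log above does the same for
# [145088, 4096] -> [128256, 4096] and [128256, 4096] -> [128257, 4096].
emb = torch.randn(1000, 64)
print(interpolate_vocab(emb, 880).shape)  # torch.Size([880, 64])
```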

## Ollama Create

```
jaylee@lees-MacBook-Pro-2 % ./ollama create Joah -f ./gguf/Joah-Llama-3-MAAL-MLP-KoEn-8B-Reborn/Modelfile_Q5_K_M
transferring model data
creating model layer
creating template layer
creating system layer
creating parameters layer
creating config layer
using already created layer sha256:4eadb53f0c70683aeab133c60d76b8ffc9f41ca5d49524d4b803c19e5ce7e3a5
using already created layer sha256:8ab4849b038cf0abc5b1c9b8ee1443dca6b93a045c2272180d985126eb40bf6f
writing layer sha256:ae2974c64ea5d6f488eeb1b10717a270f48fb3452432589db6f5e60472ae96ac
writing layer sha256:74ef6315972b317734fe01e7e1ad5b49fce1fa8ed3978cb66501ecb8c3a2e984
writing layer sha256:83882a5e957b8ce0d454f26bcedb2819413b49d6b967b28d60edb8ac61edfa58
writing manifest
success
```
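Once the create step succeeds, the model can be started locally with `./ollama run Joah`, or queried over Ollama's REST API, as sketched after the Modelfile below.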

## MODELFILE

```
FROM joah-llama-3-maal-mlp-koen-8b-reborn-Q5_K_M.gguf
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

# System prompt (Korean): "As a friendly chatbot, answer the other person's
# requests in as much detail and as kindly as possible. Answer everything in Korean."
SYSTEM """
μΉœμ ˆν•œ μ±—λ΄‡μœΌλ‘œμ„œ μƒλŒ€λ°©μ˜ μš”μ²­μ— μ΅œλŒ€ν•œ μžμ„Έν•˜κ³  μΉœμ ˆν•˜κ²Œ λ‹΅ν•˜μž. λͺ¨λ“  λŒ€λ‹΅μ€ ν•œκ΅­μ–΄(Korean)으둜 λŒ€λ‹΅ν•΄μ€˜.
"""

PARAMETER num_keep 24
PARAMETER temperature 0.7
PARAMETER num_predict 3000
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
```
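For completeness, here is a minimal sketch of querying the created model through Ollama's local REST API (assuming `ollama serve` is running on the default port 11434; the prompt is just an example):

```python
import requests

# Minimal sketch: send a single non-streaming generation request to the
# locally running Ollama server and print the model's answer.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "Joah", "prompt": "λŒ€ν•œλ―Όκ΅­μ˜ μˆ˜λ„λŠ” μ–΄λ””μΈκ°€μš”?", "stream": False},
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```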

## Citation

### Language Model

```bibtex
@misc{bllossom,
  author = {ChangSu Choi and Yongbin Jeong and Seoyoon Park and InHo Won and HyeonSeok Lim and SangMin Kim and Yejee Kang and Chanhyuk Yoon and Jaewan Park and Yiseul Lee and HyeJin Lee and Younggyun Hahm and Hansaem Kim and KyungTae Lim},
  title = {Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean},
  year = {2024},
  journal = {LREC-COLING 2024},
  url = {https://arxiv.org/pdf/2403.10882}
}
```

```bibtex
@article{llama3modelcard,
  title = {Llama 3 Model Card},
  author = {AI@Meta},
  year = {2024},
  url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```