|
---
license: other
license_name: other
license_link: LICENSE
---
|
|
|
Model merged with the [Reborn Merge Method](https://medium.com/@puffanddmx82/reborn-elevating-model-adaptation-with-merging-for-superior-nlp-performance-f604e8e307b2)
|
|
|
Keep in mind that answer accuracy for your particular questions may vary with this merge.
|
|
|
I wonder whether this merge can serve as a base for my future merge work.
|
|
|
I hope this merged model combines knowledge and grammar well enough that it doesn't just give strange, nonsensical answers. Then I can cook up something new and cool with the next merge...
|
|
|
PS: The above is not meant to suggest that any of the source models is strange; it only means I may be doing the merge wrong. I hope there is no misunderstanding.
|
|
|
I am open to collaboration and anything else if you are interested.
|
|
|
``` |
|
Reborn Merge Information |
|
|
|
[models info] |
|
reference_model_name = "MLP-KTLim/llama-3-Korean-Bllossom-8B" |
|
base_model_name = "NousResearch/Meta-Llama-3-8B-Instruct" |
|
target_model_name = "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1" |
|
|
|
[interpolating mismatch part vocab] |
|
Interpolating tensor 'model.embed_tokens.weight' to match the shape: torch.Size([145088, 4096]) vs torch.Size([128256, 4096]) |
|
Interpolating tensor 'lm_head.weight' to match the shape: torch.Size([145088, 4096]) vs torch.Size([128256, 4096]) |
|
Interpolating tensor 'model.embed_tokens.weight' to match the shape: torch.Size([128256, 4096]) vs torch.Size([128257, 4096]) |
|
Interpolating tensor 'lm_head.weight' to match the shape: torch.Size([128256, 4096]) vs torch.Size([128257, 4096]) |
|
``` |
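
The log above shows that mismatched vocabulary dimensions (for example 145088 vs. 128256 rows) are interpolated to a common shape before merging. Below is a minimal sketch of how such a resize could be done in PyTorch; the function name `resize_vocab_tensor` and the choice of linear interpolation along the vocab axis are my own assumptions for illustration, not the exact Reborn implementation.

```python
# Hypothetical sketch: bring a [vocab, hidden] tensor to a target vocab size so
# that tensors from models with different vocabularies can be merged.
# Linear interpolation along the vocab axis is an assumption, not necessarily
# what Reborn does internally.
import torch
import torch.nn.functional as F

def resize_vocab_tensor(tensor: torch.Tensor, target_rows: int) -> torch.Tensor:
    """Interpolate a [vocab, hidden] tensor to [target_rows, hidden]."""
    if tensor.shape[0] == target_rows:
        return tensor
    # F.interpolate expects [batch, channels, length], so treat hidden dims as channels.
    resized = F.interpolate(
        tensor.t().unsqueeze(0),        # [1, hidden, vocab]
        size=target_rows,
        mode="linear",
        align_corners=False,
    )
    return resized.squeeze(0).t().contiguous()  # [target_rows, hidden]

# Small example; the real shapes in the log are 145088x4096 -> 128256x4096.
emb = torch.randn(1024, 64)
print(resize_vocab_tensor(emb, 896).shape)  # torch.Size([896, 64])
```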
|
|
|
## Ollama Create
|
``` |
|
jaylee@lees-MacBook-Pro-2 % ./ollama create Joah -f ./gguf/Joah-Llama-3-MAAL-MLP-KoEn-8B-Reborn/Modelfile_Q5_K_M |
|
transferring model data |
|
creating model layer |
|
creating template layer |
|
creating system layer |
|
creating parameters layer |
|
creating config layer |
|
using already created layer sha256:4eadb53f0c70683aeab133c60d76b8ffc9f41ca5d49524d4b803c19e5ce7e3a5 |
|
using already created layer sha256:8ab4849b038cf0abc5b1c9b8ee1443dca6b93a045c2272180d985126eb40bf6f |
|
writing layer sha256:ae2974c64ea5d6f488eeb1b10717a270f48fb3452432589db6f5e60472ae96ac |
|
writing layer sha256:74ef6315972b317734fe01e7e1ad5b49fce1fa8ed3978cb66501ecb8c3a2e984 |
|
writing layer sha256:83882a5e957b8ce0d454f26bcedb2819413b49d6b967b28d60edb8ac61edfa58 |
|
writing manifest |
|
success |
|
``` |
|
|
|
## Modelfile
|
``` |
|
FROM joah-llama-3-maal-mlp-koen-8b-reborn-Q5_K_M.gguf |
|
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|> |
|
|
|
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|> |
|
|
|
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|> |
|
|
|
{{ .Response }}<|eot_id|>""" |
|
|
|
|
|
# English gloss of the system prompt below: "As a friendly chatbot, answer the
# user's requests as kindly and in as much detail as possible. Give every answer
# in Korean."
SYSTEM """
친절한 챗봇으로서 상대방의 요청에 최대한 자세하고 친절하게 답하자. 모든 대답은 한국어(Korean)으로 대답해줘.
"""
|
|
|
PARAMETER num_keep 24 |
|
PARAMETER temperature 0.7 |
|
PARAMETER num_predict 3000 |
|
PARAMETER stop "<|start_header_id|>" |
|
PARAMETER stop "<|end_header_id|>" |
|
PARAMETER stop "<|eot_id|>" |
|
``` |
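
Once the model has been created with the Modelfile above, it can be queried like any other local Ollama model. The snippet below is a minimal sketch using the official `ollama` Python client (`pip install ollama`); the model name `Joah` matches the `ollama create` command above, and the prompt is just a placeholder.

```python
# Minimal chat example against the locally created "Joah" model.
# Assumes the Ollama server is running and the model was created as shown above.
import ollama

response = ollama.chat(
    model="Joah",
    messages=[
        {"role": "user", "content": "Please introduce yourself briefly."},
    ],
)
print(response["message"]["content"])  # the SYSTEM prompt steers replies into Korean
```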
|
|
|
## Citation |
|
**Language Model** |
|
```text |
|
@misc{bllossom,
  author = {ChangSu Choi and Yongbin Jeong and Seoyoon Park and InHo Won and HyeonSeok Lim and SangMin Kim and Yejee Kang and Chanhyuk Yoon and Jaewan Park and Yiseul Lee and HyeJin Lee and Younggyun Hahm and Hansaem Kim and KyungTae Lim},
  title = {Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean},
  year = {2024},
  journal = {LREC-COLING 2024},
  paperLink = {\url{https://arxiv.org/pdf/2403.10882}},
}

@article{llama3modelcard,
  title = {Llama 3 Model Card},
  author = {AI@Meta},
  year = {2024},
  url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
|
``` |