base_model:
  - maywell/kiqu-70b
library_name: transformers
tags:
  - mergekit
  - merge
license: cc-by-sa-4.0
language:
  - ko

Megakiqu-120b

Megakiqu-120b is a model expanded with the passthrough method, similar to MegaDolphin-120B and Venus-120B.

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the passthrough merge method.

Models Merged

The following model was included in the merge: maywell/kiqu-70b

Original Model Card

kiqu-70b (Arena Leaderboard)

kiqu-70b is an SFT+DPO-trained model based on Miqu-70B-Alpaca-DPO, using Korean datasets.

Because the base model, miqu-1-70b, is a leaked early version of Mistral-Medium, using this model for commercial purposes is at your own risk.

Apart from that, the model itself is released under cc-by-sa-4.0.

Model Details

Base Model
miqu-1-70b (Early Mistral-Medium)

Instruction format

The model follows the Mistral instruction format. Providing few-shot examples is highly recommended.

[INST] {instruction}
[/INST] {output}

Multi-shot

[INST] {instruction}
[/INST] {output}
[INST] {instruction}
[/INST] {output}
[INST] {instruction}
[/INST] {output}
.
.
.

Recommended Template - 1-shot with system prompt

너는 kiqu-70B라는 한국어에 특화된 언어모델이야. 깔끔하고 자연스럽게 대답해줘!
[INST] 안녕?
[/INST] 안녕하세요! 무엇을 도와드릴까요? 질문이나 궁금한 점이 있다면 언제든지 말씀해주세요.
[INST] {instruction}
[/INST]

In English, the system prompt in this template reads "You are kiqu-70B, a language model specialized in Korean. Answer cleanly and naturally!" The example turn is "Hello?" answered with "Hello! How can I help you? If you have any questions or anything you are curious about, feel free to ask anytime."

A trailing space after [/INST] can significantly affect the model's performance, so at inference time it is recommended not to include a trailing space in the chat template.
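
The templates above can be assembled programmatically. The following is a minimal sketch, not an official helper: the function name `build_prompt` and its parameters `shots` and `system_prompt` are hypothetical, and the layout (one line per `[INST]` and `[/INST]` entry, system prompt on the first line) is simply copied from the templates in this card. It ends the prompt with a bare `[/INST]`, with no trailing space.

```python
from typing import Iterable, Tuple


def build_prompt(
    instruction: str,
    shots: Iterable[Tuple[str, str]] = (),
    system_prompt: str = "",
) -> str:
    """Assemble a few-shot prompt in the layout shown in this card.

    `shots` is a sequence of (instruction, output) example pairs.
    All names here are illustrative and not part of any library.
    """
    lines = []
    if system_prompt:
        lines.append(system_prompt)
    for shot_instruction, shot_output in shots:
        lines.append(f"[INST] {shot_instruction}")
        lines.append(f"[/INST] {shot_output}")
    lines.append(f"[INST] {instruction}")
    # Deliberately no space after the final [/INST].
    lines.append("[/INST]")
    return "\n".join(lines)


# 1-shot prompt using the recommended system prompt from this card.
prompt = build_prompt(
    "{instruction}",
    shots=[(
        "안녕?",
        "안녕하세요! 무엇을 도와드릴까요? 질문이나 궁금한 점이 있다면 언제든지 말씀해주세요.",
    )],
    system_prompt="너는 kiqu-70B라는 한국어에 특화된 언어모델이야. 깔끔하고 자연스럽게 대답해줘!",
)
print(prompt)
```

If a tokenizer chat template is used instead of a hand-built string, it is worth checking its rendered output for the same trailing-space issue.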

Configuration

The following mergekit YAML configuration was used to produce this model:

dtype: bfloat16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [10, 30]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [20, 40]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [30, 50]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [40, 60]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [50, 70]
    model: maywell/kiqu-70b
- sources:
  - layer_range: [60, 80]
    model: maywell/kiqu-70b
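
To see what this configuration does to the layer stack, the slices can be expanded by hand. The sketch below is only an illustration: it assumes that each `layer_range` is half-open (`[start, end)`) and that the source model has 80 layers, which is inferred from the upper bound of the last slice rather than stated in this card. The names `SLICES`, `SOURCE_LAYERS`, and `stacked_layers` are hypothetical.

```python
from collections import Counter

# layer_range values copied from the configuration above.
SLICES = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]
# Assumed layer count of the source model (upper bound of the last slice).
SOURCE_LAYERS = 80


def stacked_layers(slices):
    """Return source-layer indices in the order they appear in the merged model."""
    return [layer for start, end in slices for layer in range(start, end)]


layers = stacked_layers(SLICES)
usage = Counter(layers)

total = len(layers)
duplicated = sorted(layer for layer, count in usage.items() if count == 2)

print(total)                           # 140
print(total / SOURCE_LAYERS)           # 1.75
print(duplicated[0], duplicated[-1])   # 10 69
```

Under these assumptions the merged model stacks 140 layers, 1.75 times the source depth: source layers 10 through 69 each appear twice, while layers 0 through 9 and 70 through 79 appear once. This is roughly in line with the step from a 70B-class to a 120B-class model.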