---
license: other
license_name: yi
license_link: LICENSE
---

This model is an SFT version of the Yi model, trained specifically so that DPO could be run on it.

I was unable to resolve an OOM issue while trying to run the DPO training, so I am only uploading the SFT model.

If you would like to run DPO on this model, please use the maywell/why_no_one_do_dpo_on_yi dataset.

The prompt format follows ChatML.
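For reference, a ChatML-style prompt looks like the following (with placeholder system and user messages; adjust to your own template handling):

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```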

The code below was used to load the maywell/why_no_one_do_dpo_on_yi dataset in axolotl.

```python
from axolotl.prompt_tokenizers import ShareGPTPromptTokenizingStrategy


class SimpleShareGPTPromptTokenizingStrategy(ShareGPTPromptTokenizingStrategy):
    """Tokenizing strategy that reads conversations from the dataset's `chosen` field."""

    _strict = True

    @property
    def strict(self):
        return self._strict

    @strict.setter
    def strict(self, strict):
        self._strict = strict

    def get_conversation_thread(self, prompt):
        # Use the `chosen` side of the DPO pairs as the SFT conversation.
        conversations = prompt["chosen"]
        # Convert OpenAI-style {"role", "content"} messages into the
        # ShareGPT-style {"from", "value"} turns expected by the prompter.
        turns = [
            {"from": t["role"], "value": t["content"]}
            for t in conversations
        ]
        return turns
```
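As a rough illustration of what `get_conversation_thread` produces, here is the mapping applied to a made-up record (not an actual row from the dataset) shaped like the dataset's `chosen` field:

```python
# Hypothetical record in the shape of the dataset's `chosen` field.
record = {
    "chosen": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is DPO?"},
        {"role": "assistant", "content": "Direct Preference Optimization, a preference-tuning method."},
    ]
}

# Same mapping as in get_conversation_thread above.
turns = [{"from": t["role"], "value": t["content"]} for t in record["chosen"]]
# -> [{'from': 'system', ...}, {'from': 'user', ...}, {'from': 'assistant', ...}]
```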

ํ•ด๋‹น ๋ชจ๋ธ์€ Yi ๋ชจ๋ธ์„ DPOํ•˜๊ธฐ ์œ„ํ•ด ํ›ˆ๋ จ์‹œ์ผฐ๋˜ SFT ๋ฒ„์ „์ž…๋‹ˆ๋‹ค.

DPO ํ›ˆ๋ จ์„ ํ•˜๋ ค๋˜ ์ค‘ OOM ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์ง€ ๋ชปํ•˜์—ฌ SFT๋งŒ ์—…๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค.

ํ•ด๋‹น ๋ชจ๋ธ์— DPO๋ฅผ ํ•˜์‹œ๋ ค๋ฉด maywell/why_no_one_do_dpo_on_yi ๋ฐ์ดํ„ฐ์…‹์„ ์ด์šฉํ•ด์ฃผ์„ธ์š”.

ํ”„๋กฌํ”„ํŠธ ํฌ๋งท์€ ChatML์„ ๋”ฐ๋ฆ…๋‹ˆ๋‹ค.