---
license: other
license_name: yi
license_link: LICENSE
---
This model is an SFT checkpoint trained specifically to serve as a base for DPO on the Yi model.
I was unable to resolve an OOM issue while trying to run DPO training, so I am uploading only the SFT.
If you would like to run DPO on this model, please use the maywell/why_no_one_do_dpo_on_yi dataset.
It follows the ChatML prompt format.
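For reference, ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` tokens; the system and user text below is only illustrative:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi! How can I help?<|im_end|>
```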
The code below was used to load the maywell/why_no_one_do_dpo_on_yi dataset in axolotl.
```python
from axolotl.prompt_tokenizers import ShareGPTPromptTokenizingStrategy


class SimpleShareGPTPromptTokenizingStrategy(ShareGPTPromptTokenizingStrategy):
    _strict = True

    @property
    def strict(self):
        return self._strict

    @strict.setter
    def strict(self, strict):
        self._strict = strict

    def get_conversation_thread(self, prompt):
        # Train on the "chosen" side of each DPO pair
        conversations = prompt["chosen"]
        turns = [
            {"from": "assistant" if t["role"] == "assistant" else t["role"], "value": t["content"]}
            for t in conversations
        ]
        return turns
```
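As a rough standalone illustration (no axolotl dependency; the sample record is hypothetical, mirroring the dataset's DPO schema), the mapping from a record's `chosen` conversation to ShareGPT-style turns works like this:

```python
# Hypothetical sample record in the DPO dataset's "chosen" schema
sample = {
    "chosen": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi! How can I help?"},
    ],
}

# Same turn mapping as in get_conversation_thread above
turns = [
    {"from": "assistant" if t["role"] == "assistant" else t["role"], "value": t["content"]}
    for t in sample["chosen"]
]
print(turns)
# → [{'from': 'user', 'value': 'Hello'}, {'from': 'assistant', 'value': 'Hi! How can I help?'}]
```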