
This model is an SFT version of the Yi model, trained specifically as a base for DPO.

I was unable to resolve an OOM issue while trying to train with DPO, so I am uploading only the SFT version.

If you would like to run DPO on this model, please use the maywell/why_no_one_do_dpo_on_yi dataset.

The prompt format follows ChatML.
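For reference, a minimal sketch of building a ChatML-formatted prompt string; the roles and message contents are illustrative, and the helper name `to_chatml` is hypothetical:

```python
# Build a ChatML prompt string from a list of {"role", "content"} messages.
# The sample conversation is illustrative, not from the training data.
def to_chatml(messages):
    parts = [
        "<|im_start|>{}\n{}<|im_end|>".format(m["role"], m["content"])
        for m in messages
    ]
    # Leave an open assistant turn for the model to complete.
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```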

The code below was used to load the maywell/why_no_one_do_dpo_on_yi dataset in axolotl.

```python
# Assumes axolotl is installed; ShareGPTPromptTokenizingStrategy is imported
# from axolotl.prompt_tokenizers in contemporary versions of axolotl.
from axolotl.prompt_tokenizers import ShareGPTPromptTokenizingStrategy


class SimpleShareGPTPromptTokenizingStrategy(ShareGPTPromptTokenizingStrategy):

    _strict = True

    @property
    def strict(self):
        return self._strict

    @strict.setter
    def strict(self, strict):
        self._strict = strict

    def get_conversation_thread(self, prompt):
        # Each record's "chosen" field is a list of {"role", "content"} messages;
        # convert them to ShareGPT-style {"from", "value"} turns.
        conversations = prompt["chosen"]
        turns = [
            {"from": "assistant" if t["role"] == "assistant" else t["role"], "value": t["content"]}
            for t in conversations
        ]
        return turns
```
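As a standalone illustration of what `get_conversation_thread` does (runnable without axolotl), the mapping can be reproduced as below; the sample `chosen` record and the helper name `to_turns` are hypothetical:

```python
# Standalone replica of the mapping in get_conversation_thread;
# the sample "chosen" record is hypothetical.
def to_turns(prompt):
    conversations = prompt["chosen"]
    return [
        {"from": "assistant" if t["role"] == "assistant" else t["role"], "value": t["content"]}
        for t in conversations
    ]

sample = {"chosen": [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]}
turns = to_turns(sample)
print(turns)
```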

ํ•ด๋‹น ๋ชจ๋ธ์€ Yi ๋ชจ๋ธ์„ DPOํ•˜๊ธฐ ์œ„ํ•ด ํ›ˆ๋ จ์‹œ์ผฐ๋˜ SFT ๋ฒ„์ „์ž…๋‹ˆ๋‹ค.

DPO ํ›ˆ๋ จ์„ ํ•˜๋ ค๋˜ ์ค‘ OOM ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์ง€ ๋ชปํ•˜์—ฌ SFT๋งŒ ์—…๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค.

ํ•ด๋‹น ๋ชจ๋ธ์— DPO๋ฅผ ํ•˜์‹œ๋ ค๋ฉด maywell/why_no_one_do_dpo_on_yi ๋ฐ์ดํ„ฐ์…‹์„ ์ด์šฉํ•ด์ฃผ์„ธ์š”.

ํ”„๋กฌํ”„ํŠธ ํฌ๋งท์€ ChatML์„ ๋”ฐ๋ฆ…๋‹ˆ๋‹ค.
