Text Generation
Transformers
PyTorch
English
llama
conversational
Inference Endpoints
text-generation-inference

Structure of Dataset for fine-tuning

#9
by sid0608 - opened

Hi,

I am a beginner in LLMs. I want to fine-tune the Dolphin-2_2-yi-34b model for generating long and complex stories. I am struggling to understand the structure and type of dataset I should use. I want users to input details of the story they want to generate, the genre of the stories, and the language they want their stories in. Collecting the dataset is not a problem for me. I am mainly struggling with the format of the dataset to be fed into the model for training.

Please note that I am a beginner, so a detailed and thorough explanation will be appreciated.

Cognitive Computations org

Hi,

I am a beginner in LLMs. I want to fine-tune the Dolphin-2_2-yi-34b model for generating long and complex stories. I am struggling to understand the structure and type of dataset I should use. I want users to input details of the story they want to generate, the genre of the stories, and the language they want their stories in. Collecting the dataset is not a problem for me. I am mainly struggling with the format of the dataset to be fed into the model for training.

Please note that I am a beginner, so a detailed and thorough explanation will be appreciated.

@sid0608

Note this is the old Yi model. Yi 1.5 Dolphin 2.9.1 is much better. Sadly I am not a expert at datasets so you should repost this commment there!

Kearm changed discussion status to closed

Sign up or log in to comment