dataset format for translation

#100
by andrejaystevenson - opened

How should I change the dataset cell code to use this dataset (https://huggingface.co/datasets/tep_en_fa_para) for fine-tuning Mistral-7B?
Dataset cell code:
from datasets import load_dataset
dataset = load_dataset("tep_en_fa_para", split = "train")
EOS_TOKEN = tokenizer.eos_token
def formatting_func(example):
    return example["text"] + EOS_TOKEN
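
One possible adaptation is sketched below. The cell above reads a "text" column, which a parallel corpus probably does not have. This sketch assumes each row of tep_en_fa_para looks like {"translation": {"en": ..., "fa": ...}}; that layout is an assumption, so it is worth checking with print(dataset[0]) first. The prompt wording and the "</s>" placeholder are also illustrative choices, not something the dataset or the notebook prescribes.

```python
# Assumption: each row has the shape {"translation": {"en": "...", "fa": "..."}}.
# Verify with print(dataset[0]) before relying on these keys.

# Placeholder so the sketch runs on its own; in the notebook use tokenizer.eos_token.
EOS_TOKEN = "</s>"

def formatting_func(example, eos_token=EOS_TOKEN):
    """Turn one English/Persian pair into a single training string."""
    pair = example["translation"]
    lines = [
        "Translate English to Persian.",  # hypothetical instruction wording
        "English: " + pair["en"],
        "Persian: " + pair["fa"],
    ]
    return "\n".join(lines) + eos_token

# Hand-written sample row for illustration, not taken from the dataset.
sample = {"translation": {"en": "Hello.", "fa": "سلام."}}
print(formatting_func(sample))
```

If the real column names differ, only the two dictionary lookups inside formatting_func need to change.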
