What is the string format for training.

#1
by ohwi - opened

Hi. Thank you for sharing great work.

I wonder the input format of ELYZA-japanese-Llama-2-7b.

As from the blog post, this model is further trained starting from meta-llama/Llama-2-7b-chat-hf, but the training dataset is consisted of OSCAR or Wikipedia.

How did you make normal text from OSCAR or Wikipedia into instruction format, which is required by Llama-2-7b-chat-hf?

Thank you in advance.

ELYZA.inc org

Sorry for the late reply.
We did not use the instruction format for the ELYZA-japanese-Llama-2-7b pre-training, but used the text as it is in the regular pre-training.
Of course, there is a possibility of performance improvement by using instruction format, but this has not been verified at this time.

Thank you! Good to know that the model still has the ability to process natural language format

Sign up or log in to comment