
Why Hybrid-tuning works

#6 by toughhou - opened

Hi, I have a few questions about the tuning:

  1. What is the size of the instruction data? It makes up about 67%, a much larger share relative to the pre-training data than in other LLMs.
  2. Is there any difference between training on the instruction data and training on the pre-training data, or are all examples treated equally (see the sketch after this list)?
  3. Is there any module like the multi-task fine-tuning mentioned in the paper?
  4. Usually the pre-training data is far larger than the instruction data, but here the instruction data is about 67%. How do you ensure the instruction data is high quality when it is generated with Self-Instruct and Self-QA?
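To make question 2 concrete, here is roughly what I imagine "treating all examples equally" would look like: instruction and pre-training examples mixed into one dataset and optimized with the same next-token loss, with no per-source weighting. This is a minimal sketch with a toy model, made-up data, and an assumed 67/33 mixing ratio, not the actual training code for this repo.

```python
# Sketch only: the MixedDataset class, the toy model, and the mixing ratio
# are my assumptions, not the authors' pipeline.
import random
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

VOCAB, SEQ_LEN = 1000, 16

class MixedDataset(Dataset):
    """Concatenates instruction and pre-train examples; every item is treated identically."""
    def __init__(self, instruction_examples, pretrain_examples):
        self.examples = instruction_examples + pretrain_examples
        random.shuffle(self.examples)  # interleave the two sources

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]

# Fake tokenized data: ~67% instruction, ~33% pre-train (the ratio asked about above).
make = lambda n: [torch.randint(0, VOCAB, (SEQ_LEN,)) for _ in range(n)]
loader = DataLoader(MixedDataset(make(670), make(330)), batch_size=8, shuffle=True)

# Toy causal-LM stand-in; in practice this would be the BLOOM checkpoint.
model = nn.Sequential(nn.Embedding(VOCAB, 64), nn.Linear(64, VOCAB))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for batch in loader:
    logits = model(batch[:, :-1])               # predict the next token
    loss = nn.functional.cross_entropy(          # same loss for both data sources
        logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    break  # one step is enough for the sketch
```

If that is indeed how it works, then my question is whether the instruction and pre-training examples are really weighted the same, or whether there is some re-balancing or curriculum between them.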
