arxiv:2304.07987

Chinese Open Instruction Generalist: A Preliminary Release

Published on Apr 17, 2023

Upvote

Authors:

Ge Zhang ,

Ruibo Liu ,

Yizhi Li ,

Jie Fu

Abstract

Instruction tuning is widely recognized as a key technique for building generalist language models, which comes to the attention of researchers and the public with the release of InstructGPT ouyang2022training and ChatGPT [ https://chat.openai.com/ ]. Despite impressive progress in English-oriented large-scale language models (LLMs), it is still under-explored whether English-based foundation LLMs can perform similarly on multilingual tasks compared to English tasks with well-designed instruction tuning and how we can construct the corpora needed for the tuning. To remedy this gap, we propose the project as an attempt to create a Chinese instruction dataset by various methods adapted to the intrinsic characteristics of 4 sub-tasks. We collect around 200k Chinese instruction tuning samples, which have been manually checked to guarantee high quality. We also summarize the existing English and Chinese instruction corpora and brief some potential applications of the newly constructed Chinese instruction corpora.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2304.07987 in a model README.md to link it from this page.

Datasets citing this paper 4

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2304.07987 in a Space README.md to link it from this page.