Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
Abstract
Steel-LLM is a Chinese-centric language model developed from scratch with the goal of creating a high-quality, open-source model despite limited computational resources. Launched in March 2024, the project aimed to train a 1-billion-parameter model on a large-scale dataset, prioritizing transparency and the sharing of practical insights to assist others in the community. Training focused primarily on Chinese data, with a small proportion of English data included, and the project addresses a gap in existing open-source LLM releases by providing a more detailed and practical account of the model-building journey. Steel-LLM has demonstrated competitive performance on benchmarks such as CEVAL and CMMLU, outperforming early models from larger institutions. This paper summarizes the project's key contributions, including data collection, model design, training methodologies, and the challenges encountered along the way, offering a valuable resource for researchers and practitioners looking to develop their own LLMs. The model checkpoints and training scripts are available at https://github.com/zhanshijinwat/Steel-LLM.
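Since the checkpoints are released publicly, a natural first step is loading them with standard tooling. Below is a minimal sketch using the Hugging Face transformers library; the repository id "zhanshijin/Steel-LLM" is an assumption for illustration, so check the GitHub page above for the actual published checkpoint location.

# Minimal sketch: loading a released Steel-LLM checkpoint with transformers.
# The model id below is hypothetical -- verify it against the project's
# GitHub page (https://github.com/zhanshijinwat/Steel-LLM) before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zhanshijin/Steel-LLM"  # assumed repo id, not confirmed by the paper

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Prompt in Chinese, matching the model's primary training language.
prompt = "你好，请介绍一下你自己。"  # "Hello, please introduce yourself."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))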
Community
Introducing Steel-LLM: A Fully Open-Source, Resource-Efficient Chinese-Centric Language Model
Discover Steel-LLM, a groundbreaking 1-billion-parameter language model developed with limited computational resources (just 8 GPUs) and a commitment to full transparency. Launched in March 2024, Steel-LLM is designed to bridge the gap in open-source LLMs by focusing on Chinese language data while incorporating a small portion of English data.
Whether you're a small research team or an individual practitioner, Steel-LLM provides practical guidance and detailed insights into model development, making it an invaluable resource for the LLM community.
Join us in advancing open-source AI. Explore Steel-LLM today:
https://github.com/zhanshijinwat/Steel-LLM
The following similar papers were recommended by the Librarian Bot via the Semantic Scholar API:
- Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs (2024)
- Experience of Training a 1.7B-Parameter LLaMa Model From Scratch (2024)
- SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation (2024)
- LinguaLIFT: An Effective Two-stage Instruction Tuning Framework for Low-Resource Language Tasks (2024)
- Facilitating large language model Russian adaptation with Learned Embedding Propagation (2024)
- ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation (2025)
- MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish (2024)