---
license: gpl-3.0
language:
  - en
datasets:
  - HuggingFaceTB/cosmopedia-100k
  - pleisto/wikipedia-cn-20230720-filtered
pipeline_tag: text-generation
tags:
  - text-generation-inference
---

# NanoLM-365M-base


## Introduction

NanoLM-365M-base is based on Qwen2-0.5B, with the original tokenizer replaced by BilingualTokenizer-8K to reduce the parameter count. The total number of parameters drops from 0.5B to 365M.
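A back-of-the-envelope check makes the 0.5B → 365M reduction plausible. The sketch below assumes Qwen2-0.5B's published configuration (hidden size 896, vocabulary 151,936, ~494M total parameters with tied input/output embeddings) and an 8,192-entry vocabulary for BilingualTokenizer-8K; none of these figures are stated in this card.

```python
# Rough parameter accounting for the tokenizer swap (assumed figures, see above).
qwen2_total = 494_000_000   # Qwen2-0.5B total params (~494M, embeddings tied)
hidden_size = 896           # Qwen2-0.5B hidden dimension
old_vocab   = 151_936       # original Qwen2 tokenizer vocabulary
new_vocab   = 8_192         # assumed BilingualTokenizer-8K vocabulary

# Shrinking the (tied) embedding matrix removes (old - new) * hidden weights;
# the transformer backbone itself is unchanged.
saved = (old_vocab - new_vocab) * hidden_size
new_total = qwen2_total - saved
print(f"{new_total / 1e6:.0f} M")  # roughly 365 M
```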

## Details

To recover some performance and facilitate fine-tuning for downstream tasks, I chose to freeze the backbone parameters and only train the embedding part after replacing the tokenizer. Training was conducted for 40,000 steps on wikipedia-zh and cosmopedia-100k.
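The freezing scheme can be sketched in plain PyTorch. The stand-in module below is not the Qwen2 architecture; it only illustrates the pattern of disabling gradients everywhere except the token-embedding layer.

```python
import torch.nn as nn

# Minimal stand-in: an embedding plus a "backbone". The real model is
# Qwen2-0.5B; this sketch only demonstrates the freezing pattern.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=8192, hidden=64):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden)
        self.backbone = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.Linear(hidden, hidden),
        )

model = TinyLM()

# Freeze everything, then re-enable gradients for the embedding only.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("embed_tokens")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # only the embedding weights remain trainable
```

An optimizer built over `filter(lambda p: p.requires_grad, model.parameters())` will then update only the embedding, which is why fewer than 10M of the 365M parameters are trainable.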

|                             | Value                         |
| --------------------------- | ----------------------------- |
| Total Params                | 365 M                         |
| Trainable Params            | < 10 M                        |
| Trainable Parts             | `model.embed_tokens`          |
| Training Steps              | 40,000                        |
| Training Dataset            | wikipedia-zh, cosmopedia-100k |
| Optimizer                   | adamw_torch                   |
| Learning Rate               | 2e-4                          |
| LR Scheduler                | cosine                        |
| Weight Decay                | 0.1                           |
| Warm-up Ratio               | 0.03                          |
| Batch Size                  | 16                            |
| Gradient Accumulation Steps | 1                             |
| Seq Len                     | 4096                          |
| Dtype                       | bf16                          |
| Peak GPU Memory             | < 48 GB                       |
| Device                      | NVIDIA A100-SXM4-80GB         |
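The hyperparameters above map onto Hugging Face `transformers.TrainingArguments` keywords roughly as follows. This is a sketch: the exact training script for this run is not published, so the mapping is an assumption, though the keyword names themselves are standard `TrainingArguments` parameters.

```python
# Hypothetical TrainingArguments-style settings mirroring the table above.
# The actual script used for this run is not published; only the values
# in the table are taken from the card.
training_kwargs = dict(
    max_steps=40_000,
    optim="adamw_torch",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    weight_decay=0.1,
    warmup_ratio=0.03,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    bf16=True,
)
```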

The specific training records are as follows:

*(figure: training result)*