TW-Pythia-6.9B-Chat

Taiwanese Mandarin Pythia Language Model, instruction-tuned for dialogue.

Version 0.2

Model Details

The TW-Pythia model is derived from the Apache-2.0-licensed Pythia language model, with 8000 new Traditional Chinese tokens added and the embedding layers resized and re-trained.

Basics

  • Developed by: @zetavg based on EleutherAI's Pythia language model.
  • Model type: Transformer-based GPT-NeoX Causal Language Model
  • Languages: English, Traditional Chinese
  • License: Unknown, as the usage license of the training data has not been confirmed
  • Derived from model: EleutherAI/pythia-6.9b

Model Sources

Uses

Currently, this model has not demonstrated practical value for Traditional Chinese processing without further training, although it does have some basic Chinese-English translation capability.

Training Details

Training Data

Training Procedure

First, we build a BPE tokenizer based on the original Pythia tokenizer with 8000 new Traditional Chinese tokens added.
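A minimal sketch of this step, assuming the tokens are added to the stock Pythia tokenizer via the standard transformers API; the actual list of 8000 Traditional Chinese tokens is not shown here, and `new_tokens` below is only a placeholder:

```python
# Sketch: extend the original Pythia tokenizer with new Traditional Chinese tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-6.9b")

new_tokens = ["台灣", "語言", "模型"]  # placeholder; the real list has 8000 entries
num_added = tokenizer.add_tokens(new_tokens)
print(f"Added {num_added} tokens; new vocab size: {len(tokenizer)}")
```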

Then, we resize the embedding layer of the pythia-6.9b model to accommodate the new vocabulary size, and we train only the input/output embedding layers to allow the model to learn the new Traditional Chinese words and phrases.
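The sketch below illustrates this step under the same assumptions: the embedding matrices are resized to the extended vocabulary, every weight is frozen, and only the input/output embedding layers are left trainable. Loading in float16 is an assumption, not a documented setting.

```python
# Sketch: resize embeddings for the new vocabulary and train only the
# input/output embedding layers (all other weights frozen).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-6.9b", torch_dtype=torch.float16
)
model.resize_token_embeddings(len(tokenizer))  # `tokenizer` from the previous step

# Freeze everything, then unfreeze the input and output embeddings.
for param in model.parameters():
    param.requires_grad = False
for param in model.get_input_embeddings().parameters():
    param.requires_grad = True
for param in model.get_output_embeddings().parameters():
    param.requires_grad = True
```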

Finally, we add LoRA weights to the model and fine-tune them for instruction following.
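A sketch of attaching LoRA adapters with the PEFT library; the rank, alpha, dropout, and target modules shown are illustrative assumptions, not the published hyperparameters of this model:

```python
# Sketch: add LoRA adapters and make only them trainable for instruction tuning.
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # assumed rank
    lora_alpha=16,                        # assumed scaling factor
    lora_dropout=0.05,                    # assumed dropout
    target_modules=["query_key_value"],   # GPT-NeoX attention projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```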

Training Hyperparameters

Hardware

  • 1× H100 80 GB GPU on Lambda Cloud (with SkyPilot), about 20 hours in total.
