
stockmark/gpt-neox-japanese-1.4b

This repository provides a GPT-NeoX-based model with 1.4B parameters, pre-trained on a Japanese corpus of about 20B tokens. The model was developed by Stockmark Inc.

How to use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use torch.bfloat16 on GPUs that support it (e.g. A100) and torch.float16 on older GPUs
torch_dtype = torch.bfloat16 if torch.cuda.is_available() and hasattr(torch.cuda, "is_bf16_supported") and torch.cuda.is_bf16_supported() else torch.float16

model = AutoModelForCausalLM.from_pretrained("stockmark/gpt-neox-japanese-1.4b", device_map="auto", torch_dtype=torch_dtype)
tokenizer = AutoTokenizer.from_pretrained("stockmark/gpt-neox-japanese-1.4b")

inputs = tokenizer("自然言語処理は", return_tensors="pt").to(model.device)
with torch.no_grad():
    tokens = model.generate(
        **inputs,
        max_new_tokens=128,
        repetition_penalty=1.1
    )
    
output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
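
The snippet above loads the weights in half precision. As a rough illustration of why that matters, the following sketch estimates weight memory alone for a model of about 1.4B parameters. The parameter count is rounded, the helper name `weight_memory_gb` is hypothetical, and activations, the KV cache, and framework overhead are ignored, so real usage will be higher.

```python
# Back-of-envelope estimate of weight memory only (not total GPU usage).
N_PARAMS = 1_400_000_000  # rounded figure from the model description

# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(N_PARAMS, dtype):.1f} GB")
```

Under these assumptions, either half-precision dtype needs about half the weight memory of float32.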

Example:

Training dataset

  • Japanese Web Corpus (ja): 8.6B tokens (This dataset will not be released.)
  • Wikipedia (ja): 0.88B tokens
  • CC100 (ja): 10.5B tokens
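
As a quick check that these figures match the "about 20B tokens" stated in the description, the sketch below sums the three listed token counts and reports each source's share. The dictionary keys are just labels chosen for this illustration.

```python
# Token counts in billions, copied from the list above.
corpus_tokens_b = {
    "Japanese Web Corpus (ja)": 8.6,
    "Wikipedia (ja)": 0.88,
    "CC100 (ja)": 10.5,
}

total_b = sum(corpus_tokens_b.values())  # about 19.98, i.e. roughly 20B
print(f"Total: {total_b:.2f}B tokens")

# Share of each source in the full training corpus.
shares = {name: count / total_b for name, count in corpus_tokens_b.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```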

Training setting

  • Trained using HuggingFace Trainer and DeepSpeed (ZeRO-2)
  • 8 A100 GPUs (40GB) at ABCI
  • Mixed Precision (BF16)
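
The exact training configuration is not published in this card. Purely as an illustration of how the settings above fit together, a minimal DeepSpeed configuration with ZeRO stage 2 and BF16 might look like the sketch below. Every value other than the ZeRO stage and the BF16 flag is an assumption; the `"auto"` entries rely on the HuggingFace Trainer integration filling them in from its own arguments.

```python
import json

# Hypothetical minimal DeepSpeed config: ZeRO-2 with BF16 mixed precision.
# Only "stage": 2 and bf16 "enabled" reflect the settings listed above;
# everything else is an assumed placeholder.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    # "auto" lets the HuggingFace Trainer supply these from TrainingArguments.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "train_batch_size": "auto",
}

ds_config_json = json.dumps(ds_config, indent=2)
print(ds_config_json)
```

A dictionary like this (or a JSON file holding the same content) can be passed to `TrainingArguments` through its `deepspeed` argument, with `bf16=True` set there as well so the two stay consistent.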

License

The MIT license

Developed by

Stockmark Inc.

Author

Takahiro Omi

Model size

1.44B params (Safetensors; tensor types: F32, BF16, BOOL)