Model Card for multiscreen_psi16_768

This is an unofficial, experimental Multiscreen model pre-trained on the TinyStories dataset. It was trained using TRL.

Quick start

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "kurogane/multiscreen_154M_tinystorys_vocab768_instruct"
# Local download cache; replace with a directory that exists on your machine.
cache_dir = r"/media/kurogane/backup/cache"

# trust_remote_code=True allows the custom model code in the repository to be loaded.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    cache_dir=cache_dir,
)
model.to("cuda:0")  # change to "cpu" if no GPU is available

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    padding_side="left",
    cache_dir=cache_dir,
)

messages = [
    {"role": "user", "content": "Write a short story about a helpful robot."}
]

# Apply the chat template and move the resulting tensors to the model's device.
model_inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
).to(model.device)

generated_ids = model.generate(**model_inputs)

# Decode the first (and only) sequence in the batch.
s_output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(s_output)

Example output

Once upon a time, there was a robot named John who had a pro
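
The sample output above ends mid-word. One possible cause is that `generate` stopped at its length limit, since the quick start does not pass one. The following is a minimal sketch, assuming that is the cause, which sets `max_new_tokens` explicitly. The helper names `build_generation_kwargs` and `generate_story` and the default values (200 tokens, temperature 0.8) are illustrative assumptions, not settings published by the model author.

```python
def build_generation_kwargs(max_new_tokens=200, do_sample=True, temperature=0.8):
    """Return keyword arguments for model.generate (default values are illustrative)."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": do_sample}
    if do_sample:
        # temperature only has an effect when sampling is enabled
        kwargs["temperature"] = temperature
    return kwargs


def generate_story(model, tokenizer, prompt, **overrides):
    """Generate a reply to `prompt` using the model and tokenizer from the quick start."""
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
        add_generation_prompt=True,
    ).to(model.device)
    output_ids = model.generate(**inputs, **build_generation_kwargs(**overrides))
    # Drop the prompt tokens so that only the newly generated text is decoded
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the `model` and `tokenizer` objects from the quick start, `generate_story(model, tokenizer, "Write a short story about a helpful robot.", max_new_tokens=300)` would request a longer continuation.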

Training procedure

This model was trained with SFT.

Framework versions

  • TRL: 0.24.0
  • Transformers: 5.8.0
  • Pytorch: 2.11.0+cu129
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2
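
Because the model is loaded with `trust_remote_code=True`, its custom code may be sensitive to library versions. The sketch below compares the locally installed packages with the versions listed above. The function names and the choice to ignore local build tags such as `+cu129` are assumptions made for illustration; the PyPI distribution name `torch` is used for PyTorch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by PyPI distribution name.
CARD_VERSIONS = {
    "trl": "0.24.0",
    "transformers": "5.8.0",
    "torch": "2.11.0+cu129",
    "datasets": "4.3.0",
    "tokenizers": "0.22.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each mismatching package to a (card version, installed version) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Compare only the public version, ignoring local build tags like "+cu129"
        if found is None or found.split("+")[0] != wanted.split("+")[0]:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(CARD_VERSIONS).items():
        print(f"{package}: card lists {wanted}, installed {found or 'not installed'}")
```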

Architecture

This model is an experimental tiny language model trained on TinyStories using a Multiscreen-style architecture inspired by the paper "Screening Is Enough" by Ken M. Nakanishi. The implementation was developed as an experimental Hugging Face Transformers port, with reference to the unofficial PyTorch implementation dieOD/multiscreen-pytorch. It is not an official implementation released by the author of the Multiscreen paper.

Datasets used

The training data is based on the TinyStories dataset by Ronen Eldan and Yuanzhi Li.

For the instruction-tuning stage, I used the tatsu-lab/alpaca dataset as training data.
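
The card does not describe how the Alpaca records were formatted or which hyperparameters were used, so the following is only a sketch of one way to run SFT with TRL on this dataset. It assumes each record's `instruction`, `input`, and `output` fields are mapped to a single user/assistant exchange. The helper names `alpaca_to_messages` and `build_trainer` and the output directory are hypothetical, and `SFTConfig` is left at its defaults rather than the author's settings.

```python
def alpaca_to_messages(example):
    """Convert one Alpaca record to the conversational {"messages": [...]} format."""
    prompt = example["instruction"]
    if example.get("input"):
        # Alpaca keeps optional context in a separate "input" field
        prompt = f"{prompt}\n\n{example['input']}"
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": example["output"]},
        ]
    }


def build_trainer(model, tokenizer, output_dir="multiscreen-sft"):
    """Build an SFTTrainer over tatsu-lab/alpaca (hyperparameters left at defaults)."""
    # Imported lazily so the conversion helper works without these packages
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("tatsu-lab/alpaca", split="train")
    # Replace the original columns with a single "messages" column
    dataset = dataset.map(alpaca_to_messages, remove_columns=dataset.column_names)
    return SFTTrainer(
        model=model,
        args=SFTConfig(output_dir=output_dir),
        train_dataset=dataset,
        processing_class=tokenizer,
    )
```

Given a loaded `model` and `tokenizer`, `build_trainer(model, tokenizer).train()` would start a training run under these assumptions.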

Model size: 0.1B params (Safetensors, tensor type F32)