IPT-125m (WIP)

IPT-125m is a decoder-style transformer pretrained from scratch on 4.36 billion tokens of Italian text from the OSCAR-2301 dataset.

If you like this project, consider supporting me with a cup of coffee! 🤖✨🌞

How to Use

This model is best used with the Hugging Face transformers library for training and finetuning.

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers run the custom model code shipped in the Hub repository.
model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")
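
Once the model and tokenizer are loaded, text can be generated with the standard `generate` method. The sketch below is only an illustration: the helper names `load_ipt` and `generate_text`, the greedy decoding setting, and the default of 50 new tokens are assumptions, not settings recommended by the model author.

```python
def load_ipt(repo_id="efederici/ipt-125m"):
    """Load the model and tokenizer from the Hub (downloads weights on first call)."""
    # Imported here so that generate_text can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return model, tokenizer


def generate_text(model, tokenizer, prompt, max_new_tokens=50):
    """Continue `prompt` with greedy decoding and return the decoded string."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,  # assumed value, tune as needed
        do_sample=False,                # greedy decoding, chosen for reproducibility
    )
    # generate returns one sequence per input; decode the first (and only) one.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `model, tokenizer = load_ipt()` followed by `generate_text(model, tokenizer, "Il cielo è")` would return the prompt plus the model's continuation. Since this is a small base model trained without instruction tuning, expect plain text continuation rather than answers to questions.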

Model Description

The architecture is a modification of a standard decoder-only transformer.

Hyperparameter	Value
n_parameters	125M
n_layers	12
n_heads	12
d_model	768
vocab size	50432
sequence length	2048
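
As a rough sanity check, the parameter count can be estimated from the hyperparameters in the table. The sketch below assumes a conventional block layout that the card does not confirm: a 4x feed-forward expansion, input and output embeddings that share weights, and no bias, normalization, or positional-embedding parameters. The function name `estimate_parameters` is hypothetical.

```python
def estimate_parameters(n_layers, d_model, vocab_size, mlp_ratio=4):
    """Approximate weight count of a decoder-only transformer.

    Assumptions (not confirmed by the model card): tied input/output
    embeddings, an MLP hidden size of mlp_ratio * d_model, and no bias,
    layer-norm, or positional-embedding parameters.
    """
    embedding = vocab_size * d_model              # token embedding matrix
    attention = 4 * d_model * d_model             # Q, K, V and output projections
    mlp = 2 * mlp_ratio * d_model * d_model       # up and down projections
    return embedding + n_layers * (attention + mlp)


total = estimate_parameters(n_layers=12, d_model=768, vocab_size=50432)
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the estimate comes to about 124M, which is in line with the 125M listed in the table.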
