metadata
datasets:
  - oscar-corpus/OSCAR-2301
language:
  - it
tags:
  - ipt-125m

IPT-125m (WIP)

IPT-125m is a decoder-style transformer pretrained from scratch on 4.36 billion tokens of Italian text from the OSCAR-2301 dataset.

If you like this project, consider supporting me with a cup of coffee! 🤖✨🌞

How to Use

This model is best used with the Hugging Face transformers library for training and finetuning.

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True allows transformers to run custom model code from the model repository.
model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")
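
Once the model and tokenizer are loaded as shown above, text generation can be sketched roughly as follows. This is a minimal sketch, not an official recipe from the model card: the helper name `generate_text`, the sampling settings (`do_sample`, `top_p`) and the Italian prompt in the usage comment are illustrative assumptions, and it assumes PyTorch is installed so that `return_tensors="pt"` works.

```python
def generate_text(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a continuation of `prompt` (hypothetical helper, not part of the model repo)."""
    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")
    # Sample a continuation; the sampling settings are illustrative, not tuned values.
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
    )
    # Decode the first (and only) sequence in the batch back into text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage, with `model` and `tokenizer` loaded as in the snippet above:
# print(generate_text(model, tokenizer, "Il cielo è", max_new_tokens=30))
```

Since the model is a pretrained base model, the output is a free continuation of the prompt rather than an answer to an instruction.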

Model Description

The architecture is a modification of a standard decoder-only transformer.

| Hyperparameter  | Value |
|-----------------|-------|
| n_parameters    | 125M  |
| n_layers        | 12    |
| n_heads         | 12    |
| d_model         | 768   |
| vocab size      | 50432 |
| sequence length | 2048  |
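
As a sanity check, the hyperparameters above can be related to the stated 125M parameter count with some back-of-the-envelope arithmetic. The sketch below rests on assumptions that the model card does not state: a 4x MLP expansion, input and output embeddings that share weights, no bias terms, no learned positional embeddings, and normalization parameters ignored. Under those assumptions the estimate lands close to 125M.

```python
# Hyperparameters copied from the table above.
N_LAYERS = 12
N_HEADS = 12
D_MODEL = 768
VOCAB_SIZE = 50432

# Assumption (not stated in the model card): the MLP hidden size is 4 * d_model.
MLP_EXPANSION = 4


def estimate_parameters() -> dict:
    """Rough parameter count, assuming tied embeddings, no biases, and ignoring norms."""
    # Token embedding matrix (assumed to be shared with the output projection).
    embedding = VOCAB_SIZE * D_MODEL
    # Attention: query, key, value and output projections, each d_model x d_model.
    attention = 4 * D_MODEL * D_MODEL
    # MLP: up-projection and down-projection between d_model and 4 * d_model.
    mlp = 2 * MLP_EXPANSION * D_MODEL * D_MODEL
    blocks = N_LAYERS * (attention + mlp)
    return {
        "head_dim": D_MODEL // N_HEADS,
        "embedding": embedding,
        "blocks": blocks,
        "total": embedding + blocks,
    }


counts = estimate_parameters()
print(f"head_dim:  {counts['head_dim']}")      # 64
print(f"embedding: {counts['embedding']:,}")   # 38,731,776
print(f"blocks:    {counts['blocks']:,}")      # 84,934,656
print(f"total:     {counts['total']:,}")       # 123,666,432
```

The estimate of roughly 124M is consistent with the 125M figure in the table. The small gap could come from terms this sketch ignores, such as normalization weights, or from architectural details that differ from the assumptions.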