---
datasets:
- oscar-corpus/OSCAR-2301
language:
- it
tags:
- ipt-125m
---

# IPT-125m (WIP)

IPT-125m is a decoder-style transformer pretrained from scratch on 4.36 billion tokens of Italian text from the [OSCAR-2301](https://huggingface.co/datasets/oscar-corpus/OSCAR-2301) dataset.

If you like this project, consider supporting me with a cup of coffee! 🤖✨🌞

[![Buy me a coffee](https://badgen.net/badge/icon/Buy%20Me%20A%20Coffee?icon=buymeacoffee&label)](https://bmc.link/edoardofederici)

## How to Use

This model is best used with the Hugging Face `transformers` library for training and fine-tuning.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")
```

## Model Description

The architecture is a modification of a standard decoder-only transformer.

| Hyperparameter | Value |
|-----------------|-------|
| n_parameters    | 125M  |
| n_layers        | 12    |
| n_heads         | 12    |
| d_model         | 768   |
| vocab size      | 50432 |
| sequence length | 2048  |
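
Given the 2048-token context window above, a quick generation check can be run as shown in the sketch below. This is a minimal, illustrative example: the Italian prompt and the sampling settings (`max_new_tokens`, `top_p`, `temperature`) are assumptions for demonstration, not recommendations from the model author.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "efederici/ipt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
model.eval()

# Example Italian prompt (the model was pretrained on Italian text from OSCAR-2301).
prompt = "La città di Roma è famosa per"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,                     # illustrative value; context length is 2048
        do_sample=True,                        # sample instead of greedy decoding
        top_p=0.9,                             # illustrative sampling settings
        temperature=0.8,
        pad_token_id=tokenizer.eos_token_id,   # avoid warnings if no pad token is set
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```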