---
datasets:
- oscar-corpus/OSCAR-2301
language:
- it
tags:
- ipt-125m
---

# IPT-125m (WIP)

IPT-125m is a decoder-style transformer pretrained from scratch on 4.36 billion tokens of Italian text from the [OSCAR-2301](https://huggingface.co/datasets/oscar-corpus/OSCAR-2301) dataset.

If you like this project, consider supporting me with a cup of coffee! 🤖✨🌞
[![Buy me a coffee](https://badgen.net/badge/icon/Buy%20Me%20A%20Coffee?icon=buymeacoffee&label)](https://bmc.link/edoardofederici)

## How to Use

This model is best used with the Hugging Face `transformers` library for training and fine-tuning.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")
```
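
Once loaded, the model can be used for Italian text generation. The snippet below is a minimal sketch: the prompt, sampling parameters, and generation settings are illustrative choices, not recommendations from the model authors.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("efederici/ipt-125m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("efederici/ipt-125m")

# Encode an Italian prompt and sample a short continuation
inputs = tokenizer("L'Italia è", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```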

## Model Description

The architecture is a modification of a standard decoder-only transformer.

| Hyperparameter  | Value |
|-----------------|-------|
| n_parameters    | 125M  |
| n_layers        | 12    |
| n_heads         | 12    |
| d_model         | 768   |
| vocab size      | 50432 |
| sequence length | 2048  |
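
As a sanity check, the hyperparameters above are consistent with the stated ~125M parameter count. The sketch below assumes a standard GPT-style block (attention projections plus a 4x-wide MLP) and ignores biases and layer norms, so it is only an estimate:

```python
# Rough parameter-count estimate from the table's hyperparameters.
# Assumes a standard GPT-style decoder block; exact details may differ.
d_model = 768
n_layers = 12
vocab_size = 50432

embeddings = vocab_size * d_model             # token embedding matrix
attention_per_layer = 4 * d_model ** 2        # Q, K, V, and output projections
mlp_per_layer = 8 * d_model ** 2              # up- and down-projections, 4x width
total = embeddings + n_layers * (attention_per_layer + mlp_per_layer)

print(f"{total / 1e6:.1f}M")  # roughly 124M, in line with ~125M
```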