efederici committed
Commit af578b1
1 Parent(s): eab57a0

Create README.md

Files changed (1)
  1. README.md +98 -0
README.md ADDED
---
license: apache-2.0
tags:
- ipt
- alibi
inference: false
datasets:
- oscar-corpus/OSCAR-2301
language:
- it
---

# ipt-350m

ipt-350m is a decoder-style transformer pretrained from scratch on ~13B tokens of Italian text.

It uses a modified transformer architecture optimized for efficient training and inference. Positional embeddings are replaced with Attention with Linear Biases ([ALiBi](https://arxiv.org/abs/2108.12409)).

ipt-350m is:
- **Licensed for possible commercial use** (Apache-2.0)
- **Prepared to handle extremely long inputs** thanks to [ALiBi](https://arxiv.org/abs/2108.12409)
- **Capable of fast training and inference** via [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer)
- **Equipped with highly efficient open-source training code** via the [llm-foundry repository](https://github.com/mosaicml/llm-foundry)

## How to Use

```python
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    'efederici/ipt-350m-alibi',
    trust_remote_code=True
)
```
Note: this model requires that `trust_remote_code=True` be passed to the `from_pretrained` method, because it uses a custom architecture whose code is loaded from the model repository rather than from the `transformers` package itself.
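
As a minimal usage sketch (assuming the repository also ships a compatible tokenizer; if it does not, substitute the tokenizer the model was actually trained with), generation works through the standard `generate` API:

```python
import torch
import transformers

name = 'efederici/ipt-350m-alibi'

# Assumption: a tokenizer is available in the same repository.
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model = transformers.AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

# Encode an Italian prompt and sample a short continuation.
inputs = tokenizer("L'Italia è un paese", return_tensors='pt')
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```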

To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, load the model on GPU (`cuda:0`) with `attn_impl='triton'` and in `bfloat16` precision:

```python
import torch
import transformers

name = 'efederici/ipt-350m-alibi'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'
config.init_device = 'cuda:0'

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)
```

Although the model was trained with a sequence length of 2048, ALiBi allows you to increase the maximum sequence length during finetuning and/or inference:

```python
import transformers

name = 'efederici/ipt-350m-alibi'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 4096  # (input + output) tokens can now be up to 4096

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True
)
```

## Model Description

The architecture is a modified, decoder-only transformer. It differs from a standard transformer in the following ways:
- It uses [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf)
- It uses [ALiBi (Attention with Linear Biases)](https://arxiv.org/abs/2108.12409) instead of positional embeddings (see the sketch below)
- It does not use biases

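
For intuition, ALiBi drops learned positional embeddings and instead adds a fixed, head-specific linear penalty to the attention scores, proportional to the distance between query and key positions. The snippet below is an illustration of that idea only, not the model's actual implementation (which lives in the remote code loaded via `trust_remote_code=True`):

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Illustrative ALiBi bias of shape (n_heads, seq_len, seq_len)."""
    # Head-specific slopes: a geometric sequence starting at 2^(-8/n_heads).
    slopes = torch.tensor([2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)])
    # Relative position (j - i) between key position j and query position i.
    positions = torch.arange(seq_len)
    distance = positions.view(1, -1) - positions.view(-1, 1)
    # Bias is slope * (j - i): zero on the diagonal, increasingly negative for
    # distant past keys; future positions are removed by the causal mask anyway.
    return slopes.view(n_heads, 1, 1) * distance.view(1, seq_len, seq_len)

# Example: biases for this model's 16 heads over a short sequence.
bias = alibi_bias(n_heads=16, seq_len=8)
print(bias.shape)  # torch.Size([16, 8, 8])
```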
| Hyperparameter  | Value |
|-----------------|-------|
| n_parameters    | 350M  |
| n_layers        | 24    |
| n_heads         | 16    |
| d_model         | 1024  |
| vocab size      | 50432 |
| sequence length | 2048  |

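
As a rough sanity check on the parameter count (a back-of-the-envelope estimate, not an official figure), a decoder-only transformer without biases has roughly `12 * n_layers * d_model^2` parameters in its blocks plus `vocab_size * d_model` embedding parameters:

```python
n_layers, d_model, vocab_size = 24, 1024, 50432

block_params = 12 * n_layers * d_model ** 2   # attention + MLP weight matrices
embedding_params = vocab_size * d_model       # token embeddings (assuming tied input/output embeddings)
total = block_params + embedding_params

print(f"{total / 1e6:.0f}M parameters")  # ~354M, consistent with the ~350M figure above
```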
### Dataset

The model was trained for ~13B tokens (with batch size 64 and sequence length 2048) on [OSCAR-2301](https://huggingface.co/datasets/oscar-corpus/OSCAR-2301). Each training example was constructed by concatenating as many sequences from that dataset as were necessary to fill the 2048-token context.
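
A minimal sketch of that packing step (illustrative only; the actual preprocessing is handled by llm-foundry's data pipeline and may differ in details such as separator tokens):

```python
from typing import Iterable, Iterator, List

def pack_sequences(token_streams: Iterable[List[int]], seq_len: int = 2048) -> Iterator[List[int]]:
    """Concatenate tokenized documents and yield fixed-length training examples."""
    buffer: List[int] = []
    for tokens in token_streams:
        buffer.extend(tokens)
        # Emit a full example as soon as the buffer holds at least seq_len tokens.
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
    # Trailing tokens shorter than seq_len are dropped in this sketch.

# Toy example with already-tokenized documents:
docs = [[1, 2, 3], [4, 5, 6, 7, 8], [9, 10]]
print(list(pack_sequences(docs, seq_len=4)))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```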

The vocabulary size is 50432, a multiple of 128 as suggested in [MEGATRON-LM](https://arxiv.org/abs/1909.08053); rounding the vocabulary up this way can improve model flop utilization (MFU) by up to four percentage points.

If you like this project, consider supporting me with a cup of coffee! 🤖✨🌞

[![Buy me a coffee](https://badgen.net/badge/icon/Buy%20Me%20A%20Coffee?icon=buymeacoffee&label)](https://bmc.link/edoardofederici)