---
license: mit
language:
- it
---
--------------------------------------------------------------------------------------------------

<body>
<span class="vertical-text" style="background-color:lightgreen;border-radius: 3px;padding: 3px;"> </span>
<br>
<span class="vertical-text" style="background-color:orange;border-radius: 3px;padding: 3px;">  </span>
<br>
<span class="vertical-text" style="background-color:lightblue;border-radius: 3px;padding: 3px;">    Model: PRIME 6.6B 🔥</span>
<br>
<span class="vertical-text" style="background-color:tomato;border-radius: 3px;padding: 3px;">    Lang: IT</span>
<br>
<span class="vertical-text" style="background-color:lightgrey;border-radius: 3px;padding: 3px;">  </span>
<br>
<span class="vertical-text" style="background-color:#CF9FFF;border-radius: 3px;padding: 3px;"> </span>
</body>

--------------------------------------------------------------------------------------------------

<h3>Model description</h3>

This model is a <b>causal</b> language model for the <b>Italian</b> language, based on a GPT-like <b>[1]</b> architecture (more specifically, it was obtained by modifying Meta's XGLM architecture <b>[2]</b> and starting from its 7.5B checkpoint).

The model has ~6.6B parameters and a vocabulary of 50,335 tokens. It is a foundation model, pre-trained for causal language modeling, so it is mainly suitable for basic natural language generation. It also has some zero-shot and few-shot inference capabilities, but it has to be fine-tuned before being used on more specific downstream tasks.

The released checkpoint is quantized in 8-bit, so that it can easily be loaded and used for training and inference on ordinary hardware. It requires the <b>transformers</b> library, version >= 4.30.1, and the <b>bitsandbytes</b> library, version >= 0.37.2.

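
Assuming a standard pip-based environment (package names as published on PyPI), the version requirements above can be satisfied with:

```bash
pip install "transformers>=4.30.1" "bitsandbytes>=0.37.2"
```
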
On Windows, the <b>bitsandbytes-windows</b> module also needs to be installed on top. However, that module does not yet include some recent features, such as the ability to save 8-bit quantized models. To obtain them, you can install the fork in [this repo](https://github.com/francesco-russo-githubber/bitsandbytes-windows), using:

```bash
pip install git+https://github.com/francesco-russo-githubber/bitsandbytes-windows.git
```
<h3>Quick usage</h3>

To use the model for inference, build a text-generation pipeline as follows:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("osiria/prime-6.6b")
model = AutoModelForCausalLM.from_pretrained("osiria/prime-6.6b")

pipeline_nlg = pipeline("text-generation", model=model, tokenizer=tokenizer)
pipeline_nlg("Ciao, mi chiamo Marco Rossi e")

# [{'generated_text': 'Ciao, mi chiamo Marco Rossi e sono un fotografo professionista.'}]
```
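
The zero-shot and few-shot capabilities mentioned above rely on prompting alone. As a minimal sketch, the few-shot prompt below could be passed to a text-generation pipeline, which would then be expected (though not guaranteed) to continue with a label; the task, labels, and `Frase:`/`Sentimento:` format here are hypothetical, not an official prompt format for this model:

```python
# Hypothetical few-shot prompt for Italian sentiment classification:
# each in-context example pairs a sentence ("Frase") with a label ("Sentimento").
examples = [
    ("Che bel film, mi è piaciuto molto!", "positivo"),
    ("Il servizio era davvero pessimo.", "negativo"),
]
query = "La pizza era ottima."

# Concatenate the labeled examples, then leave the final label slot open
# so the model's next tokens should fill it in.
prompt = "".join(f"Frase: {frase}\nSentimento: {label}\n\n" for frase, label in examples)
prompt += f"Frase: {query}\nSentimento:"
```

Since the prompt ends with an open `Sentimento:` slot, any post-processing of the generated continuation (e.g. trimming it to the first line) is still up to you.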
<h3>Limitations</h3>

The model might behave erratically when presented with prompts that are too far from its pre-training distribution and, because of the probabilistic nature of its generation, it might occasionally produce biased or offensive content with respect to gender, race, ideologies, and political or religious beliefs.
These limitations imply that the model and its outputs should be used with caution, and should not be involved in situations that require the generated text to be fair or factually accurate.

<h3>References</h3>

[1] https://arxiv.org/abs/2005.14165

[2] https://arxiv.org/abs/2112.10668

<h3>License</h3>

The model is released under the <b>MIT</b> license.