Files changed (1)
  1. README.md +149 -0
README.md ADDED
@@ -0,0 +1,149 @@
1
+ ---
2
+ license: cc-by-nc-4.0
3
+ tags:
4
+ - generated_from_trainer
5
+ - instruction fine-tuning
6
+ model-index:
7
+ - name: flan-t5-small-distil-v2
8
+ results: []
9
+ language:
10
+ - en
11
+ pipeline_tag: text2text-generation
12
+ widget:
13
+ - text: >-
14
+ how can I become more healthy?
15
+ example_title: example
16
+ ---
17
+
18
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
19
+ should probably proofread and complete it, then remove this comment. -->
20
+
21
+ <p align="center" width="100%">
22
+ <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini/main/images/LaMnin.png" alt="Title" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
23
+ </p>
24
+
25
+ # LaMini-Flan-T5-783M
26
+
27
+ [![Model License](https://img.shields.io/badge/Model%20License-CC%20By%20NC%204.0-red.svg)]()
28
+
29
+ This model is part of our LaMini model series, introduced in the paper "[LaMini: A Diverse Herd of Distilled Models from Large-Scale Instructions](https://github.com/mbzuai-nlp/lamini)". It is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large), trained on the [LaMini dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction), which contains 2.58M samples for instruction fine-tuning. For more information about the dataset, please refer to our [project repository](https://github.com/mbzuai-nlp/lamini/).
30
+ The other models in the LaMini series are listed below. Not all models perform equally well; those marked with ✩ have the best overall performance for their size and architecture. More details can be found in our paper.
31
+
32
+ <table>
33
+ <thead>
34
+ <tr>
35
+ <th>Base model</th>
36
+ <th colspan="4">LaMini series (#parameters)</th>
37
+ </tr>
38
+ </thead>
39
+ <tbody>
40
+ <tr>
41
+ <td>T5</td>
42
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-61m" target="_blank" rel="noopener noreferrer">LaMini-T5-61M</a></td>
43
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-223m" target="_blank" rel="noopener noreferrer">LaMini-T5-223M</a></td>
44
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-738m" target="_blank" rel="noopener noreferrer">LaMini-T5-738M</a></td>
45
+ <td></td>
46
+ </tr>
47
+ <tr>
48
+ <td>Flan-T5</td>
49
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-77m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-77M</a>✩</td>
50
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-248m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-248M</a>✩</td>
51
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-783m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-783M</a>✩</td>
52
+ <td></td>
53
+ </tr>
54
+ <tr>
55
+ <td>Cerebras-GPT</td>
56
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-111m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-111M</a></td>
57
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-256m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-256M</a></td>
58
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-590m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-590M</a></td>
59
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-1.3B</a></td>
60
+ </tr>
61
+ <tr>
62
+ <td>GPT-2</td>
63
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-124m" target="_blank" rel="noopener noreferrer">LaMini-GPT-124M</a>✩</td>
64
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-774m" target="_blank" rel="noopener noreferrer">LaMini-GPT-774M</a>✩</td>
65
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-1.5b" target="_blank" rel="noopener noreferrer">LaMini-GPT-1.5B</a>✩</td>
66
+ <td></td>
67
+ </tr>
68
+ <tr>
69
+ <td>GPT-Neo</td>
70
+ <td><a href="https://huggingface.co/MBZUAI/lamini-neo-125m" target="_blank" rel="noopener noreferrer">LaMini-Neo-125M</a></td>
71
+ <td><a href="https://huggingface.co/MBZUAI/lamini-neo-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Neo-1.3B</a></td>
72
+ <td></td>
73
+ <td></td>
74
+ </tr>
75
+ <tr>
76
+ <td>GPT-J</td>
77
+ <td colspan="4">coming soon</td>
78
+ </tr>
79
+ <tr>
80
+ <td>LLaMA</td>
81
+ <td colspan="4">coming soon</td>
82
+ </tr>
83
+
84
+
85
+ </tbody>
86
+ </table>
87
+
88
+
89
+ ## Use
90
+
91
+ ### Intended use
92
+ We recommend using the model to respond to human instructions written in natural language.
93
+
94
+ The following shows how to load and use the model with the Hugging Face `pipeline()` function.
95
+
96
+ ```python
97
+ # pip install -q transformers
98
+ from transformers import pipeline
99
+
100
+ checkpoint = "{model_name}"
101
+
102
+ generator = pipeline('text2text-generation', model=checkpoint, use_auth_token=True, device=0)
103
+
104
+ input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
105
+ generated_text = generator(input_prompt, max_length=512, do_sample=True)[0]['generated_text']
106
+
107
+ print("Response:", generated_text)
108
+ ```
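The snippet above assumes a GPU at `device=0`. As a minimal sketch of the same flow for CPU or GPU, the generation call can be wrapped in small helpers. The names `load_generator` and `respond` are hypothetical and not part of the LaMini code base; `device=-1` is the pipeline's convention for running on CPU.

```python
from typing import Callable


def load_generator(checkpoint: str, device: int = -1):
    """Build a text2text-generation pipeline; device=-1 is CPU, 0 is the first GPU."""
    # Imported lazily so that respond() below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text2text-generation", model=checkpoint, device=device)


def respond(generator: Callable, instruction: str, max_length: int = 512) -> str:
    """Send one instruction to the generator and return the generated text, stripped."""
    outputs = generator(instruction, max_length=max_length, do_sample=True)
    return outputs[0]["generated_text"].strip()


# Usage, where `checkpoint` is the LaMini checkpoint you want to load:
# generator = load_generator(checkpoint)
# print("Response:", respond(generator, "how can I become more healthy?"))
```

Because `respond` only needs a callable that returns a list of dicts with a `generated_text` key, it can be exercised with a stub before any model weights are downloaded.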
109
+
110
+ ## Training Procedure
111
+
112
+ <p align="center" width="100%">
113
+ <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini/main/images/lamini-pipeline.drawio.png" alt="Title" style="width: 100%; min-width: 250px; display: block; margin: auto;"></a>
114
+ </p>
115
+
116
+ We initialize with [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) and fine-tune it on our [LaMini dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction). Its total number of parameters is 783M.
117
+
118
+ ### Training Hyperparameters
119
+
120
+ The following hyperparameters were used during training:
121
+ - learning_rate: 0.0005
122
+ - train_batch_size: 128
123
+ - eval_batch_size: 64
124
+ - seed: 42
125
+ - gradient_accumulation_steps: 4
126
+ - total_train_batch_size: 512
127
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
128
+ - lr_scheduler_type: linear
129
+ - num_epochs: 5
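The relationship between these values can be checked with a few lines of arithmetic. This is a rough sketch, not the training script: it assumes a single device, that all 2.58M samples are used for training, and that there are no warmup steps, none of which is stated in this card. The names `num_devices`, `num_samples`, `warmup_steps`, and `linear_lr` are introduced here for illustration only.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-4
train_batch_size = 128
gradient_accumulation_steps = 4
num_epochs = 5

# Assumptions, not stated in this card: one device, the full 2.58M-sample
# dataset used for training, and no warmup steps.
num_devices = 1
num_samples = 2_580_000
warmup_steps = 0

# Effective batch size per optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Rough estimate of the number of optimizer updates over the whole run.
steps_per_epoch = math.ceil(num_samples / total_train_batch_size)
total_steps = steps_per_epoch * num_epochs


def linear_lr(step: int) -> float:
    """Linear warmup (if any) followed by linear decay to zero at total_steps."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


print(total_train_batch_size)  # 512, matching total_train_batch_size above
print(total_steps)
print(linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```

Under these assumptions the learning rate starts at 0.0005 and falls linearly to zero by the final update; the actual step count depends on the real train split and device setup.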
130
+
131
+ ## Evaluation
132
+ We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more details, please refer to our [paper]().
133
+
134
+ ## Limitations
135
+
136
+ More information needed
137
+
138
+
139
+ # Citation
140
+
141
+ ```bibtex
142
+ @misc{lamini,
143
+ title={LaMini: A Diverse Herd of Distilled Models from Large-Scale Instructions},
144
+ author={},
145
+ year={2023},
146
+ publisher = {GitHub},
147
+ journal = {GitHub repository},
148
+ }
149
+ ```