afaji committed on
Commit
c8e7762
1 Parent(s): 3cf3d36

Update README.md

Files changed (1)
  1. README.md +121 -184
README.md CHANGED
@@ -1,4 +1,5 @@
1
  ---
 
2
  language:
3
  - en
4
  pipeline_tag: text-generation
@@ -19,196 +20,132 @@ widget:
19
  example_title: example
20
  ---
21
 
22
- # Model Card for Model ID
23
 
24
- <!-- Provide a quick summary of what the model is/does. -->
25
 
26
- This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
27
-
28
- ## Model Details
29
-
30
- ### Model Description
31
-
32
- <!-- Provide a longer summary of what this model is. -->
33
-
34
-
35
-
36
- - **Developed by:** [More Information Needed]
37
- - **Shared by [optional]:** [More Information Needed]
38
- - **Model type:** [More Information Needed]
39
- - **Language(s) (NLP):** [More Information Needed]
40
- - **License:** [More Information Needed]
41
- - **Finetuned from model [optional]:** [More Information Needed]
42
-
43
- ### Model Sources [optional]
44
-
45
- <!-- Provide the basic links for the model. -->
46
-
47
- - **Repository:** [More Information Needed]
48
- - **Paper [optional]:** [More Information Needed]
49
- - **Demo [optional]:** [More Information Needed]
50
-
51
- ## Uses
52
-
53
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
54
-
55
- ### Direct Use
56
-
57
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
58
-
59
- [More Information Needed]
60
-
61
- ### Downstream Use [optional]
62
-
63
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
64
-
65
- [More Information Needed]
66
-
67
- ### Out-of-Scope Use
68
-
69
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
70
-
71
- [More Information Needed]
72
-
73
- ## Bias, Risks, and Limitations
74
-
75
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
76
-
77
- [More Information Needed]
78
-
79
- ### Recommendations
80
-
81
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
82
-
83
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
84
-
85
- ## How to Get Started with the Model
86
-
87
- Use the code below to get started with the model.
88
-
89
- [More Information Needed]
90
-
91
- ## Training Details
92
-
93
- ### Training Data
94
-
95
- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
96
-
97
- [More Information Needed]
98
-
99
- ### Training Procedure
100
-
101
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
102
-
103
- #### Preprocessing [optional]
104
-
105
- [More Information Needed]
106
-
107
-
108
- #### Training Hyperparameters
109
-
110
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
111
-
112
- #### Speeds, Sizes, Times [optional]
113
-
114
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
115
-
116
- [More Information Needed]
117
 
118
  ## Evaluation
 
119
 
120
- <!-- This section describes the evaluation protocols and provides the results. -->
121
-
122
- ### Testing Data, Factors & Metrics
123
-
124
- #### Testing Data
125
-
126
- <!-- This should link to a Data Card if possible. -->
127
-
128
- [More Information Needed]
129
-
130
- #### Factors
131
-
132
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
133
-
134
- [More Information Needed]
135
-
136
- #### Metrics
137
-
138
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
139
-
140
- [More Information Needed]
141
-
142
- ### Results
143
-
144
- [More Information Needed]
145
-
146
- #### Summary
147
-
148
-
149
-
150
- ## Model Examination [optional]
151
-
152
- <!-- Relevant interpretability work for the model goes here -->
153
-
154
- [More Information Needed]
155
-
156
- ## Environmental Impact
157
-
158
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
159
-
160
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
161
-
162
- - **Hardware Type:** [More Information Needed]
163
- - **Hours used:** [More Information Needed]
164
- - **Cloud Provider:** [More Information Needed]
165
- - **Compute Region:** [More Information Needed]
166
- - **Carbon Emitted:** [More Information Needed]
167
-
168
- ## Technical Specifications [optional]
169
-
170
- ### Model Architecture and Objective
171
-
172
- [More Information Needed]
173
-
174
- ### Compute Infrastructure
175
-
176
- [More Information Needed]
177
-
178
- #### Hardware
179
-
180
- [More Information Needed]
181
-
182
- #### Software
183
-
184
- [More Information Needed]
185
-
186
- ## Citation [optional]
187
-
188
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
189
-
190
- **BibTeX:**
191
-
192
- [More Information Needed]
193
-
194
- **APA:**
195
-
196
- [More Information Needed]
197
-
198
- ## Glossary [optional]
199
-
200
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
201
-
202
- [More Information Needed]
203
-
204
- ## More Information [optional]
205
-
206
- [More Information Needed]
207
 
208
- ## Model Card Authors [optional]
209
 
210
- [More Information Needed]
211
 
212
- ## Model Card Contact
213
 
214
- [More Information Needed]
 
1
  ---
2
+ license: cc-by-nc-4.0
3
  language:
4
  - en
5
  pipeline_tag: text-generation
 
20
  example_title: example
21
  ---
22
 
23
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
24
+ should probably proofread and complete it, then remove this comment. -->
25
+
26
+ <p align="center" width="100%">
27
+ <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini/main/images/LaMnin.png" alt="Title" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
28
+ </p>
29
+
30
+ # LaMini-GPT-774M
31
+
32
+ [![Model License](https://img.shields.io/badge/Model%20License-CC%20By%20NC%204.0-red.svg)]()
33
+
34
+ This model is one of our LaMini model series from the paper "[LaMini: A Diverse Herd of Distilled Models from Large-Scale Instructions](https://github.com/mbzuai-nlp/lamini)".
35
+ This model is a fine-tuned version of [gpt2-large](https://huggingface.co/gpt2-large) on the [LaMini dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction), which contains 2.58M samples for instruction fine-tuning. For more information about our dataset, please refer to our [project repository](https://github.com/mbzuai-nlp/lamini/).
36
+ Other models in the LaMini series are listed below. Note that not all models perform equally well: models marked with ✩ have the best overall performance for their size/architecture. More details can be found in our paper.
37
+
38
+ <table>
39
+ <thead>
40
+ <tr>
41
+ <th>Base model</th>
42
+ <th colspan="4">LaMini series (#parameters)</th>
43
+ </tr>
44
+ </thead>
45
+ <tbody>
46
+ <tr>
47
+ <td>T5</td>
48
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-61m" target="_blank" rel="noopener noreferrer">LaMini-T5-61M</a></td>
49
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-223m" target="_blank" rel="noopener noreferrer">LaMini-T5-223M</a></td>
50
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-738m" target="_blank" rel="noopener noreferrer">LaMini-T5-738M</a></td>
51
+ <td></td>
52
+ </tr>
53
+ <tr>
54
+ <td>Flan-T5</td>
55
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-77m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-77M</a>✩</td>
56
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-248m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-248M</a>✩</td>
57
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-783m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-783M</a>✩</td>
58
+ <td></td>
59
+ </tr>
60
+ <tr>
61
+ <td>Cerebras-GPT</td>
62
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-111m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-111M</a></td>
63
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-256m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-256M</a></td>
64
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-590m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-590M</a></td>
65
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-1.3B</a></td>
66
+ </tr>
67
+ <tr>
68
+ <td>GPT-2</td>
69
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-124m" target="_blank" rel="noopener noreferrer">LaMini-GPT-124M</a>✩</td>
70
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-774m" target="_blank" rel="noopener noreferrer">LaMini-GPT-774M</a>✩</td>
71
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-1.5b" target="_blank" rel="noopener noreferrer">LaMini-GPT-1.5B</a>✩</td>
72
+ <td></td>
73
+ </tr>
74
+ <tr>
75
+ <td>GPT-Neo</td>
76
+ <td><a href="https://huggingface.co/MBZUAI/lamini-neo-125m" target="_blank" rel="noopener noreferrer">LaMini-Neo-125M</a></td>
77
+ <td><a href="https://huggingface.co/MBZUAI/lamini-neo-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Neo-1.3B</a></td>
78
+ <td></td>
79
+ <td></td>
80
+ </tr>
81
+ <tr>
82
+ <td>GPT-J</td>
83
+ <td colspan="4">coming soon</td>
84
+ </tr>
85
+ <tr>
86
+ <td>LLaMA</td>
87
+ <td colspan="4">coming soon</td>
88
+ </tr>
89
+
90
+
91
+ </tbody>
92
+ </table>
93
+
94
+
95
+ ## Use
96
+
97
+ ### Intended use
98
+ We recommend using the model to respond to human instructions written in natural language.
99
+ Since this decoder-only model was fine-tuned with a fixed prompt wrapper, we suggest wrapping your instruction in the same text to achieve the best performance.
100
+ See the example on the right or the code below.
101
+
102
+ The code below shows how to load and use our model with the Hugging Face `pipeline()` function.
103
+
104
+ ```python
105
+ # pip install -q transformers torch
106
+ from transformers import pipeline
107
+
108
+ checkpoint = "MBZUAI/lamini-gpt-774m"
109
+
110
+ generator = pipeline('text-generation', model=checkpoint)
111
+
112
+ instruction = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
113
+
114
+ input_prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
115
+
116
+ generated_text = generator(input_prompt, max_length=512, do_sample=True)[0]['generated_text']
117
+
118
+ print("Response:", generated_text)
119
+ ```
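The instruction wrapper used in the example can be factored into small helpers so every request is formatted identically. This is a minimal sketch, not part of the LaMini code: `PROMPT_TEMPLATE`, `build_prompt`, and `extract_response` are hypothetical names. It reuses the template string from the example verbatim and assumes the text-generation pipeline's default behaviour of returning the prompt followed by the continuation (passing `return_full_text=False` to the pipeline call is another way to get only the new text).

```python
# Hypothetical helpers (not from the LaMini repository) that apply the
# instruction wrapper shown in the README example and strip it afterwards.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str) -> str:
    """Wrap a natural-language instruction in the fine-tuning template."""
    return PROMPT_TEMPLATE.format(instruction=instruction)


def extract_response(generated_text: str, prompt: str) -> str:
    """Return only the model's answer.

    By default the text-generation pipeline returns the prompt followed by
    the generated continuation, so the prompt prefix is removed if present.
    """
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()


if __name__ == "__main__":
    # Print the exact string that would be passed to the pipeline.
    print(build_prompt('Please let me know your thoughts on the given place '
                       'and why you think it deserves to be visited: \n"Barcelona, Spain"'))
```

With a pipeline like the one in the example, the two helpers would wrap the call: build the prompt, pass it to the generator, then call `extract_response` on the returned `generated_text`.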
120
+
121
+ ## Training Procedure
122
+
123
+ <p align="center" width="100%">
124
+ <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini/main/images/lamini-pipeline.drawio.png" alt="Title" style="width: 100%; min-width: 250px; display: block; margin: auto;"></a>
125
+ </p>
126
+
127
+ We initialize with [gpt2-large](https://huggingface.co/gpt2-large) and fine-tune it on our [LaMini dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction). Its total number of parameters is 774M.
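The parameter count of the base model can be derived from its configuration. This is a minimal sketch, assuming the standard GPT-2 layout (learned position embeddings, a 4x-wide MLP, biases on every projection, and an output head tied to the token embedding) and the public gpt2-large settings of 36 layers, hidden size 1280, a 50,257-token vocabulary and 1,024 positions. It also assumes that instruction fine-tuning adds no parameters. `gpt2_param_count` is a hypothetical helper, not a library function.

```python
# Worked parameter count for a GPT-2-style decoder (hypothetical helper).
# Assumes tied input/output embeddings, so the LM head adds no weights.

def gpt2_param_count(n_layer: int, n_embd: int,
                     vocab_size: int = 50257, n_positions: int = 1024) -> int:
    d = n_embd
    embeddings = vocab_size * d + n_positions * d    # token + position tables
    attention = (d * 3 * d + 3 * d) + (d * d + d)    # fused q/k/v projection + output projection, with biases
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)      # up-projection + down-projection, with biases
    layer_norms = 2 * (2 * d)                        # two LayerNorms per block: weight + bias each
    per_block = attention + mlp + layer_norms
    final_norm = 2 * d                               # final LayerNorm before the LM head
    return embeddings + n_layer * per_block + final_norm


if __name__ == "__main__":
    print(gpt2_param_count(36, 1280))  # gpt2-large -> 774030080
    print(gpt2_param_count(12, 768))   # gpt2 (small) -> 124439808
```

Each block contributes 12d² + 13d weights, so with 36 layers at d = 1280 the blocks dominate (about 708M), and the embeddings add roughly 66M more.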
128
+
129
+ ### Training Hyperparameters
130
 
 
131
 
132
 
133
  ## Evaluation
134
+ We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more detail, please refer to our [paper]().
135
 
136
+ ## Limitations
137
 
138
+ More information needed
139
 
 
140
 
141
+ ## Citation
142
 
143
+ ```bibtex
144
+ @misc{lamini,
145
+ title={LaMini: A Diverse Herd of Distilled Models from Large-Scale Instructions},
146
+ author={},
147
+ year={2023},
148
+ publisher = {GitHub},
149
+ journal = {GitHub repository},
150
+ }
151
+ ```