huashiyiqike committed on
Commit
1ac5d24
1 Parent(s): 71392a1

Upload 9 files

README.md CHANGED
@@ -1,3 +1,188 @@
---
- license: openrail
+ license: cc-by-nc-sa-4.0
+ datasets:
+ - tatsu-lab/alpaca
+ - the_pile
---
+
+ # Model Card for Cerebras 111M Dollyfied
+
+ This is a fine-tuned version of the Cerebras 111M model, trained using the Databricks Labs Dolly framework.
+
+ ## Model Details
+
+ ### Model Description
+
+ This is a fine-tuned version of Cerebras' 111-million-parameter model, trained to follow instructions.
+
+ Fine-tuning used the Databricks Dolly training tools and the Alpaca dataset, and ran for 2 epochs.
+
+ - **Developed by:** Fine-tuned by Corianas (me) using open-source tools
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** EN
+ - **License:** cc-by-nc-sa-4.0
+ - **Finetuned from model:** https://huggingface.co/cerebras/Cerebras-GPT-111m
+ - **Finetuned using:** https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html
+
+ ## Uses
+
+ This is a simple GPT chatbot that has been fine-tuned to understand instructions.
+ Its knowledge of facts about the world should be considered suspect at best.
+
+ ### Direct Use
+
+ If you put it to a use, please let me know.
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ Any use where any form of accuracy is needed.
+ FOR THE LOVE OF GOD, DO NOT FOLLOW MEDICAL ADVICE FROM THIS.
+ Or financial advice.
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ Limitations... Yes, I am sure there are so, so many.
+
+ [More Information Needed]
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
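+ A minimal sketch, assuming the standard `transformers` text-generation API; the repo id `Corianas/111m` is taken from huggingface-metadata.txt, and the Alpaca-style prompt format is an assumption, not confirmed by this card:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # config.json identifies the checkpoint as a GPT2LMHeadModel, so the Auto classes work.
+ tokenizer = AutoTokenizer.from_pretrained("Corianas/111m")
+ model = AutoModelForCausalLM.from_pretrained("Corianas/111m")
+
+ # Alpaca-style instruction prompt (assumed format).
+ prompt = "### Instruction:\nWhat is the capital of France?\n\n### Response:\n"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=64, pad_token_id=tokenizer.eos_token_id)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+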
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Data Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** 8x A100 (the run finished while I was still downloading the model I was actually training)
+ - **Minutes used:** 7.5
+ - **Cloud Provider:** LambdaGPU
+ - **Compute Region:** USA
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "cerebras/Cerebras-GPT-111M",
+   "activation_function": "gelu",
+   "architectures": [
+     "GPT2LMHeadModel"
+   ],
+   "attn_pdrop": 0.0,
+   "bos_token_id": 50256,
+   "embd_pdrop": 0.0,
+   "eos_token_id": 50256,
+   "initializer_range": 0.02,
+   "layer_norm_epsilon": 1e-05,
+   "model_type": "gpt2",
+   "n_embd": 768,
+   "n_head": 12,
+   "n_inner": 3072,
+   "n_layer": 10,
+   "n_positions": 2048,
+   "reorder_and_upcast_attn": false,
+   "resid_pdrop": 0.0,
+   "scale_attn_by_inverse_layer_idx": false,
+   "scale_attn_weights": true,
+   "summary_activation": null,
+   "summary_first_dropout": 0.1,
+   "summary_proj_to_labels": true,
+   "summary_type": "cls_index",
+   "summary_use_proj": true,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.25.1",
+   "use_cache": false,
+   "vocab_size": 50257
+ }
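As a sanity check on the config above, the implied parameter count lands at roughly 111M. A back-of-the-envelope sketch (ignores LayerNorm and bias terms; GPT-2 ties the LM head to the token embeddings):

```python
# Values copied from config.json above.
n_embd, n_layer = 768, 10
n_inner, n_positions, vocab_size = 3072, 2048, 50257

embeddings = vocab_size * n_embd + n_positions * n_embd  # token + position embeddings
attention = 4 * n_embd * n_embd                          # Q, K, V and output projections
mlp = 2 * n_embd * n_inner                               # up- and down-projection
total = embeddings + n_layer * (attention + mlp)
print(f"~{total / 1e6:.0f}M parameters")                 # ~111M, matching the model name
```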
huggingface-metadata.txt ADDED
@@ -0,0 +1,6 @@
+ url: https://huggingface.co/Corianas/111m
+ branch: main
+ download date: 2023-08-22 16:42:49
+ sha256sum:
+     9d8e0e14dee78548497a1460e94174a59137334a7b5543e0714f7e4179631d47 model.safetensors
+     e11b2e73d6ea7b3f9a0d11cc81db9fcd618c517288103d47fe842e517bf9aaa5 pytorch_model.bin
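To verify a download against these sums, a minimal check using only the standard library (the filenames are the two listed above):

```python
import hashlib

# sha256 sums copied from huggingface-metadata.txt above.
expected = {
    "model.safetensors": "9d8e0e14dee78548497a1460e94174a59137334a7b5543e0714f7e4179631d47",
    "pytorch_model.bin": "e11b2e73d6ea7b3f9a0d11cc81db9fcd618c517288103d47fe842e517bf9aaa5",
}

for name, digest in expected.items():
    h = hashlib.sha256()
    with open(name, "rb") as f:
        # Hash in 1 MiB chunks to avoid loading the whole file into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    assert h.hexdigest() == digest, f"checksum mismatch for {name}"
```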
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d8e0e14dee78548497a1460e94174a59137334a7b5543e0714f7e4179631d47
+ size 264058626
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "bos_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "errors": "replace",
+   "model_max_length": 1000000000000000019884624838656,
+   "name_or_path": "cerebras/Cerebras-GPT-111M",
+   "pad_token": null,
+   "special_tokens_map_file": null,
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
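Note that tokenizer_config.json leaves `pad_token` null while special_tokens_map.json maps it to `<|endoftext|>`. A defensive sketch when loading, in case the pad token resolves to None:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Corianas/111m")
# special_tokens_map.json maps bos/eos/unk/pad all to <|endoftext|>.
print(tok.bos_token, tok.eos_token, tok.unk_token, tok.pad_token)
if tok.pad_token is None:          # fall back for batched generation
    tok.pad_token = tok.eos_token
```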
vocab.json ADDED
The diff for this file is too large to render. See raw diff