<!-- Provide a quick summary of what the model is/does. -->

BLING-1.4b-0.1 is the first model release in the BLING ("Best Little Instruction-following No-GPU-required") model series, designed as custom instruct-following, laptop-effective, GPT decoder-based models (~1B-2.7B parameters).

BLING models are fine-tuned with distilled, high-quality custom instruct datasets, targeted at a specific subset of instruct tasks, with the objective of providing a high-quality Instruct model that can be run entirely without a GPU server. The models offer good instruct-following capability and can be loaded and run locally on a laptop, even without any quantization optimizations.
### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** llmware
- **Model type:** GPTNeoX instruct-trained decoder
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model [optional]:** EleutherAI/Pythia-1b-deduped
## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

The intended use of BLING models is two-fold:

1. Provide high-quality Instruct models that can run on a laptop for local testing. We have found them extremely useful when building a proof-of-concept, or when working with sensitive enterprise data that must be closely guarded, especially in RAG use cases.

2. Push the state of the art for smaller Instruct-following models in the 1B - 7B range through improved fine-tuning datasets and targeted "instruction" tasks.
### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

BLING is designed for enterprise automation use cases, especially in knowledge-intensive industries with complex information sources, such as financial services and the legal and regulatory industries. Rather than try to be "all things to all people," BLING models focus on a narrower set of Instructions, more suitable to a ~1B parameter GPT model.

BLING is ideal for rapid prototyping and testing, with the ability to perform an end-to-end workflow locally on a laptop without having to send sensitive information over an Internet-based API.

The first BLING models have been trained on question-answering, key-value extraction, and basic summarization as the core instruction types.
### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

1. BLING is not designed for 'chat-bot' or 'consumer-oriented' applications.

2. BLING is not optimal for most production applications, other than simple and highly specific use cases.
## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

BLING has not been designed for end consumer-oriented applications, and there has not been any focus in training on safeguards to mitigate potential bias. We would strongly discourage any use of BLING for any 'chatbot' use case.
## How to Get Started with the Model

The fastest way to get started with BLING is through direct import in transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("llmware/bling-1.4b-0.1")
model = AutoModelForCausalLM.from_pretrained("llmware/bling-1.4b-0.1")
```
The BLING model was fine-tuned with a simple `<human>` and `<bot>` wrapper, so to get the best results, wrap inference entries as:

```python
full_prompt = "<human>: " + my_prompt + "\n" + "<bot>: "
```
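When decoding the output of `model.generate` for a prompt wrapped this way, the decoded sequence will typically echo the prompt text itself. A minimal post-processing sketch (the helper name `extract_bot_answer` and the sample string are our own illustrations, not part of the model card):

```python
def extract_bot_answer(generated_text: str) -> str:
    # Keep only the text after the final "<bot>: " marker, since decoding
    # the full generated sequence normally includes the prompt as well.
    marker = "<bot>: "
    return generated_text.split(marker)[-1].strip()

# Hypothetical decoded output, for illustration only
raw = "<human>: The fee is $50.\nWhat is the fee?\n<bot>: The fee is $50."
answer = extract_bot_answer(raw)  # -> "The fee is $50."
```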
The BLING model was fine-tuned with closed-context samples, which generally assume that the prompt consists of two sub-parts:

1. Text Passage Context, and
2. Specific question or instruction based on the text passage

To get the best results, package "my_prompt" as follows:

```
my_prompt = {{text_passage}} + "\n" + {{question/instruction}}
```
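Putting the wrapper and the closed-context packaging together, a minimal helper might look like the following sketch (the function `make_bling_prompt` and the sample passage are hypothetical, introduced here for illustration):

```python
def make_bling_prompt(text_passage: str, question: str) -> str:
    # Closed-context packaging: passage first, then the question/instruction,
    # wrapped in the <human>/<bot> markers used during fine-tuning.
    my_prompt = text_passage + "\n" + question
    return "<human>: " + my_prompt + "\n" + "<bot>: "

prompt = make_bling_prompt(
    "The term of this agreement is 36 months.",
    "What is the term of the agreement?",
)
# prompt is now ready to tokenize and pass to the model
```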
## Citation [optional]

BLING models are built on top of the EleutherAI/Pythia base - please see the citation for Pythia below:

```bibtex
@misc{biderman2023pythia,
      title={Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling},
      author={Stella Biderman and Hailey Schoelkopf and Quentin Anthony and Herbie Bradley and Kyle O'Brien and Eric Hallahan and Mohammad Aflah Khan and Shivanshu Purohit and USVSN Sai Prashanth and Edward Raff and Aviya Skowron and Lintang Sutawika and Oskar van der Wal},
      year={2023},
      eprint={2304.01373},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
## Model Card Contact

Darren Oberst & llmware team

Please reach out anytime if you are interested in this project and would like to participate and work with us!