Triangle104 committed · verified
Commit 2661ae1 · Parent(s): b7773b6

Update README.md

Files changed (1): README.md (+221 −0)

README.md CHANGED
@@ -16,6 +16,227 @@ tags:
This model was converted to GGUF format from [`prithivMLmods/GWQ2b`](https://huggingface.co/prithivMLmods/GWQ2b) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/prithivMLmods/GWQ2b) for more details on the model.
 
---
## Model details

GWQ2b is a family of lightweight, state-of-the-art open models from Google, built using the same research and technology employed to create the Gemini models. These models are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained and instruction-tuned variants. GWQ2b models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. GWQ2b is fine-tuned on the Chain of Continuous Thought Synthetic Dataset and built upon the Gemma2ForCausalLM architecture.

## Running GWQ2b Demo

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/GWQ2b")
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/GWQ2b",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```
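
A note on these settings: `device_map="auto"` (the reason `accelerate` is installed first) lets transformers place the weights across available devices automatically, and `torch_dtype=torch.bfloat16` roughly halves memory use relative to float32. The `.to("cuda")` call assumes an NVIDIA GPU; on a CPU-only machine, drop it and expect slower generation.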

You can ensure the correct chat template is applied by using `tokenizer.apply_chat_template` as follows:

```python
messages = [
    {"role": "user", "content": "Write me a poem about Machine Learning."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
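
If you prefer tokens to appear as they are generated rather than all at once, the `TextStreamer` utility that ships with transformers can be attached to the same `generate` call. A minimal sketch, reusing the `model`, `tokenizer`, and `input_ids` from the snippet above:

```python
from transformers import TextStreamer

# Prints decoded text to stdout as tokens are generated;
# skip_prompt=True avoids echoing the input prompt back.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**input_ids, max_new_tokens=256, streamer=streamer)
```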

## Key Architecture

- **Transformer-Based Design:** GWQ2b leverages the transformer architecture, utilizing self-attention mechanisms to process input text and capture contextual relationships effectively.
- **Lightweight and Efficient:** It is designed to be computationally efficient, with fewer parameters compared to larger models, making it ideal for deployment on resource-constrained devices or environments.
- **Modular Layers:** The architecture consists of modular encoder and decoder layers, allowing flexibility in adapting the model for specific tasks like text generation, summarization, or classification.
- **Attention Mechanisms:** GWQ2b employs multi-head self-attention to focus on relevant parts of the input text, improving its ability to handle long-range dependencies and complex language structures (a minimal sketch of this operation follows the list).
- **Pre-training and Fine-Tuning:** The model is pre-trained on large text corpora and can be fine-tuned for specific tasks, such as markdown processing in ReadM.Md, to enhance its performance on domain-specific data.
- **Scalability:** The architecture supports scaling up or down based on the application's requirements, balancing performance and resource usage.
- **Open-Source and Customizable:** Being open-source, GWQ2b allows developers to modify and extend its architecture to suit specific use cases, such as integrating it into tools like ReadM.Md for markdown-related tasks.
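
To make the attention bullet concrete, here is a minimal PyTorch sketch of the scaled dot-product attention that each head computes. The shapes are purely illustrative and are not taken from GWQ2b's actual configuration:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)  # attention over key positions
    return weights @ v

# Illustrative shapes only (not GWQ2b's real config):
batch, heads, seq_len, head_dim = 1, 8, 16, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)
out = scaled_dot_product_attention(q, k, v)  # (1, 8, 16, 64)
```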

## Intended Use of GWQ2b (Gemma with Questions2b)

- **Question Answering:** The model excels in generating concise and relevant answers to user-provided queries across various domains.
- **Summarization:** It can be used to summarize large bodies of text, making it suitable for news aggregation, academic research, and report generation.
- **Reasoning Tasks:** GWQ2b is fine-tuned on the Chain of Continuous Thought Synthetic Dataset, which enhances its ability to perform reasoning, multi-step problem solving, and logical inference.
- **Text Generation:** The model is ideal for creative writing tasks such as generating poems, stories, and essays. It can also be used for generating code comments, documentation, and markdown files.
- **Instruction Following:** GWQ2b's instruction-tuned variant is suitable for generating responses based on user instructions, making it useful for virtual assistants, tutoring systems, and automated customer support.
- **Domain-Specific Applications:** Thanks to its modular design and open-source nature, the model can be fine-tuned for specific tasks like legal document summarization, medical record analysis, or financial report generation.

## Limitations of GWQ2b

- **Resource Requirements:** Although lightweight compared to larger models, the 2B-parameter size still requires significant computational resources, including GPUs with large memory, for inference.
- **Knowledge Cutoff:** The model's pre-training data may not include recent information, making it less effective for answering queries on current events or newly developed topics.
- **Bias in Outputs:** Since the model is trained on publicly available datasets, it may inherit biases present in those datasets, leading to potentially biased or harmful outputs in sensitive contexts.
- **Hallucinations:** Like other large language models, GWQ2b can occasionally generate incorrect or nonsensical information, especially when asked for facts or reasoning outside its training scope.
- **Lack of Common-Sense Reasoning:** While GWQ2b is fine-tuned for reasoning, it may still struggle with tasks requiring deep common-sense knowledge or a nuanced understanding of human behavior and emotions.
- **Dependency on Fine-Tuning:** For optimal performance on domain-specific tasks, fine-tuning on relevant datasets is required, which demands additional computational resources and expertise.
- **Context Length Limitation:** The model's ability to process long documents is limited by its maximum context window size. If the input exceeds this limit, truncation may lead to loss of important information (a sketch for checking this follows the list).
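
A practical guard against the silent truncation described in the last bullet is to count tokens before generating. A minimal sketch reusing the `tokenizer` from above; the 4096-token budget is illustrative, not GWQ2b's documented window (read the real value from `model.config.max_position_embeddings`):

```python
# Illustrative context budget -- check model.config.max_position_embeddings for the real limit.
MAX_CONTEXT = 4096

long_document = "Some very long report text..."  # stand-in for a real document
n_tokens = tokenizer(long_document, return_tensors="pt").input_ids.shape[-1]
if n_tokens > MAX_CONTEXT:
    print(f"{n_tokens} tokens exceed the {MAX_CONTEXT}-token window; the tail will be lost to truncation.")
```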

---

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):
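
For example (a sketch, not verbatim from this repo's instructions): `brew install llama.cpp` and the `llama-cli` `--hf-repo`/`--hf-file` flags are standard, but the repo name and quant filename below are assumptions; substitute the GGUF file actually published in this repo:

```bash
brew install llama.cpp

# Repo name and quant filename are hypothetical -- replace with this repo's actual GGUF file.
llama-cli --hf-repo Triangle104/GWQ2b-GGUF \
  --hf-file gwq2b-q4_k_m.gguf \
  -p "Write me a poem about Machine Learning."
```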