mksethi committed on
Commit
112ba43
1 Parent(s): 6af93e1

End of training

Files changed (1)
  1. README.md +52 -137
README.md CHANGED
@@ -11,154 +11,69 @@ datasets:
  model-index:
  - name: gemma-2b-dolly-qa
  results: []
- language:
- - en
  ---
- ---
-
- # Model Card for Khalsa
-
- <!-- Provide a quick summary of what the model is/does. [Optional] -->
- Fine-tuned Gemma model developed on the Intel Developer Cloud and trained on an Intel Max 1550 GPU.
-
-
- # Model Details
-
- ## Model Description
-
- <!-- Provide a longer summary of what this model is/does. -->
- Fine-tuned Gemma model developed on the Intel Developer Cloud.
-
- - **Developed by:** Manik Sethi, Britney Nguyen, Mario Miranda
- - **Model type:** Language model
- - **Language(s) (NLP):** English
- - **License:** apache-2.0
- - **Parent Model:** gemma-2b
- - **Resources for more information:** [Intel Developer Cloud](https://console.cloud.intel.com/training)
-
-
- # Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
- The model is intended for individuals who struggle to understand the information in important documents. More specifically, the target demographic includes immigrants and visa holders with limited English. When they receive documentation from employers, government agencies, or healthcare providers, the model should be able to answer any questions they have about it.
-
- ## Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
- <!-- If the user enters content, print that. If not, but they enter a task in the list, use that. If neither, say "more info needed." -->
-
- The user uploads a PDF to the application, which is then parsed by our model. The user can then ask questions about the content of that document.
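A minimal sketch of that flow, under some assumptions not stated in the card: `pypdf` handles the PDF parsing, the checkpoint is published under the hypothetical repo id `mksethi/gemma-2b-dolly-qa`, and the model accepts a plain instruction-style prompt.

```python
# Illustrative document-Q&A flow (library choices, repo id, and file name are assumptions).
from pypdf import PdfReader
from transformers import pipeline

def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page in the uploaded PDF."""
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

# Hypothetical repo id; `peft` must be installed if the checkpoint is an adapter.
generator = pipeline("text-generation", model="mksethi/gemma-2b-dolly-qa")

def answer_question(pdf_path: str, question: str) -> str:
    document = extract_text(pdf_path)
    prompt = (
        "Answer the question using only the document below.\n\n"
        f"Document:\n{document}\n\nQuestion: {question}\nAnswer:"
    )
    result = generator(prompt, max_new_tokens=256, return_full_text=False)
    return result[0]["generated_text"]

print(answer_question("benefits_letter.pdf", "When is the renewal deadline?"))
```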
-
- ## Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
- <!-- If the user enters content, print that. If not, but they enter a task in the list, use that. If neither, say "more info needed." -->
- Misuse would include relying on the model for legal advice, which it is not intended to give.
-
- # Bias, Risks, and Limitations
-
- Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
-
- A current limitation is the small number of languages the model can serve.
-
- ## Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
- To deliver advice in a target language, we suggest first generating the answer with the LLM and *then* translating it. Asking the model to do both simultaneously may produce flawed responses.
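As a hedged illustration of that two-step flow (the model ids below are examples, not the project's actual choices), the answer can be generated in English first and then passed to a separate translation model.

```python
# Generate the answer in English, then translate it in a second, independent step.
from transformers import pipeline

generator = pipeline("text-generation", model="mksethi/gemma-2b-dolly-qa")  # assumed repo id
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")    # example EN->ES model

english_answer = generator(
    "What documents do I need to renew a work permit?",
    max_new_tokens=200,
    return_full_text=False,
)[0]["generated_text"]

spanish_answer = translator(english_answer, max_length=400)[0]["translation_text"]
print(spanish_answer)
```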
-
- # Training Details
-
- ## Training Data
-
- The model was trained on the [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) dataset. This dataset contains a diverse range of question-answer pairs spanning multiple categories, facilitating comprehensive training. By focusing specifically on the question-answer pairs, the model adapts to provide accurate and relevant responses to various inquiries.
- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- ## Training Procedure
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
- ### Preprocessing
- The dataset was preprocessed to extract the question-answer pairs belonging to the question-answering categories. Filtering the dataset this way ensures the model is fine-tuned on pertinent data, improving its ability to give accurate responses.
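A minimal sketch of that filtering step with the `datasets` library, assuming the standard databricks-dolly-15k schema (`instruction`, `context`, `response`, `category`); the exact category labels kept by the original preprocessing script are not documented here, so the set below is an assumption.

```python
# Filter databricks-dolly-15k down to its question-answering categories.
from datasets import load_dataset

dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

qa_categories = {"open_qa", "closed_qa", "general_qa"}  # assumed label set
qa_pairs = dolly.filter(lambda example: example["category"] in qa_categories)

print(f"Kept {len(qa_pairs)} of {len(dolly)} examples for fine-tuning")
```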
- ### Speeds, Sizes, Times
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- Training ran for 25 epochs.
-
- # Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ## Testing Data, Factors & Metrics
-
- ### Testing Data
- We fed the following prompts into the model:
- <!-- This should link to a Data Card if possible. -->
- - "What are the main differences between a vegetarian and a vegan diet?"
- - "What are some effective strategies for managing stress and anxiety?"
- - "Can you explain the concept of blockchain technology in simple terms?"
- - "What are the key factors that influence the price of crude oil in global markets?"
- - "When did Virgin Australia start operating?"
-
- ## Results

  More information needed

- # Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** Intel Max 1550 GPU
- - **Hours used:** More information needed
- - **Cloud Provider:** Intel Developer Cloud
- - **Compute Region:** More information needed
- - **Carbon Emitted:** More information needed
-
- # Technical Specifications [optional]
-
- ## Model Architecture and Objective

  More information needed

- ## Compute Infrastructure
-
- More information needed
-
- ### Hardware
-
- The model was trained on an Intel Max 1550 GPU.
-
- ### Software
-
- The model was developed using the Intel Developer Cloud.
-
- # Model Card Authors
-
- Manik Sethi, Britney Nguyen, Mario Miranda
-
- # Model Card Contact
-
- More information needed
-
- # How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- <details>
- <summary> Click to expand </summary>

  More information needed

- </details>
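Pending the official snippet, here is a rough loading-and-generation sketch. It assumes the checkpoint is published as a PEFT (LoRA) adapter on top of google/gemma-2b with its tokenizer files included, under the hypothetical repo id `mksethi/gemma-2b-dolly-qa`.

```python
# Rough sketch: load the (assumed) PEFT adapter and generate an answer.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo_id = "mksethi/gemma-2b-dolly-qa"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoPeftModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "Question: When did Virgin Australia start operating?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```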
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # gemma-2b-dolly-qa
+
+ This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the generator dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 2.0226
+
+ ## Model description

  More information needed

+ ## Intended uses & limitations

  More information needed

+ ## Training and evaluation data

  More information needed

+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 1e-05
+ - train_batch_size: 2
+ - eval_batch_size: 8
+ - seed: 42
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.05
+ - training_steps: 1480
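For readers reproducing this setup, the sketch below reconstructs a `transformers.TrainingArguments` configuration matching the values above; the original training script and its exact arguments are not included in this card, and `output_dir` is a placeholder.

```python
# Approximate reconstruction of the hyperparameters listed above (not the original script).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2b-dolly-qa",  # placeholder output directory
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # 2 * 8 = total train batch size of 16
    max_steps=1480,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    seed=42,
)
```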
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-------:|:----:|:---------------:|
+ | 2.9171 | 1.6393 | 100 | 2.5622 |
+ | 2.4316 | 3.2787 | 200 | 2.2899 |
+ | 2.2671 | 4.9180 | 300 | 2.1729 |
+ | 2.16 | 6.5574 | 400 | 2.1081 |
+ | 2.1232 | 8.1967 | 500 | 2.0763 |
+ | 2.0816 | 9.8361 | 600 | 2.0586 |
+ | 2.07 | 11.4754 | 700 | 2.0476 |
+ | 2.0527 | 13.1148 | 800 | 2.0396 |
+ | 2.0445 | 14.7541 | 900 | 2.0343 |
+ | 2.0375 | 16.3934 | 1000 | 2.0300 |
+ | 2.0299 | 18.0328 | 1100 | 2.0270 |
+ | 2.0231 | 19.6721 | 1200 | 2.0248 |
+ | 2.0165 | 21.3115 | 1300 | 2.0233 |
+ | 2.0221 | 22.9508 | 1400 | 2.0226 |
+
+
+ ### Framework versions
+
+ - PEFT 0.10.0
+ - Transformers 4.40.1
+ - Pytorch 2.1.0.post0+cxx11.abi
+ - Datasets 2.19.0
+ - Tokenizers 0.19.1