aoxo committed on
Commit 93a0f13 · verified · 1 Parent(s): eedd7b8

Update README.md

Files changed (1)
  1. README.md +61 -66
README.md CHANGED
@@ -26,8 +26,8 @@ Memories - Token Compressor for Long-Range Dependency Conversations
26
  This model is a fine-tuned version of the Llama 3.1 8B 4-bit model, specifically trained for token compression tasks. It uses LoRA (Low-Rank Adaptation) for efficient fine-tuning while maintaining the base model's performance.
27
 
28
  - **Developed by:** Alosh Denny
29
- - **Funded by [optional]:** nil
30
- - **Shared by [optional]:** nil
31
  - **Model type:** Token Compressor for Memories
32
  - **Language(s) (NLP):** English
33
  - **License:** apache-2.0
@@ -38,7 +38,7 @@ This model is a fine-tuned version of the Llama 3.1 8B 4-bit model, specifically
38
 
39
  This model is designed for token compression tasks. It can be used to generate more concise versions of input text while preserving the essential meaning.
40
 
41
- ### Downstream Use [optional]
42
 
43
  The compressed outputs from this model can be used in various NLP applications where text length is a constraint, such as summarization, efficient text storage, or as input for other language models with token limits.
44
 
@@ -61,7 +61,14 @@ This model should not be used for tasks that require full preservation of the or
61
 
62
  Use the code below to get started with the model.
63
 
64
- [More Information Needed]
65
 
66
  ## Training Details
67
 
@@ -76,14 +83,14 @@ The model was trained on a dataset compiled from various sources, including:
76
 
77
  ### Training Procedure
78
 
 
 
79
  #### Preprocessing
80
 
81
  Prompt-response pairs were processed from the datasets and compiled into a single dataset (available at https://huggingface.co/datasets/aoxo/token_compressor). Unwanted characters, trailing whitespace, and inverted commas were removed.
82
 
83
  #### Training Hyperparameters
84
 
85
- #### Training Hyperparameters
86
-
87
  - **Training regime:** bf16 mixed precision
88
  - **Optimizer:** paged_adamw_8bit
89
  - **Learning rate:** 2e-4
@@ -91,7 +98,7 @@ Prompt-response pairs were processed from the datasets and compiled into a singl
91
  - **Batch size:** 4 per device
92
  - **Gradient accumulation steps:** 16
93
  - **Number of epochs:** 10
94
- - **Max steps:** 700,472
95
 
96
  #### LoRA Configuration
97
 
@@ -107,105 +114,93 @@ Prompt-response pairs were processed from the datasets and compiled into a singl
107
  - **Total Logged Training Time:** 1422.31 hours
108
  - **Start Time:** 07-21-2024 02:02:32
109
  - **End Time:** 09-18-2024 08:21:08
110
- - **Checkpoint Size (adapter):** 13,648,432 bytes
111
 
112
- ## Evaluation
113
 
114
- <!-- This section describes the evaluation protocols and provides the results. -->
115
 
116
- ### Testing Data, Factors & Metrics
 
 
 
117
 
118
- #### Testing Data
119
 
120
- <!-- This should link to a Dataset Card if possible. -->
121
 
122
- [More Information Needed]
123
 
124
- #### Factors
125
 
126
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
127
 
128
- [More Information Needed]
129
 
130
- #### Metrics
 
131
 
132
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
133
 
134
- [More Information Needed]
135
 
136
- ### Results
 
137
 
138
- [More Information Needed]
 
139
 
140
- #### Summary
141
 
 
 
142
 
 
 
143
 
144
- ## Model Examination [optional]
145
 
146
- <!-- Relevant interpretability work for the model goes here -->
 
147
 
148
- [More Information Needed]
 
149
 
150
  ## Environmental Impact
151
 
152
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
153
-
154
  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
155
 
156
- - **Hardware Type:** [More Information Needed]
157
- - **Hours used:** [More Information Needed]
158
- - **Cloud Provider:** [More Information Needed]
159
- - **Compute Region:** [More Information Needed]
160
- - **Carbon Emitted:** [More Information Needed]
161
 
162
- ## Technical Specifications [optional]
163
 
164
  ### Model Architecture and Objective
165
 
166
- [More Information Needed]
167
 
168
  ### Compute Infrastructure
169
 
170
- [More Information Needed]
171
-
172
  #### Hardware
173
 
174
- [More Information Needed]
175
 
176
  #### Software
177
 
178
- [More Information Needed]
179
-
180
- ## Citation [optional]
181
-
182
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
183
-
184
- **BibTeX:**
185
-
186
- [More Information Needed]
187
-
188
- **APA:**
189
-
190
- [More Information Needed]
191
-
192
- ## Glossary [optional]
193
-
194
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
195
-
196
- [More Information Needed]
197
-
198
- ## More Information [optional]
199
-
200
- [More Information Needed]
201
-
202
- ## Model Card Authors [optional]
203
-
204
- [More Information Needed]
205
 
206
  ## Model Card Contact
207
 
208
- [More Information Needed]
 
209
  ### Framework versions
210
 
211
  - PEFT 0.12.0
 
26
  This model is a fine-tuned version of the Llama 3.1 8B 4-bit model, specifically trained for token compression tasks. It uses LoRA (Low-Rank Adaptation) for efficient fine-tuning while maintaining the base model's performance.
27
 
28
  - **Developed by:** Alosh Denny
29
+ - **Funded by:** EmelinLabs
30
+ - **Shared by:** EmelinLabs
31
  - **Model type:** Token Compressor for Memories
32
  - **Language(s) (NLP):** English
33
  - **License:** apache-2.0
 
38
 
39
  This model is designed for token compression tasks. It can be used to generate more concise versions of input text while preserving the essential meaning.
40
 
41
+ ### Downstream Use
42
 
43
  The compressed outputs from this model can be used in various NLP applications where text length is a constraint, such as summarization, efficient text storage, or as input for other language models with token limits.
44
 
 
61
 
62
  Use the code below to get started with the model.
63
 
64
+ ```python
65
+ from peft import PeftModel, PeftConfig
66
+ from transformers import AutoModelForCausalLM
67
+
68
+ config = PeftConfig.from_pretrained("aoxo/llama-token-compressor")
69
+ base_model = AutoModelForCausalLM.from_pretrained("unsloth/Meta-Llama-3.1-8B-bnb-4bit")
70
+ model = PeftModel.from_pretrained(base_model, "aoxo/llama-token-compressor")
71
+ ```
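Continuing from the loading snippet above, the model can then be prompted to compress text. The snippet below is a minimal generation sketch: the exact prompt template used during fine-tuning is not documented in this card, so the instruction wording is an assumption, and the tokenizer is simply taken from the base checkpoint.

```python
from transformers import AutoTokenizer

# Tokenizer from the base checkpoint (assumption: no custom tokens were added during fine-tuning).
tokenizer = AutoTokenizer.from_pretrained("unsloth/Meta-Llama-3.1-8B-bnb-4bit")

long_text = "The impact of artificial intelligence on modern society is a topic of intense debate..."
# Hypothetical prompt format; adjust to match the template used during training.
prompt = f"Compress the following text while preserving its meaning:\n\n{long_text}\n\nCompressed:"

inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```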
72
 
73
  ## Training Details
74
 
 
83
 
84
  ### Training Procedure
85
 
86
+
87
+
88
  #### Preprocessing
89
 
90
  Prompt-response pairs were processed from the datasets and compiled into a single dataset (available at https://huggingface.co/datasets/aoxo/token_compressor). Unwanted characters, trailing whitespace, and inverted commas were removed.
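As an illustration of that cleanup step (the original preprocessing script is not published, so the exact character set handled below is an assumption), the filtering could look roughly like this:

```python
import re

def clean_text(text: str) -> str:
    # Remove straight and curly quote characters ("inverted commas"); assumed set.
    text = re.sub(r"[\"'\u2018\u2019\u201C\u201D]", "", text)
    # Strip trailing whitespace from every line and from the whole string.
    return "\n".join(line.rstrip() for line in text.splitlines()).strip()

print(clean_text('  "Compress this text," he said.  '))  # -> Compress this text, he said.
```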
91
 
92
  #### Training Hyperparameters
93
 
 
 
94
  - **Training regime:** bf16 mixed precision
95
  - **Optimizer:** paged_adamw_8bit
96
  - **Learning rate:** 2e-4
 
98
  - **Batch size:** 4 per device
99
  - **Gradient accumulation steps:** 16
100
  - **Number of epochs:** 10
101
+ - **Max steps:** 175,118
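For orientation, these hyperparameters map onto a standard `transformers` `TrainingArguments` configuration roughly as shown below; this is a sketch rather than the authors' actual training script, and anything not listed above (such as the output directory) is a placeholder.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-token-compressor-lora",  # placeholder path
    bf16=True,                                 # bf16 mixed precision
    optim="paged_adamw_8bit",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    num_train_epochs=10,
    max_steps=175_118,  # when set, max_steps takes precedence over num_train_epochs
)
```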
102
 
103
  #### LoRA Configuration
104
 
 
114
  - **Total Logged Training Time:** 1422.31 hours
115
  - **Start Time:** 07-21-2024 02:02:32
116
  - **End Time:** 09-18-2024 08:21:08
117
+ - **Checkpoint Size (Adapter):** 13,648,432 bytes
118
 
119
+ ## Evaluation
120
 
121
+ ### Evaluation Data, Factors & Results
122
 
123
+ - **Total Evaluation Compute Throughput:** 14.34 GFLOPS
124
+ - **Total Logged Evaluation Time:** 34.25 minutes
125
+ - **Start Time:** 09-18-2024 08:23:11
126
+ - **End Time:** 09-18-2024 08:57:26
127
 
128
+ #### Evaluation Data
129
 
130
+ Evaluation was performed on a subset of the following dataset:
131
 
132
+ - sentence-transformers/sentence-compression
133
 
134
+ ### Results
135
 
136
+ To demonstrate the model's performance, we've tested it on prompts of varying lengths. The results show how the model compresses texts of different sizes while maintaining the core meaning.
137
 
138
+ #### Example 1: Very Large Paragraph
139
 
140
+ **Input:**
141
+ The impact of artificial intelligence on modern society is a topic of intense debate and speculation. As AI technologies continue to advance at an unprecedented pace, they are reshaping industries, transforming job markets, and altering the way we interact with machines and each other. Proponents argue that AI has the potential to solve some of humanity's most pressing challenges, from climate change to disease diagnosis. They envision a future where AI assistants enhance human productivity, autonomous vehicles reduce traffic accidents, and machine learning algorithms make breakthrough discoveries in science and medicine. However, critics warn of potential downsides, including job displacement, privacy concerns, and the ethical implications of delegating important decisions to machines. There are also fears about the long-term consequences of creating superintelligent AI systems that could potentially outstrip human control. As we navigate this complex landscape, it becomes increasingly important to develop robust governance frameworks and ethical guidelines to ensure that AI development aligns with human values and benefits society as a whole.
142
 
143
+ **Output:**
144
+ AI's rapid advancement is reshaping society, offering solutions to major challenges but raising concerns about job displacement, privacy, and ethics. Balancing AI's potential with its risks requires careful governance and ethical guidelines.
145
 
146
+ #### Example 2: Medium-Length Paragraph
147
 
148
+ **Input:**
149
+ The evolution of social media platforms has dramatically altered the landscape of human communication and information sharing. What began as simple networking sites have grown into complex ecosystems that influence everything from personal relationships to global politics. While social media has enabled unprecedented connectivity and democratized information access, it has also given rise to challenges such as misinformation spread, privacy breaches, and addictive design patterns. As these platforms continue to evolve, there is an ongoing debate about their role in society and the need for regulation to address their impact on mental health, democracy, and social cohesion.
150
 
151
+ **Output:**
152
+ Social media has transformed communication, offering connectivity but also causing issues like misinformation and privacy concerns. Its evolving role in society sparks debates on regulation and impact on mental health and democracy.
153
 
154
+ #### Example 3: Short Paragraph
155
 
156
+ **Input:**
157
+ Climate change is one of the most pressing issues of our time, with far-reaching consequences for ecosystems, economies, and human societies worldwide. Rising global temperatures are leading to more frequent extreme weather events, sea level rise, and shifts in wildlife populations and vegetation patterns. Addressing this challenge requires a coordinated global effort to reduce greenhouse gas emissions and transition to sustainable energy sources.
158
 
159
+ **Output:**
160
+ Climate change, a critical global issue, causes extreme weather, rising seas, and ecosystem shifts. Tackling it needs worldwide cooperation to cut emissions and adopt sustainable energy.
161
 
162
+ #### Example 4: Brief Statement
163
 
164
+ **Input:**
165
+ The rise of e-commerce has transformed the retail landscape, offering consumers unprecedented convenience and choice while posing challenges for traditional brick-and-mortar stores.
166
 
167
+ **Output:**
168
+ E-commerce growth offers consumer convenience, challenging traditional stores.
169
 
170
  ## Environmental Impact
171
 
 
 
172
  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
173
 
174
+ - **Hardware Type:** RTX 4000 SFF 20GB Ada Generation Graphics Card
175
+ - **Hours used:** 1423
176
+ - **Cloud Provider:** Private Infrastructure
177
+ - **Compute Region:** Kochi, India (Asia Pacific)
178
+ - **Carbon Emitted:** 458.21 kg CO2
179
 
180
+ ## Technical Specifications
181
 
182
  ### Model Architecture and Objective
183
 
184
+ The model uses the Llama 3.1 8B architecture with 4-bit quantization. It was fine-tuned using LoRA for the task of token compression.
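In code, this corresponds roughly to loading the 4-bit quantized base model and attaching the LoRA adapter. The sketch below spells out the quantization explicitly; the NF4/bfloat16 settings are assumptions (the prequantized unsloth checkpoint already ships its own quantization config), so treat it as illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Assumed 4-bit settings; the prequantized checkpoint carries its own config.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "aoxo/llama-token-compressor")  # attach the LoRA adapter
```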
185
 
186
  ### Compute Infrastructure
187
 
 
 
188
  #### Hardware
189
 
190
+ RTX 4000 SFF 20GB Ada Generation Graphics Card
191
 
192
  #### Software
193
 
194
+ - Hugging Face Transformers
195
+ - PEFT (Parameter-Efficient Fine-Tuning)
196
+ - Accelerate
197
+ - bitsandbytes
198
+ - TRL (Transformer Reinforcement Learning)
199
 
200
  ## Model Card Contact
201
 
202
+ aloshdeny@gmail.com
203
+
204
  ### Framework versions
205
 
206
  - PEFT 0.12.0