Update README.md
# Memories - Token Compressor for Long-Range Dependency Conversations
This model is a fine-tuned version of the Llama 3.1 8B 4-bit model, specifically trained for token compression tasks. It uses LoRA (Low-Rank Adaptation) for efficient fine-tuning while maintaining the base model's performance.

- **Developed by:** Alosh Denny
- **Funded by:** EmelinLabs
- **Shared by:** EmelinLabs
- **Model type:** Token Compressor for Memories
- **Language(s) (NLP):** English
- **License:** apache-2.0
This model is designed for token compression tasks. It can be used to generate more concise versions of input text while preserving the essential meaning.

### Downstream Use

The compressed outputs from this model can be used in various NLP applications where text length is a constraint, such as summarization, efficient text storage, or as input for other language models with token limits.
Use the code below to get started with the model.

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

# Load the adapter configuration and the 4-bit base model, then attach the LoRA adapter
config = PeftConfig.from_pretrained("aoxo/llama-token-compressor")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Meta-Llama-3.1-8B-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "aoxo/llama-token-compressor")
```
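To actually run a compression you also need the tokenizer. The snippet below is a minimal usage sketch; the prompt wording is an assumption, since the card does not document the exact instruction template used during fine-tuning.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/Meta-Llama-3.1-8B-bnb-4bit")

# Hypothetical prompt format; substitute the template used during training if known.
text = "The impact of artificial intelligence on modern society is a topic of intense debate..."
prompt = f"Compress the following text while preserving its meaning:\n\n{text}\n\nCompressed:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```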
## Training Details

### Training Procedure

#### Preprocessing

Prompt-response pairs were processed from the datasets and compiled into a single dataset (available at https://huggingface.co/datasets/aoxo/token_compressor). Unwanted characters, trailing whitespace, and inverted commas were removed.
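The cleaning script itself is not included in the card; a minimal sketch of this kind of normalization (quote removal, stripping of stray characters, whitespace trimming) might look like the following. The function name and exact character classes are illustrative assumptions.

```python
import re

def clean_text(text: str) -> str:
    """Illustrative cleanup for prompt-response pairs."""
    text = re.sub(r"[\"'\u2018\u2019\u201c\u201d]", "", text)  # drop straight and curly quotes ("inverted commas")
    text = re.sub(r"[^\x20-\x7E\n]", "", text)                 # drop other unwanted, non-printable characters
    text = re.sub(r"[ \t]+", " ", text)                        # collapse runs of spaces and tabs
    return text.strip()                                        # trim leading/trailing whitespace
```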
#### Training Hyperparameters

- **Training regime:** bf16 mixed precision
- **Optimizer:** paged_adamw_8bit
- **Learning rate:** 2e-4
- **Batch size:** 4 per device
- **Gradient accumulation steps:** 16
- **Number of epochs:** 10
- **Max steps:** 175,118
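The training script is not reproduced in the card. As a rough sketch of how the hyperparameters above map onto a Hugging Face `TrainingArguments` object (the output directory is an assumption; the run would typically be driven by TRL's `SFTTrainer`):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-token-compressor",   # hypothetical output path
    bf16=True,                             # bf16 mixed precision
    optim="paged_adamw_8bit",              # paged 8-bit AdamW
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,        # effective batch of 64 sequences per optimizer step
    num_train_epochs=10,
    max_steps=175_118,                     # when set, max_steps takes precedence over num_train_epochs
)
```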
#### LoRA Configuration
- **Total Logged Training Time:** 1422.31 hours
- **Start Time:** 07-21-2024 02:02:32
- **End Time:** 09-18-2024 08:21:08
- **Checkpoint Size (Adapter):** 13,648,432 bytes

## Evaluation

- **Total Evaluation Compute Throughput:** 14.34 GFLOPS
- **Total Logged Evaluation Time:** 34.25 minutes
- **Start Time:** 09-18-2024 08:23:11
- **End Time:** 09-18-2024 08:57:26

### Evaluation Data, Factors & Results

#### Evaluation Data

Evaluation was performed on a subset of the following dataset:

- sentence-transformers/sentence-compression
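The card does not state how the evaluation subset was drawn; the sketch below loads the dataset with the `datasets` library, with the subset size and seed as illustrative assumptions.

```python
from datasets import load_dataset

# Load the compression pairs and sample an illustrative evaluation subset
eval_data = load_dataset("sentence-transformers/sentence-compression", split="train")
eval_subset = eval_data.shuffle(seed=42).select(range(1_000))
```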
### Results

To demonstrate the model's performance, we've tested it on prompts of varying lengths. The results show how the model compresses texts of different sizes while maintaining the core meaning.

#### Example 1: Very Large Paragraph

**Input:**

The impact of artificial intelligence on modern society is a topic of intense debate and speculation. As AI technologies continue to advance at an unprecedented pace, they are reshaping industries, transforming job markets, and altering the way we interact with machines and each other. Proponents argue that AI has the potential to solve some of humanity's most pressing challenges, from climate change to disease diagnosis. They envision a future where AI assistants enhance human productivity, autonomous vehicles reduce traffic accidents, and machine learning algorithms make breakthrough discoveries in science and medicine. However, critics warn of potential downsides, including job displacement, privacy concerns, and the ethical implications of delegating important decisions to machines. There are also fears about the long-term consequences of creating superintelligent AI systems that could potentially outstrip human control. As we navigate this complex landscape, it becomes increasingly important to develop robust governance frameworks and ethical guidelines to ensure that AI development aligns with human values and benefits society as a whole.

**Output:**

AI's rapid advancement is reshaping society, offering solutions to major challenges but raising concerns about job displacement, privacy, and ethics. Balancing AI's potential with its risks requires careful governance and ethical guidelines.

#### Example 2: Medium-Length Paragraph

**Input:**

The evolution of social media platforms has dramatically altered the landscape of human communication and information sharing. What began as simple networking sites have grown into complex ecosystems that influence everything from personal relationships to global politics. While social media has enabled unprecedented connectivity and democratized information access, it has also given rise to challenges such as misinformation spread, privacy breaches, and addictive design patterns. As these platforms continue to evolve, there is an ongoing debate about their role in society and the need for regulation to address their impact on mental health, democracy, and social cohesion.

**Output:**

Social media has transformed communication, offering connectivity but also causing issues like misinformation and privacy concerns. Its evolving role in society sparks debates on regulation and impact on mental health and democracy.

#### Example 3: Short Paragraph

**Input:**

Climate change is one of the most pressing issues of our time, with far-reaching consequences for ecosystems, economies, and human societies worldwide. Rising global temperatures are leading to more frequent extreme weather events, sea level rise, and shifts in wildlife populations and vegetation patterns. Addressing this challenge requires a coordinated global effort to reduce greenhouse gas emissions and transition to sustainable energy sources.

**Output:**

Climate change, a critical global issue, causes extreme weather, rising seas, and ecosystem shifts. Tackling it needs worldwide cooperation to cut emissions and adopt sustainable energy.

#### Example 4: Brief Statement

**Input:**

The rise of e-commerce has transformed the retail landscape, offering consumers unprecedented convenience and choice while posing challenges for traditional brick-and-mortar stores.

**Output:**

E-commerce growth offers consumer convenience, challenging traditional stores.
## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** RTX 4000 SFF 20GB Ada Generation Graphics Card
- **Hours used:** 1423
- **Cloud Provider:** Private Infrastructure
- **Compute Region:** Kochi, India (Asia Pacific)
- **Carbon Emitted:** 458.21 kg CO2
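For context, the calculator's estimate is roughly the product of the hardware's average power draw, the runtime, and the carbon intensity of the local grid:

$$\text{CO}_2\text{eq} \approx P_{\text{hardware}}\,(\text{kW}) \times t\,(\text{hours}) \times I_{\text{grid}}\,(\text{kg CO}_2\text{eq/kWh})$$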
## Technical Specifications

### Model Architecture and Objective

The model uses the Llama 3.1 8B architecture with 4-bit quantization. It was fine-tuned using LoRA for the task of token compression.
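To see how small the trained component is relative to the 8B base model, the adapter parameters can be counted directly from the loaded model. This is a sketch, assuming PEFT's usual `lora_` naming for adapter weights and the `model` object from the getting-started snippet above.

```python
# Count only the adapter parameters injected by PEFT
adapter_params = sum(p.numel() for name, p in model.named_parameters() if "lora_" in name)
print(f"LoRA adapter parameters: {adapter_params:,}")
```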
### Compute Infrastructure

#### Hardware

RTX 4000 SFF 20GB Ada Generation Graphics Card
#### Software

- Hugging Face Transformers
- PEFT (Parameter-Efficient Fine-Tuning)
- Accelerate
- bitsandbytes
- TRL (Transformer Reinforcement Learning)
## Model Card Contact

aloshdeny@gmail.com

### Framework versions

- PEFT 0.12.0