alecrosales1 committed · Commit 0a300e4 · 1 parent 22ccb9e
Upload README.md

README.md CHANGED
@@ -1,265 +1,214 @@
**Previous version:**

---
library_name: transformers
tags:
- unsloth
- LLMs-Aviation
- AI-Regulatory-Compliance
- RAC-AI-Colombia
license: apache-2.0
datasets:
- somosnlp/ColombiaRAC_FullyCurated
language:
- es
widget:
- text: >
    [widget prompt not captured in this view]
---
### Model Description

This document provides a detailed overview of `GemmaColRAC-AeroExpert`, the fifth iteration of our model specialized in Colombian aeronautical regulations. It represents a qualitative leap over previous versions, with improved accuracy and more efficient GPU usage, reflecting our commitment to sustainable, high-quality development of AI technologies for aviation.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6419c2f6b4adb0e101b17b6c/0undo4kZc7OtfGI5nnAa8.png" alt="Image of the Colombian Aeronautical Regulations" style="width: 40%; max-height: 550px;">

- **Funded by:** Fundación Universitaria Los Libertadores, SomosNLP, HuggingFace
- **Model type:** Specialized Language Model for Colombian Aeronautical Regulations
- **Language(s):** Spanish (`es-CO`)
- **License:** apache-2.0 <!-- Choose the most permissive license possible, taking into account the licenses of the pre-trained model and the datasets used -->
- **Fine-tuned from model:** [More Information Needed] <!-- Link to the pre-trained model used as the base -->
- **Dataset used:** [RAC Corpus: Base de Datos del Reglamento Aeronáutico Colombiano 🛫📚🇨🇴](https://huggingface.co/datasets/somosnlp/Reglamento_Aeronautico_Colombiano_2024/blob/01bf7eebef40aaba374ffd30697582ab10ec3503/README.md)
### Model Sources

## Uses

### Direct Use

GemmaColRAC-AeroExpert is designed to assist professionals and students in the aviation industry by providing enhanced access to the Colombian Aeronautical Regulations through advanced language-processing capabilities.
```python
from transformers import AutoModel, AutoTokenizer

# Load the tokenizer and the model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("somosnlp/GemmaColRAC-AeroExpert")
model = AutoModel.from_pretrained("somosnlp/GemmaColRAC-AeroExpert")

# Encode an example query and run a forward pass
encoded_input = tokenizer("Example query about aviation regulations", return_tensors='pt')
output = model(**encoded_input)
```
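For text generation (the usual way to query a fine-tuned Gemma model), loading the checkpoint with a causal-LM head and calling `generate` is typically more useful than a bare forward pass. The snippet below is a minimal sketch under that assumption; the example question is the one used in the card's widget prompt, and for best results the Gemma chat format shown there (`<start_of_turn>` system/user/model turns) should be applied to the input.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "somosnlp/GemmaColRAC-AeroExpert"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Example question about the Colombian Aeronautical Regulations (from the card's widget)
question = ("¿Qué sucede con las empresas de servicios aéreos comerciales que no hayan "
            "actualizado su permiso de operación después del 31 de marzo de 2024?")

inputs = tokenizer(question, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```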
- **Total Training Time:** 12,607 seconds
- **Optimizer:** AdamW with Bitfitting and Neutrino Noise
- **Max Steps:** 4,904
- **Sequence Length:** 2048
- **Per-Device Batch Size:** 2
- **Transformers Version:** 4.39.2
- **Optimization Framework:** Unsloth 2024.4
- **Precision / Quantization:** bf16 with gradient_accumulation_steps of 2
- **Activation Function:** gelu_pytorch_tanh
- **Training Time:** ⏳ Increased to allow more extensive fine-tuning of the model, resulting in improved accuracy.
- **Batch Size:** 🔢 Increased the per-device batch size from 1 to 2, allowing for more efficient optimization.
- **Optimizer Upgrade:** 🛠️ Introduced advanced techniques such as Bitfitting and Neutrino Noise to improve model convergence.
- **Maximum Steps:** 🚶♂️ Raised the maximum number of steps from 1,638 to 4,904, giving broader coverage of the data and deeper learning.
#### Training Hyperparameters

- **Training regime:** bf16 mixed precision
- **Max Steps:** 14,688
- **Total Training Time:** Approx. 5 hours 21 minutes (based on epochs and iteration speed)
- **Max Sequence Length:** 2048
- **Weight Decay:** 0.001
- **Learning Rate Scheduler:** Cosine
- **Adam Betas:** Beta1 = 0.99, Beta2 = 0.995
- **Max Gradient Norm:** 0.4
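As a rough sketch (not the authors' exact training script), the hyperparameters listed above map onto `transformers.TrainingArguments` roughly as follows; in this setup they would typically be passed, together with the ColombiaRAC dataset, to TRL's `SFTTrainer`. The output directory, logging interval, and optimizer name are placeholders, since the card does not state them, and note that the card reports both 4,904 and 14,688 as the step budget.

```python
from transformers import TrainingArguments

# Sketch of the reported configuration; values marked "placeholder" are assumptions.
training_args = TrainingArguments(
    output_dir="./gemmacolrac-aeroexpert",  # placeholder
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    max_steps=4904,                         # the card also mentions 14,688 elsewhere
    weight_decay=0.001,
    lr_scheduler_type="cosine",
    adam_beta1=0.99,
    adam_beta2=0.995,
    max_grad_norm=0.4,
    bf16=True,                              # bf16 mixed precision
    optim="adamw_torch",                    # AdamW (Bitfitting / Neutrino Noise not reproduced here)
    logging_steps=50,                       # placeholder
)
```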
#### Speeds, Sizes, Times

- **Training Duration:** Approx. 3 hours 30 minutes for full training
- **Training Throughput:** 0.76 iterations per second (it/s)
- **Total Steps:** 14,688 steps over 8 epochs
- **Checkpoint Size:** Final model size was not specified; typical sizes for models of this type are several gigabytes.
- **Total Number of Trainable Parameters:** 78,446,592

[More Information Needed]
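As a quick back-of-the-envelope consistency check (not a figure reported by the authors), the listed step count and throughput imply a wall-clock time of

$$ \frac{14{,}688\ \text{steps}}{0.76\ \text{steps/s}} \approx 1.93 \times 10^{4}\ \text{s} \approx 5.4\ \text{h}, $$

which is close to the "approx. 5 hours 21 minutes" quoted under the training hyperparameters rather than the shorter duration listed just above.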
- **Train Loss:** 0.393565042567292 (final reported loss)
- **Training Runtime:** 10,763.56 seconds (approximately 2.99 hours)
- **Samples per Second:** 4.556
- **Steps per Second:** 0.456
- **Total Training Epochs:** 2
- **Total Training Steps:** 4,904
- **Gradient Norm:** 3.515625
- **Final Learning Rate:** 0 (end of training)
- **Average Loss over Training:** 0.1934
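As another rough cross-check (again, not an official figure), the reported steps-per-second rate is consistent with the reported runtime:

$$ \frac{4{,}904\ \text{steps}}{0.456\ \text{steps/s}} \approx 10{,}754\ \text{s}, $$

close to the 10,763.56 s training runtime listed above.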
<img src="https://cdn-uploads.huggingface.co/production/uploads/6419c2f6b4adb0e101b17b6c/zuEetm8ifT5e3QtHfBBVD.png" alt="Training Loss" style="width: 80%; max-height: 350px;">

The model's performance in simplifying the RAC's content was evaluated based on feedback from aeronautical experts, thereby improving regulatory compliance and understanding.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6419c2f6b4adb0e101b17b6c/5iPvAhaTMnqRDBn2g7XIK.png" alt="Evaluation of the model by aeronautical experts" style="width: 40%; max-height: 550px;">
### Energy Consumption and Carbon Emissions 📉

- **Power Consumption:** 0.25 kW (250 watts)
- **Runtime Hours:** 3.6 hours
- **Carbon Intensity:** 475 gCO2eq per kWh (global average)

- **Total Hours Used:** ~3.6 hours
- **Total Carbon Emitted:** Approximately 356.25 grams of CO₂ equivalents
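For reference, the standard back-of-the-envelope estimate multiplies power draw, runtime, and carbon intensity (this is a sketch of the calculation, not a measured value):

$$ E_{\mathrm{CO_2}} = P \times t \times I = 0.25\ \text{kW} \times 3\ \text{h} \times 475\ \text{gCO}_2\text{eq/kWh} = 356.25\ \text{gCO}_2\text{eq}, $$

which matches the reported total; with the full ~3.6 h of runtime the same formula gives roughly 430 gCO₂eq.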
- `json`: For parsing JSON files and handling serialization 📄.
- `pandas`: A powerful data manipulation and analysis library providing data structures and operations for manipulating numerical tables and time series 📊.
- `torch`: PyTorch is an open-source machine learning library used for applications such as computer vision and natural language processing, developed by Facebook's AI Research lab (FAIR) 🔥.
- `datasets`: A lightweight and extensible library to easily share and access datasets and evaluation metrics for machine learning tasks 📚.
- `huggingface_hub`: Used for managing model repositories on Hugging Face and interacting with Hugging Face Hub APIs 🌐.

- `transformers`: Provides thousands of pre-trained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages. It is designed to be both user-friendly for machine learning researchers and efficient to use in production 🤖.
- `BitsAndBytesConfig`, `TrainingArguments`: Advanced configurations from the Transformers library for tuning the performance and efficiency of training neural networks ⚙️.
- `pipeline`: A utility for creating easy-to-use pipelines for various NLP tasks 🧪.
- `AutoModelForCausalLM`, `AutoTokenizer`: Utilities for loading and initializing pre-trained language models and their tokenizers 📝.
- `logging`: For configuring the logging level and output formats to track model training and inference processes effectively 📌.

- `LoraConfig`, `PeftModel`: Extensions from the PEFT (Parameter-Efficient Fine-Tuning) library, which include LoRA (Low-Rank Adaptation of large models), allowing efficient fine-tuning and adaptation of large pre-trained models with minimal computational overhead 🚀 (see the sketch after this list).

- **Transformers Reinforcement Learning (TRL):**
  - `SFTTrainer`: The supervised fine-tuning trainer from the TRL library, used to fine-tune transformer models on instruction-style data 🎮.
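The trainable-parameter count reported above (78,446,592, far below the size of a full Gemma model) is consistent with LoRA-style adapter training through PEFT. The snippet below is only an illustrative sketch: the base checkpoint, rank, alpha, dropout, and target modules are assumptions, since the card does not publish the actual LoRA configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative only: the card marks the actual base checkpoint as [More Information Needed].
base = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it")

# Hypothetical LoRA settings; the real values are not published in this card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # reports how few parameters the adapters add
```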
- **Openness:** The Apache 2.0 license allows users to use, modify, and distribute the software freely, which encourages innovation and widespread use.
- **Protection:** It provides an explicit grant of patent rights from contributors to users, protecting them from patent litigation.
- **Commercially friendly:** Apache 2.0 is business-friendly, allowing commercial use of the software, which is crucial for wider adoption in industry settings.

- [Nicolai Potes](https://huggingface.co/NickyNicky) - Data Scientist, specializes in AI-driven regulatory compliance solutions.
- [Santiago Pineda](https://huggingface.co/Sapinedamo) - Project Manager and Senior ML Engineer, with extensive experience in deploying scalable AI solutions.
- [Alec Mauricio](https://huggingface.co/alecrosales1) - AI Researcher, focused on developing innovative models for text analysis and interpretation.
- [Danny Stevens](https://huggingface.co/dannystevens) - Software Engineer, provides expertise in software development and integration for machine learning applications.

## Contact
**New version:**

library_name: transformers
tags:
- unsloth
- LLMs-Aviation
- AI-Regulatory-Compliance
- RAC-AI-Colombia
license: apache-2.0
datasets:
- somosnlp/ColombiaRAC_FullyCurated
language:
- es
widget:
- text: >
    <bos><start_of_turn>system\n\nYou are a helpful AI assistant.\n\nResponde
    en formato json.\n\nEres un agente experto en la normativa aeronautica
    Colombiana.<end_of_turn>\n\n<start_of_turn>user\n\n¿Qué sucede con las
    empresas de servicios aéreos comerciales que no hayan actualizado su
    permiso de operación después del 31 de marzo de
    2024?<end_of_turn>\n\n<start_of_turn>model

# Model Card for Model ID
<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]