---
library_name: transformers
tags:
- code
- trl
- qwen2
- aether code
- gguf
license: other
datasets:
- thesven/AetherCode-v1
language:
- en
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6324ce4d5d0cf5c62c6e3c5a/NlTeemUNYet9p5963Sfhr.png)

# Model Card for Aether-Qwen2-0.5B-SFT-v0.0.2

This repository contains GGUF quantizations of the Aether-Qwen2-0.5B-SFT-v0.0.2 model.

This model is an iteration of the Qwen2 model, fine-tuned with Supervised Fine-Tuning (SFT) on the AetherCode-v1 dataset specifically for code-related tasks. It combines the capabilities of the base Qwen2 model with specialized training to improve its performance in software development contexts.

## Model Details

### Model Description

Aether-Qwen2-0.5B-SFT-v0.0.2 is a transformer model built with the Hugging Face 🤗 transformers library, designed to facilitate and improve automated coding tasks. It has been enhanced via Supervised Fine-Tuning (SFT) to better understand and generate code, making it well suited to software development, code review, and automated programming assistance.

- **Developed by:** Michael Svendsen
- **Finetuned from model:** Qwen2 0.5B

## Uses

### Direct Use

This model can be used directly wherever coding assistance is needed, providing code completion, error detection, and suggestions for code optimization.

### Downstream Use

Further fine-tuning on specific programming languages or frameworks can extend its utility to more specialized software development tasks.

### Out-of-Scope Use

The model should not be used for general natural language processing tasks outside the scope of programming and code analysis.

## Bias, Risks, and Limitations

Because of potential biases in the training data and limitations in understanding complex code contexts, users should not rely solely on the model for critical software development tasks without human oversight.

### Recommendations

Ongoing validation and testing on diverse coding datasets are recommended to ensure the model remains effective and unbiased.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("thesven/Aether-Qwen2-0.5B-SFT-v0.0.2")
model = AutoModelForCausalLM.from_pretrained("thesven/Aether-Qwen2-0.5B-SFT-v0.0.2")
```
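
To generate with the loaded model, here is a minimal sketch that relies on the tokenizer's bundled chat template; the generation settings are illustrative assumptions:

```python
# Build a chat prompt and generate a completion
messages = [{"role": "user", "content": "Write a Python function that adds 3 numbers."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256)  # max_new_tokens is an assumption
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```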

or with a pipeline:

```python
from transformers import pipeline

messages = [
    {"role": "system", "content": "You are a helpful software development assistant"},
    {"role": "user", "content": "can you write a python function that adds 3 numbers together?"},
]
pipe = pipeline("text-generation", model="thesven/Aether-Qwen2-0.5B-SFT-v0.0.2")
print(pipe(messages))
```
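
Since this repository ships GGUF quantizations, the model can also be run with llama.cpp-compatible tooling. Below is a minimal sketch using the llama-cpp-python bindings; the quantization filename is a hypothetical placeholder, so substitute one of the `.gguf` files actually present in this repo:

```python
from llama_cpp import Llama

# NOTE: the filename below is hypothetical - replace it with an
# actual .gguf quantization file from this repository.
llm = Llama(
    model_path="Aether-Qwen2-0.5B-SFT-v0.0.2-Q4_K_M.gguf",
    n_ctx=2048,  # context window size (assumed)
)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful software development assistant"},
        {"role": "user", "content": "can you write a python function that adds 3 numbers together?"},
    ]
)
print(output["choices"][0]["message"]["content"])
```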

### Prompt Template

```
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}
```
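
This is the ChatML format used by the Qwen2 family. Rather than assembling the string by hand, you can have the tokenizer render it; a minimal sketch, assuming the chat template ships with the tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("thesven/Aether-Qwen2-0.5B-SFT-v0.0.2")

messages = [
    {"role": "system", "content": "You are a helpful software development assistant"},
    {"role": "user", "content": "can you write a python function that adds 3 numbers together?"},
]

# Render the ChatML prompt and append the assistant header so the
# model continues from "<|im_start|>assistant".
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```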

## Training Details

### Training Data

The model was trained on the `5star` split of the AetherCode-v1 dataset, which is designed to enhance coding-related AI capabilities.
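
A minimal sketch of loading that data with the 🤗 datasets library; the split name is taken from the description above and may differ in the dataset repo:

```python
from datasets import load_dataset

# "5star" is assumed from the description above; check the dataset
# card for the exact split/configuration name.
dataset = load_dataset("thesven/AetherCode-v1", split="5star")
print(dataset)
```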

### Training Procedure

**Training regime:** The model was trained for 3 epochs on an RTX 4500 using Supervised Fine-Tuning (SFT).
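
The exact training script is not published here; the following is a minimal sketch of what an SFT run with the TRL library (one of this model's tags) could look like. Everything other than the epoch count, base model, and dataset is an illustrative assumption:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Split name assumed from the description above
dataset = load_dataset("thesven/AetherCode-v1", split="5star")

# Only the epoch count is stated in this card; the remaining
# hyperparameters are illustrative assumptions.
config = SFTConfig(
    output_dir="Aether-Qwen2-0.5B-SFT",
    num_train_epochs=3,
    per_device_train_batch_size=4,  # assumed
)

trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",  # base model this card says was fine-tuned
    args=config,
    train_dataset=dataset,  # assumes a format SFTTrainer accepts
)
trainer.train()
```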

#### Preprocessing

Standard preprocessing techniques were applied to prepare the code data for training.