boatbomber committed · Commit e72535b · verified · 1 Parent(s): 571634e

Setup project

.gitattributes CHANGED
@@ -1,35 +1,38 @@
  *.7z filter=lfs diff=lfs merge=lfs -text
  *.arrow filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
  *.bz2 filter=lfs diff=lfs merge=lfs -text
  *.ckpt filter=lfs diff=lfs merge=lfs -text
  *.ftz filter=lfs diff=lfs merge=lfs -text
  *.gz filter=lfs diff=lfs merge=lfs -text
  *.h5 filter=lfs diff=lfs merge=lfs -text
  *.joblib filter=lfs diff=lfs merge=lfs -text
  *.lfs.* filter=lfs diff=lfs merge=lfs -text
  *.mlmodel filter=lfs diff=lfs merge=lfs -text
  *.model filter=lfs diff=lfs merge=lfs -text
  *.msgpack filter=lfs diff=lfs merge=lfs -text
  *.npy filter=lfs diff=lfs merge=lfs -text
  *.npz filter=lfs diff=lfs merge=lfs -text
  *.onnx filter=lfs diff=lfs merge=lfs -text
  *.ot filter=lfs diff=lfs merge=lfs -text
  *.parquet filter=lfs diff=lfs merge=lfs -text
  *.pb filter=lfs diff=lfs merge=lfs -text
  *.pickle filter=lfs diff=lfs merge=lfs -text
  *.pkl filter=lfs diff=lfs merge=lfs -text
  *.pt filter=lfs diff=lfs merge=lfs -text
  *.pth filter=lfs diff=lfs merge=lfs -text
  *.rar filter=lfs diff=lfs merge=lfs -text
  *.safetensors filter=lfs diff=lfs merge=lfs -text
+ *.gguf filter=lfs diff=lfs merge=lfs -text
  saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
  *.tar.* filter=lfs diff=lfs merge=lfs -text
  *.tar filter=lfs diff=lfs merge=lfs -text
  *.tflite filter=lfs diff=lfs merge=lfs -text
  *.tgz filter=lfs diff=lfs merge=lfs -text
  *.wasm filter=lfs diff=lfs merge=lfs -text
  *.xz filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,188 @@
---
license: apache-2.0
datasets:
- TorpedoSoftware/LuauLeetcode
language:
- en
- fr
- de
- es
- pt
- it
base_model:
- TorpedoSoftware/Luau-Devstral-24B-Instruct-v0.1
tags:
- roblox
- luau
- code
- grpo
- transformers
- trl
- unsloth
---

# Luau Devstral 24B Instruct v0.2

**State-of-the-art Luau code generation through reinforcement learning post-training**

A refined version of [Luau-Devstral-24B-Instruct-v0.1](https://huggingface.co/TorpedoSoftware/Luau-Devstral-24B-Instruct-v0.1), enhanced with Dr. GRPO ([Zichen Liu et al., 2025](https://arxiv.org/abs/2503.20783)) to deliver superior Luau programming capabilities for Roblox development.

## Overview

This model builds on v0.1's continued pretraining with targeted reinforcement learning, yielding a significant step up in specialized Luau code generation.

**Key Achievements:**
- State-of-the-art code formatting and linting performance
- Minimal typechecker issues under strict mode
- Concise, direct responses without unnecessary verbosity
- Robust problem solving on complex Luau challenges

## Model Information

- **Developer:** Zack Williams ([boatbomber](https://huggingface.co/boatbomber))
- **Sponsor:** [Torpedo Software LLC](https://huggingface.co/TorpedoSoftware)
- **Base Model:** [Luau-Devstral-24B-Instruct-v0.1](https://huggingface.co/TorpedoSoftware/Luau-Devstral-24B-Instruct-v0.1)
- **Training Method:** Dr. GRPO, a bias-corrected variant of Group Relative Policy Optimization

## Performance Benchmarks

Evaluated on the `test` split of [TorpedoSoftware/LuauLeetcode](https://huggingface.co/datasets/TorpedoSoftware/LuauLeetcode) (226 challenges), with results averaged across 3 runs per challenge.

### Comparison Models

**Base Models:**
- [Devstral-Small-2507](https://huggingface.co/mistralai/Devstral-Small-2507)
- [Luau-Devstral-24B-Instruct-v0.1](https://huggingface.co/TorpedoSoftware/Luau-Devstral-24B-Instruct-v0.1)

**Competitive Benchmarks:**
- [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)
- [gpt-oss-20b (low reasoning)](https://huggingface.co/openai/gpt-oss-20b)
- [GPT-5 nano (minimal reasoning)](https://platform.openai.com/docs/models/gpt-5-nano)
- [GPT-5 (minimal reasoning)](https://openai.com/gpt-5/)
- [Claude Sonnet 4](https://www.anthropic.com/claude/sonnet)
- [Claude Opus 4.1](https://www.anthropic.com/claude/opus)

*Note: The OpenAI models still consume some reasoning tokens, since their thinking cannot be fully disabled.*

### Benchmark Results

#### Unit Test Pass Rate
*Measures problem-solving accuracy and correctness*

![Unit Tests](assets/bench-unit-tests.png)

**Result:** 4th place overall, demonstrating solid problem-solving ability while outperforming the OpenAI models.

#### Code Quality Metrics

##### Linter Errors
*Evaluates fundamental code quality*

![Linter Errors](assets/bench-linter-errors.png)

**Result:** **State-of-the-art performance**, with the lowest error rate by a significant margin.

##### Linter Warnings
*Assesses non-critical code quality issues*

![Linter Warnings](assets/bench-linter-warnings.png)

**Result:** **State-of-the-art performance** in minimizing code warnings.

##### Type Safety
*Strict-mode typechecking compliance*

![Typechecker Issues](assets/bench-typechecker-issues.png)

**Result:** 2nd place, closely trailing Claude Opus 4.1. Our model favors explicit type annotations for code clarity, which creates more opportunities for typechecker complaints than Claude's heavier reliance on inferred types.

##### Code Formatting
*Edit distance from Stylua's standard format*

![Formatter Distance](assets/bench-formatter-distance.png)

**Result:** **State-of-the-art performance**, with exceptional adherence to standard formatting conventions.
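A formatter-distance metric like the one above can be sketched as a normalized edit distance between the code as generated and the same code after a Stylua pass. The benchmark's exact normalization isn't stated in this card, so the helpers below are an illustrative assumption:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def formatter_distance(generated: str, formatted: str) -> float:
    """Edit distance scaled by the longer string; 0.0 means Stylua-clean output."""
    if not generated and not formatted:
        return 0.0
    return levenshtein(generated, formatted) / max(len(generated), len(formatted))
```

For example, `formatter_distance("local x=1", "local x = 1")` reports the two missing spaces relative to the 11-character canonical form.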

#### Response Characteristics

##### Response Length
*Average response size (excluding reasoning tokens)*

![Response Length](assets/bench-response-length.png)

**Result:** The most concise responses of all models tested, delivering direct solutions without unnecessary preamble. This efficiency also suggests headroom for further problem-solving gains through explicit problem decomposition or reasoning.

## Training Methodology

### Dataset

**Primary Source:** [TorpedoSoftware/LuauLeetcode](https://huggingface.co/datasets/TorpedoSoftware/LuauLeetcode)
- 2.6K LeetCode-style Luau programming challenges
- Structured difficulty progression: Easy → Medium → Hard

### Training Process

**Curriculum Learning Approach:**

1. **Easy Difficulty Phase**
   - 6.45M input tokens
   - 25 hours of training

2. **Medium Difficulty Phase**
   - 17.02M input tokens
   - 58 hours of training

3. **Hard Difficulty Phase**
   - 6.07M input tokens
   - 20 hours of training

**Technical Configuration:**
- LoRA adapter with rank=128
- Full-precision training
- Final merge to a BF16 model

### Reward Function Design

The model was optimized using four complementary reward signals:

1. **Correctness** - Unit testing via [Jest-Lua](https://github.com/jsdotlua/jest-lua)
2. **Quality** - Code linting with [Selene](https://github.com/Kampfkarren/selene)
3. **Type Safety** - Strict typechecking using [Luau](https://luau.org)
4. **Formatting** - Style conformance via [Stylua](https://github.com/JohnnyMorganz/StyLua)
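The card doesn't specify how the four signals are weighted, so the sketch below is a hypothetical combination: the dataclass fields stand in for Jest-Lua, Selene, typechecker, and Stylua outputs, and every weight is illustrative.

```python
from dataclasses import dataclass


@dataclass
class LuauEvalResult:
    """Illustrative per-solution signals from the four tools named above."""
    tests_passed: int        # Jest-Lua
    tests_total: int
    lint_errors: int         # Selene
    lint_warnings: int
    type_issues: int         # Luau strict-mode typechecker
    format_distance: float   # Stylua; 0.0 = already canonically formatted


def reward(r: LuauEvalResult) -> float:
    """Hypothetical scalar reward; correctness dominates, style signals refine."""
    correctness = r.tests_passed / max(r.tests_total, 1)
    quality = 1.0 / (1.0 + r.lint_errors + 0.25 * r.lint_warnings)
    type_safety = 1.0 / (1.0 + r.type_issues)
    formatting = 1.0 - min(r.format_distance, 1.0)
    return 2.0 * correctness + 0.5 * (quality + type_safety + formatting)
```

Under these made-up weights a flawless solution scores 3.5, and failing tests cost far more than a stray lint warning, mirroring the correctness-first emphasis of the benchmark.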

### Training Progress

#### Easy Difficulty Training
![Easy Overall Reward Curve](assets/easy-overall-reward.png)
![Easy Individual Reward Curves](assets/easy-individual-rewards.png)

#### Medium Difficulty Training
![Medium Overall Reward Curve](assets/medium-overall-reward.png)
![Medium Individual Reward Curves](assets/medium-individual-rewards.png)

#### Hard Difficulty Training
![Hard Overall Reward Curve](assets/hard-overall-reward.png)
![Hard Individual Reward Curves](assets/hard-individual-rewards.png)

## Quantization Support

### Imatrix Calibration

A custom importance matrix was computed from 5.73MB of specialized text data.

**Calibration Sources:**
- [technical.txt](https://huggingface.co/datasets/froggeric/imatrix/blob/main/technical.txt)
- [groups_merged.txt](https://huggingface.co/datasets/froggeric/imatrix/blob/main/groups_merged.txt)
- [the-luau-stack](https://huggingface.co/datasets/TorpedoSoftware/the-luau-stack)
- [roblox-info-dump](https://huggingface.co/datasets/TorpedoSoftware/roblox-info-dump)

This calibration keeps quantized models strong on Luau/Roblox tasks while maintaining general intelligence. The `imatrix.gguf` file is included in the repository for custom quantization needs.
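With the bundled `imatrix.gguf`, an importance-weighted quant can be produced with llama.cpp's standard tooling; the GGUF file names below are placeholders for this repo's actual files, not confirmed names.

```shell
# Sketch: build an imatrix-weighted Q4_K_M quant with llama.cpp.
# Substitute the repository's real BF16 GGUF for the input name.
./llama-quantize --imatrix imatrix.gguf \
    Luau-Devstral-24B-Instruct-v0.2-BF16.gguf \
    Luau-Devstral-24B-Instruct-v0.2-Q4_K_M.gguf \
    Q4_K_M
```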

## Environmental Impact

Carbon emissions estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) ([Lacoste et al., 2019](https://arxiv.org/abs/1910.09700)):

- **Hardware:** A100 80GB SXM
- **Training Duration:** 103 hours
- **Carbon Emissions:** ~12 kg CO2eq
- **Equivalent Impact:** ~31 miles driven by an average internal combustion engine vehicle
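The ~12 kg figure can be reproduced from the calculator's formula (power draw × time × grid carbon intensity). The ~400 W board power and ~0.29 kg CO2eq/kWh intensity below are illustrative assumptions, not values stated in this card:

```python
# Rough reproduction of the ML CO2 Impact estimate above.
power_kw = 0.4    # assumed A100 80GB SXM board power (~400 W)
hours = 103       # total training duration from the curriculum phases
intensity = 0.29  # assumed grid carbon intensity, kg CO2eq per kWh

energy_kwh = power_kw * hours
co2_kg = energy_kwh * intensity
print(f"{energy_kwh:.1f} kWh -> {co2_kg:.0f} kg CO2eq")
# 41.2 kWh -> 12 kg CO2eq
```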
assets/bench-formatter-distance.png ADDED

Git LFS Details

  • SHA256: d73a2f251f9b7bbf9b96868e4a3a6aebed1ede5dba8723e1057ebd1e999f60df
  • Pointer size: 131 Bytes
  • Size of remote file: 288 kB
assets/bench-linter-errors.png ADDED

Git LFS Details

  • SHA256: 6c1e765104db05371525965ce04c179bcb52c68eb5b4c4ca65099b4185fd33bd
  • Pointer size: 131 Bytes
  • Size of remote file: 290 kB
assets/bench-linter-warnings.png ADDED

Git LFS Details

  • SHA256: 343805601fe784523b9205d93b3cbf3f39e7dee5f25cf31d8d6075b642122ddb
  • Pointer size: 131 Bytes
  • Size of remote file: 309 kB
assets/bench-response-length.png ADDED

Git LFS Details

  • SHA256: fd1149bcb4e90685fb43af58c12f651e3f7d827fbdc186f5b57d05f424dc5154
  • Pointer size: 131 Bytes
  • Size of remote file: 318 kB
assets/bench-typechecker-issues.png ADDED

Git LFS Details

  • SHA256: 64d47fec0b92cdd2273405718264a071499469134a7aae9162620c3b6c38384b
  • Pointer size: 131 Bytes
  • Size of remote file: 292 kB
assets/bench-unit-tests.png ADDED

Git LFS Details

  • SHA256: e759f07a874faef3b081f4e02f136f8f7e0e82957e5bc74b8794499c28a0b586
  • Pointer size: 131 Bytes
  • Size of remote file: 320 kB
assets/easy-individual-rewards.png ADDED

Git LFS Details

  • SHA256: afe02bc98d4051f7602798749eb6a1850cacd174709fb4ffa41e906f43ad777f
  • Pointer size: 131 Bytes
  • Size of remote file: 515 kB
assets/easy-overall-reward.png ADDED

Git LFS Details

  • SHA256: 04a5f36b913ee59891221bc74eb26e0bb4d52b14f15c4bfff8a94049ed5353cb
  • Pointer size: 131 Bytes
  • Size of remote file: 408 kB
assets/hard-individual-rewards.png ADDED

Git LFS Details

  • SHA256: 0bc4aef94fab0f8b41b1be47598c8d1759a47d6e38e544c9438748495c343653
  • Pointer size: 131 Bytes
  • Size of remote file: 294 kB
assets/hard-overall-reward.png ADDED

Git LFS Details

  • SHA256: 58516ed2884d4473b07582a06265e73d161c0e844ca5ed105e7f5018c2d12206
  • Pointer size: 131 Bytes
  • Size of remote file: 282 kB
assets/medium-individual-rewards.png ADDED

Git LFS Details

  • SHA256: 502f643290b5b0cc2e5ff544c8dd0431ee19cb6628e05d1a0f0a1322fe021707
  • Pointer size: 131 Bytes
  • Size of remote file: 587 kB
assets/medium-overall-reward.png ADDED

Git LFS Details

  • SHA256: e41e7f30905fe75f0a8806b7efd50d64a65ce7034df20cb6374a6c102f62d73d
  • Pointer size: 131 Bytes
  • Size of remote file: 426 kB