DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
Paper • 2511.06307
A complete training pipeline that fine-tunes a code LLM to translate VB6 to C#, using SFT followed by GRPO.
Qwen2.5-Coder-7B-Instruct
↓
SFT (360 examples, 3 epochs)
↓
simooo21/vb6-to-cs-qwen2.5-coder-7b-sft
↓
GRPO (reward: syntax + format + length, 2 epochs)
↓
simooo21/vb6-to-cs-qwen2.5-coder-7b-grpo
simooo21/vb6-to-csharp-translation (chat format: [{"role": "user"}, {"role": "assistant"}])

| Category | Examples | Key Mappings |
|---|---|---|
| variables | Dim → type declaration | Dim x As Integer → int x; |
| control_flow | If/Select Case → if/switch | Select Case → switch |
| loops | For/Do/For Each → for/while/foreach | For Each → foreach |
| functions | Sub/Function → methods | ByRef → ref, Optional → default params |
| arrays | 1-based → 0-based arrays | Dim arr(1 To 5) → int[] arr = new int[5] |
| strings | VB6 string funcs → C# methods | UCase/Trim/Len → ToUpper/Trim/Length |
| file_io | Open/Close → StreamReader/Writer | FreeFile → using statement |
| error_handling | On Error → try/catch | On Error GoTo → try { } catch |
| gui_events | Event subs → event handlers | cmd_Click() → cmd_Click(object, EventArgs) |
| gui_controls | VB6 controls → WinForms | ComboBox.AddItem → Items.Add |
| form_designer | .frm code → C# InitializeComponent | VB6 form layout → C# programmatic UI |
| database | ADO → SqlClient | ADODB.Connection → SqlConnection |
| api_calls | Declare → DllImport | Declare Function → [DllImport] |
| classes | VB6 class → C# class | Property Get/Let → C# properties |
| structs | Type → struct | Public Type → public struct |
| collections | Collection → Dictionary | Scripting.Dictionary → Dictionary<K,V> |
| regex | Like → Regex | Like "[A-Z]###" → Regex.IsMatch |
| enums | VB6 Enum → C# enum | Direct mapping with values |
| events | RaiseEvent → event invocation | RaiseEvent → ?.Invoke |
| interfaces | Implements → : interface | Implements INotifyPropertyChanged → : INotifyPropertyChanged |
| modules | Module-level → static class | Public Const → public const |
| dialogs | MsgBox/InputBox → MessageBox | vbModal → ShowDialog() |
| async | DoEvents → async/await | DoEvents → Application.DoEvents() / await |
| datetime | Date functions → DateTime | Now/DateAdd/DateDiff → DateTime.Now/AddDays |
| math | Rnd → Random | Rnd → Random.Next |
| formatting | Format → ToString | FormatCurrency → .ToString("C2") |
| conversions | CStr/CInt → Convert | CStr/CInt/CDate → Convert.ToString/Int32/DateTime |
| type_checks | IsNumeric → TryParse | IsNumeric → int.TryParse |
| operators | IIf → ternary | IIf(condition, a, b) → condition ? a : b |
| networking | Winsock → TcpClient | Winsock → TcpClient/NetworkStream |
| com_interop | CreateObject → Activator | CreateObject("Word.Application") → Activator.CreateInstance |
| printing | Printer → PrintDocument | Printer.Print → e.Graphics.DrawString |
| graphics | LoadPicture → Image.FromFile | Direct mapping |
| system | SendKeys → SendKeys.SendWait | Direct mapping |
| application | App.Path → Assembly | App.Path → Application.ExecutablePath |
| entry_point | Sub Main → static void Main | With [STAThread] attribute |
| null_checks | IsNull/Nothing → null/DBNull | Is Nothing → == null |
| access_modifiers | Friend → internal | Friend → internal |
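
For illustration, a single record in the user/assistant chat format might look like the sketch below. The `messages` key, the prompt wording, and both code snippets are assumptions made for this example; they are not rows copied from the dataset.

```python
import json

# Hypothetical record: the "messages" key and both snippets are
# illustrative assumptions, not rows taken from the dataset.
record = {
    "messages": [
        {"role": "user", "content": "Translate this VB6 code to C#:\n\nDim x As Integer"},
        {"role": "assistant", "content": "int x;"},
    ]
}


def is_valid_record(rec):
    """Check for exactly one user turn followed by one assistant turn."""
    msgs = rec.get("messages", [])
    roles = [m.get("role") for m in msgs]
    has_text = all(
        isinstance(m.get("content"), str) and m["content"].strip() for m in msgs
    )
    return roles == ["user", "assistant"] and has_text


print(json.dumps(record, indent=2))
print(is_valid_record(record))
```

A check like `is_valid_record` can be run over the whole dataset before training to catch records with missing or empty turns.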
Base model: Qwen/Qwen2.5-Coder-7B-Instruct
python train_sft.py
Hyperparameters (from OpenCodeInstruct):
| Parameter | Value | Rationale |
|---|---|---|
| lr | 5e-6 | Low LR for stable convergence on small dataset |
| epochs | 3 | Full coverage of 360 examples |
| batch | 1 × 8 accum | Effective batch 8 per GPU |
| seq_len | 1024 | All examples fit (max 570 tokens) |
| warmup | 50 steps | Smooth start |
| scheduler | cosine (default) | Proven for code tasks |
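
The table values imply a short training run, which the arithmetic below makes explicit. This is a sketch that assumes a single GPU (`num_gpus = 1` is not stated in the table) and one optimizer step per effective batch.

```python
import math

# Values taken from the SFT table above.
num_examples = 360
epochs = 3
per_device_batch = 1
grad_accum = 8
warmup_steps = 50

num_gpus = 1  # assumption: single-GPU run, not stated in the table

effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(effective_batch, steps_per_epoch, total_steps, round(warmup_fraction, 2))
```

Under these assumptions the run has 135 optimizer steps, so the 50-step warmup covers roughly 37% of training. With more GPUs the total step count shrinks and the warmup share grows accordingly.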
Base model: simooo21/vb6-to-cs-qwen2.5-coder-7b-sft
python train_grpo.py
Hyperparameters (from DRIVE / DeepSeekMath):
| Parameter | Value | Rationale |
|---|---|---|
| lr | 1e-6 | Very low for RL stability |
| epochs | 2 | Don't overfit on small data |
| num_generations | 8 | Group size per prompt |
| max_completion | 2048 | Long enough for code |
| rewards | syntax + format + length | Multi-objective optimization |
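
GRPO scores each completion relative to the other completions sampled for the same prompt, which is why `num_generations` matters. The sketch below shows that group-relative normalization in plain Python; the reward values are hypothetical, and the use of the population standard deviation with a small `eps` is an assumption, since implementations differ in these details.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scalar rewards for the 8 completions of a single prompt.
rewards = [2.6, 3.0, 1.5, 3.0, 2.1, 0.8, 3.0, 2.4]
advantages = group_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.2f}")
```

Completions that score above the group mean get a positive advantage and are reinforced; those below get a negative one. If all eight completions receive the same reward, every advantage is zero and the prompt contributes no learning signal.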
syntax_reward (0-1): Checks C# syntax patterns
format_reward (0-1): Checks for ```csharp code blocks
length_reward (0-1): Rewards reasonable code length (3-100 lines)
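
The three rewards can be sketched as simple text heuristics. The code below is a minimal illustration and not the implementation in `train_grpo.py`: the regex patterns, the brace-balance penalty, the helper `extract_code`, and the all-or-nothing length score are all assumptions.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example nests cleanly

# Assumed C# indicators; the real syntax_reward may check different patterns.
CSHARP_PATTERNS = [
    r";\s*$",  # a line ending in a statement terminator
    r"\b(public|private|internal|static|void|int|string|var|class)\b",
    r"[{}]",  # at least one brace
]


def extract_code(completion):
    """Return the body of the first csharp fenced block, or None."""
    m = re.search(FENCE + r"csharp\s*\n(.*?)" + FENCE, completion, re.DOTALL)
    return m.group(1) if m else None


def format_reward(completion):
    """1.0 if the completion contains a csharp fenced block, else 0.0."""
    return 1.0 if extract_code(completion) is not None else 0.0


def syntax_reward(completion):
    """Fraction of C# indicators found, halved if braces are unbalanced."""
    code = extract_code(completion) or completion
    hits = sum(bool(re.search(p, code, re.MULTILINE)) for p in CSHARP_PATTERNS)
    score = hits / len(CSHARP_PATTERNS)
    if code.count("{") != code.count("}"):
        score *= 0.5
    return score


def length_reward(completion):
    """1.0 if the code has 3 to 100 non-blank lines, else 0.0."""
    code = extract_code(completion) or completion
    n = len([ln for ln in code.splitlines() if ln.strip()])
    return 1.0 if 3 <= n <= 100 else 0.0


sample = FENCE + "csharp\nint x = 5;\nif (x > 3)\n{\n    x++;\n}\n" + FENCE
print(syntax_reward(sample), format_reward(sample), length_reward(sample))
```

Heuristics like these are cheap to compute but do not verify that the output compiles or preserves behavior, so a model can score well on them while still producing incorrect translations.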
accelerate launch
python evaluate_model.py --model simooo21/vb6-to-cs-qwen2.5-coder-7b-grpo
Reports:
Per the literature review: