Paper: Enhancing Code Generation for Low-Resource Languages: No Silver Bullet (arXiv:2501.19085)
SpaceArm/Qwen2.5-Coder-7B-ABAP is a LoRA fine-tune of Qwen/Qwen2.5-Coder-7B-Instruct, specialized for SAP ABAP development.
Two scripts are provided depending on your hardware:
| Script | GPU VRAM | Method | Time estimate |
|---|---|---|---|
| `train_abap.py` | 24 GB+ (A10G, A100, L4) | LoRA (bf16) | ~1-2 hours |
| `train_abap_qlora.py` | 8 GB (RTX 3060/4060) | QLoRA (4-bit NF4) | ~7-11 hours |
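The choice in the table above can be automated with a small helper. This is a sketch, not part of the repository: `pick_training_script` and `detect_vram_gb` are hypothetical names, the thresholds are read off the table, and memory is measured in decimal GB (as GPU vendors advertise it) so that an 8 GB card is not rejected for reporting slightly under 8 GiB.

```python
from typing import Optional


def pick_training_script(vram_gb: float) -> Optional[str]:
    """Hypothetical helper: map available VRAM (decimal GB) to a training script.

    Thresholds follow the hardware table: 24 GB+ for bf16 LoRA,
    8 GB for 4-bit QLoRA. Returns None if the GPU is likely too small.
    """
    if vram_gb >= 24:
        return "train_abap.py"
    if vram_gb >= 8:
        return "train_abap_qlora.py"
    return None


def detect_vram_gb() -> Optional[float]:
    """Return total memory of GPU 0 in decimal GB, or None if no CUDA GPU is visible."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(0)
    return props.total_memory / 1e9  # decimal GB, matching advertised card sizes


if __name__ == "__main__":
    vram = detect_vram_gb()
    if vram is None:
        print("No CUDA GPU detected.")
    else:
        script = pick_training_script(vram)
        print(f"{vram:.1f} GB VRAM -> {script or 'not enough VRAM for either script'}")
```

Cards between 8 and 24 GB (for example a 12 GB RTX 3060) fall back to the QLoRA script, which is the conservative reading of the table.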
Training on an 8 GB GPU (QLoRA):

```bash
pip install torch transformers trl peft datasets accelerate bitsandbytes
huggingface-cli login
python train_abap_qlora.py
```

Training on a 24 GB+ GPU (LoRA):

```bash
pip install torch transformers trl peft datasets accelerate bitsandbytes
huggingface-cli login
python train_abap.py
```
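A quick pre-flight check can confirm that the packages from the `pip install` line are importable before a multi-hour run starts. This is a hypothetical helper using only the standard library; it checks importability, not package versions or CUDA support.

```python
import importlib.util

# Import names of the packages installed by the pip command (all match their pip names).
REQUIRED = ["torch", "transformers", "trl", "peft", "datasets", "accelerate", "bitsandbytes"]


def missing_packages(names):
    """Return the subset of `names` that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All training dependencies are importable.")
```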
Training data:

| Dataset | Examples | Type |
|---|---|---|
| smjain/abap | 248 | ABAP coding tasks (reports, SELECT, internal tables) |
| Kaballas/abap | 1,070 | ABAP concept Q&A (OOP, classes, visibility) |
| Arturs213/abap-code-sec-finetune | ~4,000+ | ABAP security vulnerability analysis |
| Total | ~5,300+ | |
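The three datasets have different schemas, so they have to be normalized into one format before training. The sketch below assumes a chat-style `messages` format (one of the conversational formats TRL's `SFTTrainer` accepts) and an assumed system prompt; the column names (`instruction`, `output`, `question`, `answer`) are hypothetical placeholders, since the actual schemas of these datasets are not documented here.

```python
from typing import Dict, Iterable, List

SYSTEM_PROMPT = "You are an expert SAP ABAP developer."  # assumption: same prompt used at inference


def to_messages(record: Dict[str, str], prompt_key: str, answer_key: str) -> Dict[str, List[Dict[str, str]]]:
    """Convert one prompt/answer record into a chat-style `messages` example."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": record[prompt_key].strip()},
            {"role": "assistant", "content": record[answer_key].strip()},
        ]
    }


def build_mixture(sources: Iterable[tuple]) -> List[Dict[str, List[Dict[str, str]]]]:
    """Merge (records, prompt_key, answer_key) sources into one list, skipping empty rows."""
    merged = []
    for records, prompt_key, answer_key in sources:
        for record in records:
            if record.get(prompt_key) and record.get(answer_key):
                merged.append(to_messages(record, prompt_key, answer_key))
    return merged


# Toy records standing in for rows of two of the datasets (field names are hypothetical).
coding_tasks = [{"instruction": "Write a SELECT on MARA.", "output": "SELECT * FROM mara INTO TABLE @DATA(lt_mara)."}]
concept_qa = [
    {"question": "What does PROTECTED SECTION mean?", "answer": "Members visible to the class, its subclasses, and its friends."},
    {"question": "", "answer": "dropped: empty prompt"},
]

mixture = build_mixture([(coding_tasks, "instruction", "output"), (concept_qa, "question", "answer")])
print(len(mixture))  # 2
```

With the `datasets` library, the same conversion could be applied per row with `Dataset.map` and the converted splits combined with `concatenate_datasets`, assuming each split exposes the prompt and answer columns you pass in.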
Hyperparameters for `train_abap.py` (LoRA):

| Parameter | Value |
|---|---|
| LoRA rank | 32 |
| LoRA alpha | 64 |
| Batch size | 2 × 8 grad_accum = 16 effective |
| Learning rate | 2e-4 (cosine) |
| Max length | 2048 |
| Precision | bf16 |
| Epochs | 3 |
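Two quantities follow directly from the LoRA table: the effective batch size (per-device batch × gradient accumulation steps) and the LoRA scaling factor alpha / r, which multiplies the low-rank update in PEFT's default (non-rsLoRA) setting. The sketch below also gives a rough optimizer-step estimate from the ~5,300-example total; `TrainSetup` is a hypothetical helper, single-GPU training is assumed, and the real step count depends on filtering, packing, and how the trainer handles the last partial accumulation window.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainSetup:
    per_device_batch: int
    grad_accum: int
    lora_r: int
    lora_alpha: int
    epochs: int
    num_gpus: int = 1  # assumption: single-GPU training

    @property
    def effective_batch(self) -> int:
        # Number of samples contributing to one optimizer step.
        return self.per_device_batch * self.grad_accum * self.num_gpus

    @property
    def lora_scaling(self) -> float:
        # Default LoRA update: W + (alpha / r) * B @ A, with W frozen.
        return self.lora_alpha / self.lora_r

    def approx_optimizer_steps(self, num_examples: int) -> int:
        # Rough estimate: ceil(examples / effective batch) steps per epoch.
        return math.ceil(num_examples / self.effective_batch) * self.epochs


lora = TrainSetup(per_device_batch=2, grad_accum=8, lora_r=32, lora_alpha=64, epochs=3)
print(lora.effective_batch)                # 16
print(lora.lora_scaling)                   # 2.0
print(lora.approx_optimizer_steps(5300))   # 996
```

Both scripts keep alpha at twice the rank, so the update is scaled by the same factor even though the QLoRA script halves the rank.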
Hyperparameters for `train_abap_qlora.py` (QLoRA):

| Parameter | Value |
|---|---|
| Quantization | 4-bit NF4 + double quant |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Batch size | 1 × 16 grad_accum = 16 effective |
| Learning rate | 2e-4 (cosine) |
| Max length | 1024 |
| Precision | bf16 compute on NF4 base |
| Optimizer | paged_adamw_8bit |
| Epochs | 3 |
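A back-of-the-envelope estimate helps explain why the 4-bit configuration fits on an 8 GB card while bf16 LoRA is listed at 24 GB+. This sketch counts only the frozen base weights; the parameter count (~7.6B for Qwen2.5-Coder-7B) is approximate, and activations, LoRA weights, and optimizer state are ignored, so real usage is higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7.6e9  # approximate parameter count of Qwen2.5-Coder-7B

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)
# NF4 stores 4 bits per weight; with double quantization the quantization
# constants add roughly 0.127 bits per weight (figure from the QLoRA paper).
nf4_gb = weight_memory_gb(NUM_PARAMS, 4 + 0.127)

print(f"bf16 weights: {bf16_gb:.1f} GB")  # 15.2 GB
print(f"NF4 weights:  {nf4_gb:.1f} GB")   # 3.9 GB
```

The roughly 11 GB saved on weights is what leaves room for activations and the paged 8-bit optimizer state within 8 GB.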
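Usage with PEFT:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model plus the LoRA adapter in bf16, placed on the available GPU(s).
model = AutoPeftModelForCausalLM.from_pretrained(
    "SpaceArm/Qwen2.5-Coder-7B-ABAP", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("SpaceArm/Qwen2.5-Coder-7B-ABAP")

messages = [
    {"role": "system", "content": "You are an expert SAP ABAP developer."},
    {"role": "user", "content": "Write an ABAP class that reads data from table MARA and displays it in an ALV grid."},
]
# Render the conversation with the chat template and append the assistant turn header.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```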
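Chat models often wrap code in triple-backtick Markdown fences, so downstream tooling (pasting into an ABAP editor, or feeding a benchmark harness) may need only the code. The helper below is a hypothetical sketch: it assumes the response uses such fences, optionally tagged `abap`, and falls back to the whole response when no fence is found.

```python
import re

FENCE = "`" * 3  # built this way so the example can itself sit inside a Markdown code block

# Opening fence with an optional language tag, then a lazily captured body up to the next fence.
_CODE_BLOCK = re.compile(re.escape(FENCE) + r"[ \t]*([\w+-]*)[ \t]*\n(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_code(response: str, language: str = "abap") -> str:
    """Return the first fenced block tagged `language`, else the first fenced block,
    else the stripped response itself (hypothetical post-processing helper)."""
    blocks = _CODE_BLOCK.findall(response)
    for tag, body in blocks:
        if tag.lower() == language:
            return body.strip()
    if blocks:
        return blocks[0][1].strip()
    return response.strip()


reply = "Here is the report:\n" + FENCE + "abap\nREPORT zdemo.\nWRITE 'Hello'.\n" + FENCE + "\nDone."
print(extract_code(reply))
```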
Evaluate against timkoehne/LLM-ABAP-Code-Generation-Benchmark (HumanEval adapted for ABAP).
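HumanEval-style benchmarks usually report pass@k. Assuming the ABAP adaptation follows the same protocol (generate n samples per problem and count the c that pass the tests), the unbiased estimator introduced with HumanEval is pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. A minimal sketch, with hypothetical function names:

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[tuple], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


print(pass_at_k(10, 5, 1))                        # 0.5
print(mean_pass_at_k([(10, 0), (10, 10)], k=1))   # 0.5
```

Generating more samples than k (n > k) and using this estimator gives lower variance than literally drawing k samples per problem.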
If you hit out-of-memory errors on an 8 GB GPU:
- Reduce `max_length` from 1024 to 512 in `train_abap_qlora.py`.
- Close other processes that hold GPU memory (check with `nvidia-smi`).
- Set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` before running.

ABAP is a low-resource programming language: although it is included in large code corpora such as The Stack v2, training data is scarce compared to Python or Java. This model uses approaches from: