mistral-axel-1
A LoRA adapter that teaches Ministral-3-8B-Instruct how to write COBOL. Lifts CobolEval pass@5 from 0.68% → 21.23% (+20.55pp) — a ~31× improvement over the base model on a 146-task HumanEval-style COBOL benchmark.
TL;DR
| Field | Value |
|---|---|
| Base model | unsloth/Ministral-3-8B-Instruct-2512 (BF16; trained on 4-bit BnB quant) |
| Adapter type | LoRA (PEFT) |
| Rank / α | 16 / 16 |
| Trainable params | ~28 M (≈ 0.35 % of base) |
| Adapter size | 107 MB (adapter_model.safetensors, 812 tensors, BF16) |
| Training data | 105 examples of COBOL string-processing (custom, see below) |
| Training time | 65 s on 1× H100 80GB (Modal) |
| Task | COBOL code completion in fixed-format (Area B / column 7) |
| License | Apache-2.0 (matches base model) |
What problem does it solve?
The base Ministral-3-8B-Instruct-2512 scores 0.68% pass@5 on CobolEval — out of 146 tasks, exactly one succeeds. The failure mode is almost universal: the model produces COBOL starting at column 1 (Area A) instead of column 7 (Area B), so GnuCOBOL's fixed-format parser rejects nearly every program before tests can even run.
Compile rate on CobolEval (sample 1 of 5): base ≈ 5.5% → axel-1 ≈ 63.7%
A 105-example LoRA on column convention + a few string-processing idioms is enough to fix this — without touching base weights and without harming the model on its original capabilities.
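The column failure described above can be detected before invoking the compiler. The following is a minimal sketch, not the card's harness: `misplaced_lines` is a hypothetical helper, and it assumes the standard fixed-form layout (columns 1 to 6 sequence area, column 7 indicator, code from column 8 onward), which may differ in detail from the "column 7" shorthand used in this card. It also treats any text in the sequence area as an error, which is stricter than the compiler but reasonable for model output.

```python
# Assumed indicator characters that may legally appear in column 7.
INDICATORS = {"*", "-", "/", "D", "d"}


def misplaced_lines(source: str) -> list[int]:
    """Return 1-based numbers of lines whose code starts left of column 8."""
    bad = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # blank lines are always fine
        first_col = len(line) - len(stripped) + 1  # column of first non-blank char
        if first_col <= 6:
            bad.append(number)  # text sits in the sequence area
        elif first_col == 7 and stripped[0] not in INDICATORS:
            bad.append(number)  # column 7 holds something other than an indicator
    return bad


sample = (
    "IDENTIFICATION DIVISION.\n"   # starts in column 1: flagged
    "       PROGRAM-ID. DEMO.\n"   # starts in column 8: accepted
    "      * a comment line\n"     # '*' in column 7: accepted
)
print(misplaced_lines(sample))
```

A check like this is cheap enough to run on every sample, which makes it easy to separate formatting failures from genuine logic errors when reading benchmark results.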
The benchmarks (one-line each)
- CobolEval: a 146-task HumanEval-style benchmark for COBOL code completion. Each task is a function signature in a COBOL skeleton; the model fills in the body. Compiled with GnuCOBOL, run against held-out test stdins.
- Component B: 24 hand-curated COBOL → Java translation tasks across 8 hard mainframe failure modes (comp3_precision, OCCURS DEPENDING, REDEFINES, PIC formatting…). Frontier-only difficulty: Devstral-2 (123B) scores 8.3% pass@5.
- Component C: 107 easier COBOL → Java tasks across 16 string-processing failure modes (case change, char check, palindrome, concat, accumulator, arithmetic…). Calibrated for small-model discrimination: Ministral-3B base scores 24.3%, Mistral-Medium 89.5%.
- Component D: 18 medium-difficulty COBOL → Java tasks across 8 skills (loop accumulate-filter, multi-format input, string search, boundary branches…). Sits between B and C in difficulty.
All four use the same Modal inference harness; only the LoRA flag changes between base / axel-1 rows.
Headline evaluation (pass@5, temperature 0.7, k=5)
| Benchmark | n | Base | mistral-axel-1 | Δ vs base |
|---|---|---|---|---|
| CobolEval (HumanEval-style COBOL) | 146 | 0.68% | 21.23% | +20.55pp |
| Component C (string-processing, 16 failure modes) | 107 | 24.30% | 24.30% | +0.00pp |
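For readers unfamiliar with pass@k, here is a minimal sketch of how such a number is typically computed. It assumes the harness uses the standard unbiased estimator from the HumanEval methodology, which this card does not state explicitly; the per-task counts in the example are hypothetical and chosen only to show that 31 solved tasks out of 146 gives the reported 21.23%.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over all tasks in a benchmark."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical per-task results: 31 tasks with one passing sample, 115 with none.
counts = [1] * 31 + [0] * 115
score = benchmark_pass_at_k(counts, n=5, k=5)
print(f"pass@5 = {100 * score:.2f}%")
```

With n = k = 5, the estimator reduces to "at least one of the five samples passes", so pass@5 is simply the fraction of tasks solved at least once.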
For reference, on the same harness:
- Mistral-Medium: 30.5% CobolEval
- Devstral-2 (123B): 31.6% CobolEval
- Claude Sonnet 4.6: 65.8% CobolEval
A 107 MB LoRA on an 8B model closes roughly two thirds of the gap between the base 8B model and Mistral-Medium on CobolEval.
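The headline ratios can be re-derived from the reported scores. The short calculation below uses only numbers stated in this card; the variable names are illustrative.

```python
# Reported CobolEval pass@5 scores, in percent.
base = 0.68             # Ministral-3-8B-Instruct base
adapter = 21.23         # base + mistral-axel-1
mistral_medium = 30.5   # reference model on the same harness

lift_pp = adapter - base                         # absolute lift in percentage points
relative = adapter / base                        # multiplicative improvement
gap_closed = lift_pp / (mistral_medium - base)   # share of the gap to Mistral-Medium

print(f"lift: +{lift_pp:.2f}pp, relative: {relative:.1f}x, gap closed: {gap_closed:.0%}")
```

The gap-closed figure comes out at about 69%, which is where the "roughly two thirds" statement comes from.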
Why Component C is flat
Component C measures a different skill from CobolEval: COBOL → Java translation across 16 string-processing failure modes (case change, char check, palindrome, concat, accumulator, arithmetic…), where column-7 formatting is not the bottleneck for the base model. Most of axel-1's CobolEval lift comes from teaching column formatting, which Component C doesn't need. Net effect on Component C: zero, neither a lift nor a regression. On this benchmark, the training did not damage capabilities outside the SFT distribution.
Training data
105 examples of COBOL string-processing instruction-completion pairs (95 train / 10 val), built specifically to address CobolEval failure modes.
| sub-skill | count | source mix |
|---|---|---|
| case_change (upper/lower transforms) | 26 | synthetic + 8 real CobolEval |
| char_check (per-char predicates) | 27 | synthetic + 2 real |
| palindrome (string reversal / detection) | 24 | synthetic + 1 real |
| string_concat (build, join, append) | 23 | synthetic + 11 real |
| non-string holdouts (anti-forgetting) | 15 | from existing CobolEval coverage |
Composition by provenance: 68 synthetic (Claude Opus reasoning in-session, validated locally with cobc), 12 real CobolEval, 15 holdout, 10 hand-written edge cases. Every example compiles under GnuCOBOL fixed-format and has at least one passing reference test (9/9 verifier passes; 0 rejected).
No API spend on data generation. Synthesis was zero-cost reasoning; validation used a local cobc toolchain.
Training procedure
# modal_app/finetune.py (single H100 80GB)
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Excerpt: `model` is loaded earlier in the script; dataset arguments are omitted here.
trainer = SFTTrainer(
    model=model,  # Ministral-3-8B-Instruct, 4-bit BnB quant
    args=SFTConfig(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=1,
        max_steps=40,
        learning_rate=1e-4,
        lr_scheduler_type="cosine",
        warmup_ratio=0.03,
        bf16=True,
        seed=12627998,
    ),
    peft_config=LoraConfig(
        r=16, lora_alpha=16,
        # Adapt every attention and MLP projection.
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        lora_dropout=0.0, bias="none",
    ),
)
- Framework: Unsloth + TRL SFTTrainer, PEFT 0.19.1, Transformers 5.8
- Steps: 40 (~1.7 epochs over 95 train examples)
- Precision: BF16 weights, 4-bit BnB during training, BF16 adapter on disk
- Hardware: 1× NVIDIA H100 80GB on Modal
- Wall clock: 65 seconds
- Estimated cost: < $0.50 of compute
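The epoch count follows directly from the hyperparameters listed above. A short check, using only values from this card:

```python
max_steps = 40
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
num_devices = 1          # single H100
train_examples = 95

# Examples consumed over the whole run, then expressed in passes over the train set.
examples_seen = max_steps * per_device_train_batch_size * gradient_accumulation_steps * num_devices
epochs = examples_seen / train_examples

print(f"{examples_seen} examples seen, about {epochs:.1f} epochs")
```

So the run sees 160 examples in total, a little under two passes over the 95-example training split.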
Usage
This is a PEFT/LoRA adapter — you load the base model first, then apply the adapter on top.
With PEFT + Transformers
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the BF16 base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Ministral-3-8B-Instruct-2512",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "axeltta/mistral-axel-1")
tokenizer = AutoTokenizer.from_pretrained("axeltta/mistral-axel-1")

prompt = "Write a COBOL program that reads a string from stdin and prints it reversed."
# return_dict=True returns input_ids and attention_mask together, ready for generate().
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    return_tensors="pt", return_dict=True, add_generation_prompt=True,
).to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))
With Unsloth (faster, 4-bit)
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="axeltta/mistral-axel-1",
    max_seq_length=2048,
    load_in_4bit=True,
)
With vLLM
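pip install vllm
# This repo holds only a LoRA adapter, so serve the base model and attach the adapter to it.
vllm serve mistralai/Ministral-3-8B-Instruct-2512 \
  --enable-lora \
  --lora-modules mistral-axel-1=axeltta/mistral-axel-1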
Intended use
- Primary: COBOL code completion / generation, especially fixed-format programs.
- Secondary: Research baseline for low-data domain SFT on legacy programming languages.
- Companion adapter: axeltta/mistral-axel-2, same base, trained for COBOL→Java translation (mixed results, see its card).
Limitations and risks
- Narrow training distribution. 105 examples is very small; the lift comes mostly from teaching column convention, not deep COBOL semantics. The model can still generate plausible-looking but incorrect business logic.
- Not for production COBOL. This is research-grade. Always compile, run real tests, and have a human review the output before touching mainframe code.
- No safety tuning beyond the base model. Inherits all biases and risks of Ministral-3-8B-Instruct-2512.
- CobolEval lift is partly compilation, not full correctness. pass@5 of 21.23% is real, but compile rate (~64%) is higher than pass rate: many programs compile and run yet produce the wrong output.
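Since the advice above is to always compile generated code, here is a minimal sketch of a compile gate around GnuCOBOL. It is an illustration, not the evaluation harness used for this card: `build_cobc_command` and `compiles` are hypothetical helpers, and the sketch assumes `cobc` is installed and on PATH. A successful compile says nothing about correctness, so real tests are still required afterwards.

```python
import shutil
import subprocess


def build_cobc_command(source: str, output: str) -> list[str]:
    """Command line that builds an executable from a fixed-format source file."""
    return ["cobc", "-x", "-fixed", "-o", str(output), str(source)]


def compiles(source: str, output: str, timeout: int = 60) -> tuple[bool, str]:
    """Run cobc and return (success, compiler diagnostics)."""
    if shutil.which("cobc") is None:
        raise RuntimeError("cobc not found on PATH; install GnuCOBOL first")
    result = subprocess.run(
        build_cobc_command(source, output),
        capture_output=True, text=True, timeout=timeout,
    )
    return result.returncode == 0, result.stderr


print(" ".join(build_cobc_command("reverse.cob", "reverse")))
```

Keeping the command construction separate from the subprocess call makes it easy to log or adjust the exact compiler flags used for a given run.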
Reproducibility
All artifacts in this repo are deterministic byproducts of:
- seed: 12627998
- 95 train / 10 val examples (sha256 in funnex_manifest.json)
- 40 steps with the hyperparameters above
A second training run from the same seed produced byte-identical adapter weights (verified during the v1 audit).
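A byte-identity claim like this can be checked by hashing the two adapter files. The sketch below is one way to do so under stated assumptions: `sha256_of` and `byte_identical` are hypothetical helper names, and the file paths in the usage note are placeholders. The layout of funnex_manifest.json is not described in this card, so the sketch does not attempt to parse it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(first: str, second: str) -> bool:
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(first) == sha256_of(second)
```

For example, `byte_identical("run_a/adapter_model.safetensors", "run_b/adapter_model.safetensors")` would compare the outputs of two training runs, where both directory names are placeholders.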
Upload was independently verified on Modal: a fresh snapshot_download of this repo loaded via PEFT against base BF16 and emitted valid Area-B COBOL on one generate call.
Citation
@software{mistral_axel_1_2026,
  author = {Axelsson, A.},
  title = {mistral-axel-1: A LoRA adapter for COBOL code completion on Ministral-3-8B-Instruct},
  year = {2026},
  url = {https://huggingface.co/axeltta/mistral-axel-1},
}
Framework versions
- PEFT 0.19.1
- Transformers 5.8
- TRL (SFTTrainer)
- Unsloth (training pipeline)
- PyTorch ≥ 2.7
Evaluation results
- pass@5 on CobolEval (HumanEval-style COBOL, 146 tasks): 21.23 (self-reported)
- pass@1 on CobolEval (HumanEval-style COBOL, 146 tasks): 9.04 (self-reported)
- pass@5 on Component C (107 tasks, 16 failure modes): 24.30 (self-reported)
- pass@1 on Component C (107 tasks, 16 failure modes): 6.73 (self-reported)

