LFM2.5-230M Fable-5 GGUF

Fine-tuned GGUF release of LiquidAI/LFM2.5-230M on Glint-Research/Fable-5-traces.

Files

  • lfm2.5-230m-fable-5-f16.gguf — highest quality, largest file
  • lfm2.5-230m-fable-5-q8_0.gguf — high quality, smaller
  • lfm2.5-230m-fable-5-q4_k_m.gguf — best default for local inference

Training

  • Base model: LiquidAI/LFM2.5-230M
  • Dataset: Glint-Research/Fable-5-traces
  • File used: fable5_cot_merged.jsonl
  • Method: PEFT LoRA SFT
  • Max sequence length: 4096
  • Epochs: 1
  • LoRA rank: 32
  • LoRA alpha: 64
  • LoRA dropout: 0.05
  • Precision: FP16 base model, FP32 LoRA trainable weights
  • Hardware: Google Colab T4
  • Format: Chat template system/user/assistant, preserving Fable context -> completion

Final training loss samples

  • step 555: 1.7037
  • step 560: 1.5968
  • step 565: 1.6435
  • step 570: 1.6109
  • step 575: 1.6589
  • step 580: 1.6439

Evaluation

We evaluated AKMESSI/lfm2.5-230m-fable-5:F16 against the original base model, LiquidAI/LFM2.5-230M-GGUF:BF16, using local llama.cpp server inference.

These are not official leaderboard submissions. They are lightweight local evaluations intended to compare the fine-tuned model against the base model under the same prompts, decoding settings, and hardware setup.

Summary

The Fable-5 fine-tune improves repository-context code continuation on RepoBench-C-lite Python, while mostly preserving the base model's generic function-calling behavior on BFCL-lite Simple.

Benchmark Result
RepoBench-C-lite Python Fine-tuned model outperforms base model
BFCL-lite Simple Fine-tuned model mostly preserves base function-calling ability
CodeXGLUE Line Completion Python Neutral / unchanged
CRUXEval-lite Not a good fit for this trace-style model

RepoBench-C-lite Python

RepoBench-C-style next-line code completion was used to evaluate repository-context code continuation. We sampled 100 examples each from python_if, python_cff, and python_cfr, for 300 total examples.

Model Examples Exact Match Prefix Match Edit Similarity
LiquidAI/LFM2.5-230M-GGUF:BF16 300 10.33% 10.67% 46.85%
AKMESSI/lfm2.5-230m-fable-5:F16 300 14.67% 15.33% 50.17%

Compared with the base model, the Fable-5 fine-tune improved:

  • Exact match by +4.33 percentage points
  • Prefix match by +4.67 percentage points
  • Edit similarity by +3.32 points

Breakdown by config:

Config Base Exact Fable Exact Base Edit Sim Fable Edit Sim
python_if 21.00% 27.00% 55.14% 57.31%
python_cff 3.00% 5.00% 37.45% 38.10%
python_cfr 7.00% 12.00% 47.96% 55.10%

BFCL-lite Simple

We also ran a local BFCL-lite Simple function-calling evaluation over 400 examples as a generic tool-calling control.

Model Examples Parse-valid JSON Function-name Match Argument Recall Rough Score
LiquidAI/LFM2.5-230M-GGUF:BF16 400 97.75% 97.50% 71.60% 88.44%
AKMESSI/lfm2.5-230m-fable-5:F16 400 98.25% 95.00% 67.70% 85.44%

The fine-tuned model preserves most of the base model's generic function-calling behavior, but does not improve BFCL-style API-schema-to-JSON calling. This is expected because the training data consists of coding-agent traces rather than clean function-calling examples.


CodeXGLUE Line Completion Python

We ran a 1,000-example local CodeXGLUE line-completion evaluation as a general code-completion control.

Model Examples Exact Match Prefix Match Edit Similarity
LiquidAI/LFM2.5-230M-GGUF:BF16 1000 23.60% 0.00% 23.60%
AKMESSI/lfm2.5-230m-fable-5:F16 1000 23.50% 0.00% 23.50%

This result is effectively neutral. The Fable-5 fine-tune does not materially change general line-completion performance on this setup.


CRUXEval-lite

We also tried a 200-example CRUXEval-lite run for Python execution reasoning.

Model Task O Accuracy Task I Accuracy Overall Accuracy
LiquidAI/LFM2.5-230M-GGUF:BF16 8.50% 4.00% 6.25%
AKMESSI/lfm2.5-230m-fable-5:F16 0.00% 0.00% 0.00%

This benchmark was not a good fit for the fine-tuned model. The Fable-5 model often entered explanation or trace-style response mode instead of returning only the exact literal Python value expected by CRUXEval.


Interpretation

The Fable-5 fine-tune appears to shift the base model toward coding-agent and repository-context continuation behavior.

It improves RepoBench-C-lite Python next-line completion, while mostly preserving generic function-calling ability on BFCL-lite Simple. The main regression is in exact BFCL-style argument filling, which is not the main target of the Fable-5 trace dataset.

The model is best understood as a tiny coding-agent trace model, not a general-purpose reasoning model or a benchmark-specialized function-calling model.


Evaluation Caveats

  • These are local lightweight evaluations, not official leaderboard submissions.
  • Results were produced with llama.cpp server inference.
  • Scores may vary with prompting, decoding settings, quantization level, and benchmark harness details.
  • BFCL-lite and RepoBench-C-lite use simplified local scoring scripts rather than official leaderboard infrastructure.
  • Only the F16 model was benchmarked here; quantized GGUF variants may differ slightly.

Usage

Recommended local file:

lfm2.5-230m-fable-5-q4_k_m.gguf

Caveats

This model is trained on coding-agent trace telemetry. It may emit tool-call-like actions, shell commands, file paths, or long reasoning-style continuations. Review outputs before executing commands.

The dataset contains coding-agent traces and should not be treated as a clean benchmark or a safety-filtered assistant dataset.

License notes

  • Base model: LiquidAI LFM Open License v1.0
  • Dataset: AGPL-3.0
  • This repo preserves upstream license notices. Check compatibility before commercial or closed-source use.
Downloads last month
238
GGUF
Model size
0.2B params
Architecture
lfm2
Hardware compatibility
Log In to add your hardware

4-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AKMESSI/lfm2.5-230m-fable-5

Adapter
(1)
this model

Dataset used to train AKMESSI/lfm2.5-230m-fable-5