BAR-5x7B: GGUF (first-of-its-kind FlexOlmo conversion)

This is the first GGUF conversion of allenai/BAR-5x7B, the largest member of AllenAI's BAR family of Mixture-of-Experts models, released on 2026-04-19 and built on the new FlexOlmo architecture.

5 experts × 7B → ~33B total parameters with top-k routing.
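How the total is reached depends on how much of each 7B model is shared: in a typical MoE only the feed-forward blocks are replicated per expert, while attention and embeddings are counted once. The sketch below is a back-of-the-envelope estimate under that assumption. Every dimension in it is an assumed Olmo2-7B-like value, not something read from the BAR-5x7B config, so treat the result as illustrative only.

```python
# Rough parameter count for a 5-expert MoE built from 7B-class dense models.
# ALL dimensions are ASSUMPTIONS (Olmo2-7B-like), not taken from BAR-5x7B.
HIDDEN = 4096         # assumed hidden size
INTERMEDIATE = 11008  # assumed FFN intermediate size
LAYERS = 32           # assumed number of decoder layers
VOCAB = 100352        # assumed (padded) vocabulary size
EXPERTS = 5           # from the model name


def estimate_total_params(hidden=HIDDEN, intermediate=INTERMEDIATE,
                          layers=LAYERS, vocab=VOCAB, experts=EXPERTS):
    ffn_per_expert = 3 * hidden * intermediate  # gate, up and down projections
    attention = 4 * hidden * hidden             # q, k, v, o projections (no GQA assumed)
    router = hidden * experts                   # one routing logit per expert
    per_layer = experts * ffn_per_expert + attention + router
    embeddings = 2 * vocab * hidden             # untied input and output embeddings
    return layers * per_layer + embeddings      # small norm weights are ignored


total = estimate_total_params()
print(f"{total / 1e9:.1f}B parameters")
```

Under these assumptions the estimate comes to about 24.6B, which is well below 5 × 7B because only the FFN weights are multiplied by the expert count. Check the real config before relying on any of these numbers.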

⚠ Requires patched llama.cpp

The FlexOlmo architecture is not yet supported in upstream llama.cpp. To run this GGUF, build llama.cpp from the FlexOlmo support fork:

git clone https://github.com/Seraphiel102/llama.cpp.git
cd llama.cpp
git checkout flex-olmo-pr-clean
cmake -B build -DGGML_CUDA=OFF
cmake --build build -j --target llama-cli llama-quantize llama-completion

What FlexOlmo is

Per transformers.models.flex_olmo, FlexOlmoDecoderLayer is Olmo2's hybrid post-norm decoder layer with the dense FFN swapped for OlmoE-style top-k MoE routing. Specifically:

  • Attention with q_norm and k_norm (Olmo2-style)
  • post_attention_layernorm and post_feedforward_layernorm (post-norm pattern, no input_layernorm)
  • Top-k MoE FFN with softmax routing (OlmoE-style)
  • No sliding-window attention
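To make the layer structure above concrete, here is a minimal pure-Python sketch of the post-norm residual pattern and softmax top-k routing. It is not the transformers or llama.cpp implementation: the helper names are hypothetical, attention (including q_norm and k_norm) is passed in as an opaque callable, learned norm weights are omitted, and the choice not to renormalize the top-k weights is an assumption.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Softmax over all experts, then keep the k most probable ones."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # ASSUMPTION: weights are used as-is, without renormalizing over the top k.
    return [(i, probs[i]) for i in top]


def moe_ffn(x, router_logits, experts, k):
    """Weighted sum of the selected experts' outputs for one token vector."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


def rms_norm(x, eps=1e-6):
    """RMS normalization without a learned scale (omitted for brevity)."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]


def decoder_layer(x, attn, moe):
    """Post-norm: normalize each sublayer's OUTPUT, then add the residual."""
    h = [a + b for a, b in zip(x, rms_norm(attn(x)))]      # post_attention_layernorm
    return [a + b for a, b in zip(h, rms_norm(moe(h)))]    # post_feedforward_layernorm


# Toy usage: three "experts" that just scale their input.
experts = [lambda v: [1.0 * t for t in v],
           lambda v: [2.0 * t for t in v],
           lambda v: [3.0 * t for t in v]]
x = [3.0, 4.0]
y = decoder_layer(x, attn=lambda v: v,
                  moe=lambda h: moe_ffn(h, [1.0, 3.0, 2.0], experts, k=2))
print(y)
```

Note that there is no normalization of the layer input, which is what distinguishes this pattern from the more common pre-norm layout.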

Files

File                    Size    Notes
BAR-5x7B.Q4_K_M.gguf    14 GB   recommended; fits 16 GB VRAM at small context
(more quants pending)
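The "small context" caveat comes from the KV cache, which grows linearly with context length on top of the 14 GB of weights. The sketch below estimates that cost for an f16 cache. The layer count, KV head count and head dimension are assumed Olmo2-7B-like values, not values read from this GGUF, so the figures are only indicative.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_ctx, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2):
    """Size of the K and V caches for n_ctx tokens.

    ASSUMED defaults: 32 layers, 32 KV heads of dimension 128 (no GQA),
    and a 2-byte (f16) cache element.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx


for n_ctx in (2048, 4096, 8192):
    print(n_ctx, f"{kv_cache_bytes(n_ctx) / GIB:.1f} GiB")
```

With these assumed dimensions the cache costs about 0.5 MiB per token, so a 4096-token context adds roughly 2 GiB, which leaves little headroom on a 16 GB card.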

Usage

./build/bin/llama-completion \
  -m BAR-5x7B.Q4_K_M.gguf \
  -p "The 5 experts in BAR-5x7B are " \
  -n 100

Validation

The Q4_K_M conversion was checked with the patched llama.cpp build on a basic arithmetic prompt and produced correct, coherent output.
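A check of this kind can be scripted. The sketch below is one possible way to do it, not the procedure actually used for this conversion: the prompt "12 + 30 =" and the expected answer are hypothetical, the helper names are made up for illustration, and only the -m, -p and -n flags shown in the Usage section are passed.

```python
import subprocess
from pathlib import Path


def build_command(binary, model, prompt, n_predict):
    """Assemble the same command line as in the Usage section."""
    return [str(binary), "-m", str(model), "-p", prompt, "-n", str(n_predict)]


def contains_answer(output, expected):
    """Very loose check: the expected string appears somewhere in the output."""
    return expected in output


def smoke_test(binary="./build/bin/llama-completion",
               model="BAR-5x7B.Q4_K_M.gguf",
               prompt="12 + 30 =",   # hypothetical prompt
               expected="42",        # hypothetical expected answer
               n_predict=16):
    # Return None instead of failing when the build or the model is absent.
    if not Path(binary).exists() or not Path(model).exists():
        return None
    result = subprocess.run(
        build_command(binary, model, prompt, n_predict),
        stdin=subprocess.DEVNULL,    # never wait for interactive input
        capture_output=True, text=True, timeout=600,
    )
    return result.returncode == 0 and contains_answer(result.stdout, expected)


if __name__ == "__main__":
    print(smoke_test())
```

A substring match only shows that the model loads and generates plausible text. It says nothing about quantization quality, for which a perplexity comparison against the original weights would be more informative.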

Credit

  • Model: AllenAI (allenai/BAR-5x7B)
  • FlexOlmo support in llama.cpp: PR by @Seraphiel102 / Nyx
  • Conversion: llama.cpp + the convert_hf_to_gguf.py patch from the support PR

If this saved you time, please ⭐ the llama.cpp PR.

GGUF metadata: model size 25B params, architecture flex_olmo, 4-bit quantization.

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for RhinoWithAcape/BAR-5x7B-GGUF

Base model

allenai/BAR-5x7B
Quantized
(1)
this model