GovSpeak / PreVillage Gemma E2B v4b

Best E2B release candidate for the GovSpeak / PreVillage service navigator, with a staged llama.cpp GGUF build for edge and kiosk testing.

This is still a retrieval-bound model. Do not treat it as a standalone source of government facts. Fees, contacts, office holders, URLs, required documents, and office-specific details should come from retrieval, structured source packs, deterministic extraction, officer interviews, WhatsApp/citizen reports, and human review.

Recommended E2B Artifact

Use the step600 checkpoint for edge demos:

step600/
gguf/gemma-helpdesk-v4b-step600-e2b-Q4_K_M.gguf

Why step600: it is the best-balanced E2B checkpoint in the v4b evals. The best/ checkpoint scored higher on refusal correctness but showed poor Roman-Nepali behavior. step600 preserved Roman-Nepali behavior and produced no wrong refusals on the grounded gold set.

Step600 Eval

Eval path:

eval/reports/sft_v4b_step600_baseline300_full_eval/

Signal                       Result
Grounded items               73
chrF                         22.81
URL recall                   0.75
Wrong refusals               0/73 = 0.0%
Refusal correctness          83.5%
Belebele Nepali              58.0%
GSM8K-en                     53.3%
Roman-Nepali degeneration    0/10

Known limitation: refusal correctness is below the 90% target. Put the model behind resolver and retrieval gates, and prefer deterministic refusal/follow-up logic where source coverage is missing.
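One way to read the gating advice above is as a small deterministic check that runs before the model is called. The sketch below is illustrative only: `Source`, `GateDecision`, `gate`, and the message strings are hypothetical names and texts, not part of the GovSpeak codebase, and the real resolver and retrieval layers may expose different signals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    """A retrieved passage (hypothetical shape, for illustration only)."""
    source_id: str
    text: str


@dataclass
class GateDecision:
    action: str             # "answer", "follow_up", or "refuse"
    message: Optional[str]  # deterministic text used when the model is bypassed


def gate(service_candidates: List[str], sources: List[Source]) -> GateDecision:
    """Decide, without calling the model, whether it should answer at all."""
    # Resolver found no matching service: refuse deterministically.
    if not service_candidates:
        return GateDecision("refuse", "No matching service was found.")
    # Resolver is ambiguous: ask a follow-up instead of guessing.
    if len(service_candidates) > 1:
        options = ", ".join(service_candidates)
        return GateDecision("follow_up", f"Which service do you mean: {options}?")
    # Exactly one service but no source coverage: refuse deterministically.
    if not sources:
        return GateDecision("refuse", "No verified source covers this service yet.")
    # One service with sources: let the model compose a grounded answer.
    return GateDecision("answer", None)
```

With a gate like this, the model's 83.5% refusal correctness only matters for requests that reach the "answer" branch; missing-coverage cases are handled before generation.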

GGUF Smoke

The Q4_K_M GGUF loaded in llama.cpp:

model: gguf/gemma-helpdesk-v4b-step600-e2b-Q4_K_M.gguf
prompt throughput: 449.3 t/s
generation throughput: 132.0 t/s

That smoke test ran on local hardware, not on a Raspberry Pi. Pi evidence should use the separate Pi runbook/benchmark numbers.
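For rough planning on the same local hardware, the two throughput figures can be combined into a back-of-envelope latency estimate. This is a sketch under simplifying assumptions: it ignores model load time and sampling overhead, treats throughput as constant across context lengths, and the function name `estimate_seconds` is made up for illustration. It says nothing about Raspberry Pi performance.

```python
# Throughput figures from the local smoke test above (tokens per second).
PROMPT_TPS = 449.3
GEN_TPS = 132.0


def estimate_seconds(prompt_tokens: int, generated_tokens: int,
                     prompt_tps: float = PROMPT_TPS,
                     gen_tps: float = GEN_TPS) -> float:
    """Approximate wall time: prompt processing plus token generation."""
    return prompt_tokens / prompt_tps + generated_tokens / gen_tps


# Example: a 900-token prompt (system prompt plus sources) and a
# 132-token answer come to roughly 3 seconds on this hardware.
print(round(estimate_seconds(900, 132), 2))
```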

Usage

hf download voidash/gemma-helpdesk-v4b-e2b-seed42 \
  gguf/gemma-helpdesk-v4b-step600-e2b-Q4_K_M.gguf \
  --local-dir ./models

llama-cli \
  -m ./models/gguf/gemma-helpdesk-v4b-step600-e2b-Q4_K_M.gguf \
  --jinja \
  -sys "You are GovSpeak. Answer only from provided sources. Ask a compact follow-up if the service case is ambiguous." \
  -p "Question: mero nagarikta harayo, ke garne?"
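Because the model is meant to answer only from provided sources, a wrapper would normally place retrieved passages in the prompt. The sketch below builds the same llama-cli invocation as above from Python. The "Sources:" layout with numbered passages is an assumption for illustration; this card does not specify the prompt format used in training, and `build_prompt` and `build_command` are hypothetical helper names.

```python
from typing import List

# System prompt copied from the usage example above.
SYSTEM_PROMPT = (
    "You are GovSpeak. Answer only from provided sources. "
    "Ask a compact follow-up if the service case is ambiguous."
)


def build_prompt(question: str, sources: List[str]) -> str:
    """Join retrieved passages and the question (layout is an assumption)."""
    lines = ["Sources:"]
    for i, text in enumerate(sources, start=1):
        lines.append(f"[{i}] {text}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def build_command(model_path: str, question: str, sources: List[str]) -> List[str]:
    """Return an argv list using only the flags shown in the usage example."""
    return [
        "llama-cli",
        "-m", model_path,
        "--jinja",
        "-sys", SYSTEM_PROMPT,
        "-p", build_prompt(question, sources),
    ]


cmd = build_command(
    "./models/gguf/gemma-helpdesk-v4b-step600-e2b-Q4_K_M.gguf",
    "mero nagarikta harayo, ke garne?",
    ["<retrieved passage text>"],  # placeholder, not real source content
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing an argv list rather than a shell string avoids quoting problems when passages contain quotes or newlines.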

Relation To E4B

E2B is the edge/local lane. The stronger planner/composer candidate is the E4B v6.4 adapter:

voidash/gemma-helpdesk-v6-4-e4b-g6e-qlora-seed42

Use E4B where a helpdesk PC or server can run the answer layer. Use E2B/GGUF where a low-cost local device is the deployment constraint.

GGUF: model size 5B params, architecture gemma4, 4-bit quantization.
