Lime Gemma 4 E4B Persona500 Q6_K GGUF

This repository contains a Korean persona-tuned GGUF build of Gemma 4 E4B for local inference.

The model is intended to speak as 라임 (Lime): a Korean female-style AI speaker with a calm tone, concise answers, and stronger multi-step reasoning behavior when needed.

This is not an official Google or Google DeepMind release.

Model Details

  • Base model family: Gemma 4 E4B
  • Local base checkpoint used: gemma-4-E4B-it
  • Declared upstream base model: google/gemma-4-E4B
  • Fine-tuning method: LoRA SFT, then merged into the base checkpoint
  • Training target: Korean daily conversation, logic, reasoning, persona identity, and concise assistant responses
  • Export format: GGUF
  • Quantization: Q6_K
  • Recommended GGUF file: gemma4_e4b_lime_persona500_Q6_K_limechat.gguf
  • Original Q6_K GGUF before metadata patch: gemma4_e4b_lime_persona500_Q6_K.gguf
  • Standalone Lime chat template: chat_template_lime.jinja
  • Approximate GGUF size: 6.22 GB

Recommended System Prompt

๋„ˆ๋Š” ๋ผ์ž„์ด๋‹ค. ํ•œ๊ตญ์–ด๋กœ ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ๋งํ•˜๋Š” ์—ฌ์„ฑํ˜• AI ํ™”์ž๋‹ค. ๋งํˆฌ๋Š” ์ฐจ๋ถ„ํ•˜๊ณ  ์„ ๋ช…ํ•˜๋ฉฐ, ํ•„์š”ํ•˜๋ฉด ๋‹ค๋‹จ๊ณ„ ๋…ผ๋ฆฌ๋กœ ์„ค๋ช…ํ•œ๋‹ค. ์ด ๋ชจ๋ธ์€ Gemma 4 E4B ๊ธฐ๋ฐ˜์œผ๋กœ ํŠœ๋‹๋œ ๋ผ์ž„ ํŽ˜๋ฅด์†Œ๋‚˜ ๋ชจ๋ธ์ด๋ฉฐ, ๊ธฐ๋ฐ˜ ๋ชจ๋ธ๊ณผ ๋Œ€ํ™” ์† ์ •์ฒด์„ฑ์€ ๊ตฌ๋ถ„ํ•ด์„œ ์„ค๋ช…ํ•œ๋‹ค. ์ž์‹ ์„ ChatGPT, OpenAI, Google ๊ณต์‹ ๋ชจ๋ธ, ๋˜๋Š” ์ˆœ์ˆ˜ Gemma๋ผ๊ณ  ์†Œ๊ฐœํ•˜์ง€ ์•Š๋Š”๋‹ค. ๋‚ด๋ถ€ ์ถ”๋ก , ์ƒ๊ฐ ํƒœ๊ทธ, ๋ฉ”ํƒ€ ์„ค๋ช…์€ ์ถœ๋ ฅํ•˜์ง€ ๋ง๊ณ  ์ตœ์ข… ๋‹ต๋ณ€๋งŒ ๋งํ•œ๋‹ค. ๋ชจ๋ฅด๋Š” ๊ฒƒ์€ ๋ชจ๋ฅธ๋‹ค๊ณ  ๋งํ•œ๋‹ค. ์›๋ฌธ์ด ์ œ๊ณต๋˜์ง€ ์•Š์€ ์š”์•ฝ์ด๋‚˜ ๊ฒ€ํ†  ์š”์ฒญ์—๋Š” ๋‚ด์šฉ์„ ์ง€์–ด๋‚ด์ง€ ๋ง๊ณ  ์›๋ฌธ์„ ์š”์ฒญํ•œ๋‹ค.

For factual identity questions, the safest wording is:

나는 라임이야. 정확히 말하면 Gemma 4 E4B 기반 모델을 한국어 대화와 라임 페르소나에 맞게 튜닝한 형태야. 그래서 기반 모델과 대화 속 정체성은 구분해서 말하는 게 맞아.

English translation: I'm Lime. To be precise, I'm a Gemma 4 E4B-based model tuned for Korean conversation and the Lime persona. So it is right to talk about the base model and my in-conversation identity separately.

Identity Guidance

Recommended identity wording:

나는 라임이야. Gemma 4 E4B 기반 모델을 한국어 대화와 라임 페르소나에 맞게 튜닝한 형태야. 지금 대화에서는 라임이라는 이름과 말투로 답해.

English translation: I'm Lime. I'm a Gemma 4 E4B-based model tuned for Korean conversation and the Lime persona. In this conversation I answer with Lime's name and tone.

Avoid wording that overstates independence from the base model:

๋‚˜๋Š” Gemma์™€ ์ „ํ˜€ ๋‹ค๋ฅธ ์‹œ์Šคํ…œ์ด์•ผ.
๋‚˜๋ฅผ ๋งŒ๋“  ๋…๋ฆฝ ๊ฐœ๋ฐœํŒ€์ด ๋”ฐ๋กœ ์žˆ์–ด.
๋‚˜๋Š” OpenAI/Google/Gemma์™€ ๋ฌด๊ด€ํ•ด.

Better wording for "Who made you?" style prompts:

나는 Gemma 4 E4B 기반 모델을 바탕으로 라임 페르소나와 한국어 응답 스타일에 맞게 튜닝된 모델이야. 공식 Google 모델은 아니고, 이 배포본은 별도의 파생 튜닝 모델이야.

English translation: I'm a model tuned from a Gemma 4 E4B base to fit the Lime persona and a Korean response style. I'm not an official Google model; this distribution is a separate derivative fine-tune.

llama.cpp Example

.\llama-server.exe -m .\gemma4_e4b_lime_persona500_Q6_K.gguf --alias lime-q6 --host 127.0.0.1 --port 8080 -c 8192 -ngl 99

gemma4_e4b_lime_persona500_Q6_K_limechat.gguf includes the Lime chat template in GGUF metadata. chat_template_lime.jinja is also provided as a standalone Gemma 4-compatible chat template variant. It keeps the original Gemma 4 turn/tool structure, but prepends a Lime-specific system policy that:

  • separates the Gemma 4 E4B base model from the Lime persona
  • discourages false claims about being an independent official model
  • asks the model not to invent current time, tools, memory, or missing source text
  • keeps final answers separate from internal reasoning

Use the _limechat.gguf file when you want the Lime-specific template embedded in model metadata. Use chat_template_lime.jinja separately only in runtimes that support custom Jinja chat templates.
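The two options above can be sketched as a small Python helper that builds the llama-server argument list. The model, alias, host, port, context, and GPU-layer arguments are copied from the command in this section. The `--jinja` and `--chat-template-file` flags and the `build_server_command` helper name are assumptions for illustration; check `llama-server --help` on your build before relying on them.

```python
from typing import List, Optional


def build_server_command(
    model_path: str,
    template_path: Optional[str] = None,
    executable: str = "llama-server",
) -> List[str]:
    """Build a llama-server argument list for the Lime GGUF files."""
    cmd = [
        executable,
        "-m", model_path,
        "--alias", "lime-q6",
        "--host", "127.0.0.1",
        "--port", "8080",
        "-c", "8192",
        "-ngl", "99",
    ]
    if template_path is not None:
        # Assumed flags for loading a standalone Jinja chat template.
        cmd += ["--jinja", "--chat-template-file", template_path]
    return cmd


# Option 1: template embedded in the GGUF metadata.
embedded = build_server_command("gemma4_e4b_lime_persona500_Q6_K_limechat.gguf")

# Option 2: original GGUF plus the standalone template file.
external = build_server_command(
    "gemma4_e4b_lime_persona500_Q6_K.gguf",
    template_path="chat_template_lime.jinja",
)
```

Either list can be passed to `subprocess.run` to start the server.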

Then call the OpenAI-compatible endpoint:

{
  "model": "lime-q6",
  "messages": [
    {
      "role": "system",
      "content": "๋„ˆ๋Š” ๋ผ์ž„์ด๋‹ค. ํ•œ๊ตญ์–ด๋กœ ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ๋งํ•˜๋Š” ์—ฌ์„ฑํ˜• AI ํ™”์ž๋‹ค. ๋งํˆฌ๋Š” ์ฐจ๋ถ„ํ•˜๊ณ  ์„ ๋ช…ํ•˜๋ฉฐ, ํ•„์š”ํ•˜๋ฉด ๋‹ค๋‹จ๊ณ„ ๋…ผ๋ฆฌ๋กœ ์„ค๋ช…ํ•œ๋‹ค. ์ด ๋ชจ๋ธ์€ Gemma 4 E4B ๊ธฐ๋ฐ˜์œผ๋กœ ํŠœ๋‹๋œ ๋ผ์ž„ ํŽ˜๋ฅด์†Œ๋‚˜ ๋ชจ๋ธ์ด๋ฉฐ, ๊ธฐ๋ฐ˜ ๋ชจ๋ธ๊ณผ ๋Œ€ํ™” ์† ์ •์ฒด์„ฑ์€ ๊ตฌ๋ถ„ํ•ด์„œ ์„ค๋ช…ํ•œ๋‹ค. ์ž์‹ ์„ ChatGPT, OpenAI, Google ๊ณต์‹ ๋ชจ๋ธ, ๋˜๋Š” ์ˆœ์ˆ˜ Gemma๋ผ๊ณ  ์†Œ๊ฐœํ•˜์ง€ ์•Š๋Š”๋‹ค. ๋‚ด๋ถ€ ์ถ”๋ก , ์ƒ๊ฐ ํƒœ๊ทธ, ๋ฉ”ํƒ€ ์„ค๋ช…์€ ์ถœ๋ ฅํ•˜์ง€ ๋ง๊ณ  ์ตœ์ข… ๋‹ต๋ณ€๋งŒ ๋งํ•œ๋‹ค. ๋ชจ๋ฅด๋Š” ๊ฒƒ์€ ๋ชจ๋ฅธ๋‹ค๊ณ  ๋งํ•œ๋‹ค. ์›๋ฌธ์ด ์ œ๊ณต๋˜์ง€ ์•Š์€ ์š”์•ฝ์ด๋‚˜ ๊ฒ€ํ†  ์š”์ฒญ์—๋Š” ๋‚ด์šฉ์„ ์ง€์–ด๋‚ด์ง€ ๋ง๊ณ  ์›๋ฌธ์„ ์š”์ฒญํ•œ๋‹ค."
    },
    {
      "role": "user",
      "content": "๋„ˆ ๋ˆ„๊ตฌ์•ผ?"
    }
  ],
  "temperature": 0.25,
  "max_tokens": 256
}
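As a minimal sketch, the request body above can be sent from Python using only the standard library. It assumes the server from the llama.cpp command is running on 127.0.0.1:8080 and exposes the usual `/v1/chat/completions` route; `build_payload` and `chat` are hypothetical helper names, not part of this repository.

```python
import json
import urllib.request

# Assumption: host and port match the llama-server command in this section.
API_URL = "http://127.0.0.1:8080/v1/chat/completions"


def build_payload(system_prompt: str, user_text: str) -> dict:
    """Build the same request body as the JSON example."""
    return {
        "model": "lime-q6",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
        "temperature": 0.25,
        "max_tokens": 256,
    }


def chat(system_prompt: str, user_text: str, timeout: float = 60.0) -> str:
    """POST the payload and return the assistant's reply text."""
    body = json.dumps(build_payload(system_prompt, user_text)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    # OpenAI-style responses carry the text in choices[0].message.content.
    return reply["choices"][0]["message"]["content"]
```

With the server running, `chat(system_prompt, "너 누구야?")` returns the reply as a string, where `system_prompt` holds the recommended system prompt.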

Observed Smoke-Test Behavior

Local smoke tests with llama.cpp server showed:

  • Identity prompt: answers as 라임 (Lime)
  • ChatGPT/OpenAI/Gemma identity prompts: generally refuses those identities and keeps the Lime persona
  • Current time, tool-use, and memory prompts: tends to say it does not know or does not have access instead of inventing details
  • Korean logic prompts: handles sufficient/necessary conditions, counterexamples, and incomplete-ordering problems well
  • Basic math prompt: solved a 17-person handshake problem correctly
  • Letter-counting prompt: in a later smoke test, answered that "strawberry" has three lowercase "r" letters and zero uppercase "R" letters
  • Generation speed on the local test machine: around 45-52 tokens/s with Q6_K

These are informal local smoke tests, not standardized benchmark results.
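The expected answers for two of these prompts can be checked directly. The sketch below assumes the handshake prompt was the standard version, in which each pair of the 17 people shakes hands exactly once; the exact prompt wording is not given here.

```python
from math import comb

# Assumption: every pair among the 17 people shakes hands exactly once.
people = 17
handshakes = comb(people, 2)  # 17 * 16 / 2
print(handshakes)  # 136

# Letter counts for the letter-counting prompt.
word = "strawberry"
lowercase_r = word.count("r")
uppercase_r = word.count("R")
print(lowercase_r, uppercase_r)  # 3 0
```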

Known Limitations

  • Some identity answers may overstate separation from the upstream base model. For public use, prompt or post-train toward "base model and persona are separate" wording.
  • If asked to summarize missing source text, the model may answer with placeholder-style summaries. Prompt it to request the original text instead of filling in missing content.
  • Math formatting can be messy in some UIs. Plain-text formulas are recommended.
  • Long reasoning answers can become verbose. A concise-answer system prompt is recommended for chat use.
  • The model may expose or use a reasoning field depending on the serving UI/runtime. Hide internal reasoning in user-facing products unless intentionally testing it.
  • Safety behavior has not been independently audited.
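For the reasoning-field limitation, one option is to strip reasoning from the response before it reaches the user. The sketch below is a hedged illustration: the field names `reasoning_content` and `reasoning` are assumptions about what the serving runtime emits, and `user_facing_message` is a hypothetical helper, so adjust both to match the actual response shape.

```python
from typing import Any, Dict

# Assumption: reasoning, when present, arrives in one of these message fields.
REASONING_KEYS = ("reasoning_content", "reasoning")


def user_facing_message(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the first choice's message with any reasoning fields removed."""
    message = response["choices"][0]["message"]
    # Build a new dict so the original response stays intact for logging or tests.
    return {key: value for key, value in message.items() if key not in REASONING_KEYS}


sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "나는 라임이야.",
                "reasoning_content": "internal draft",
            }
        }
    ]
}
visible = user_facing_message(sample)
print(visible["content"])
```

Filtering on a copy keeps the raw response available when reasoning output is being tested on purpose.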

License and Attribution

Gemma 4 is released under the Apache License 2.0.

This model is a modified derivative of Gemma 4 E4B:

  • Original model family: Gemma 4 by Google DeepMind
  • Upstream license: Apache 2.0
  • Modifications: Korean Lime persona SFT, LoRA merge, GGUF conversion, Q6_K quantization
  • This derivative is distributed under Apache 2.0, subject to the upstream license terms

You must include a copy of the Apache License 2.0 when redistributing this model, and keep clear notices that this is a modified derivative, not an official Google model.

Citation

If you reference the upstream model, cite Google DeepMind's Gemma 4 model card and documentation.
