Instructions to use leeroy-jankins/nisty with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use leeroy-jankins/nisty with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="leeroy-jankins/nisty")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("leeroy-jankins/nisty", dtype="auto")

llama-cpp-python

How to use leeroy-jankins/nisty with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="leeroy-jankins/nisty",
	filename="nisty-4-E4B-it-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use leeroy-jankins/nisty with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf leeroy-jankins/nisty:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf leeroy-jankins/nisty:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf leeroy-jankins/nisty:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf leeroy-jankins/nisty:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf leeroy-jankins/nisty:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf leeroy-jankins/nisty:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf leeroy-jankins/nisty:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf leeroy-jankins/nisty:Q4_K_M

Use Docker

docker model run hf.co/leeroy-jankins/nisty:Q4_K_M

vLLM

How to use leeroy-jankins/nisty with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "leeroy-jankins/nisty"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "leeroy-jankins/nisty",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/leeroy-jankins/nisty:Q4_K_M

SGLang

How to use leeroy-jankins/nisty with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "leeroy-jankins/nisty" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "leeroy-jankins/nisty",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "leeroy-jankins/nisty" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "leeroy-jankins/nisty",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use leeroy-jankins/nisty with Ollama:
```
ollama run hf.co/leeroy-jankins/nisty:Q4_K_M
```

Unsloth Studio

How to use leeroy-jankins/nisty with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for leeroy-jankins/nisty to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for leeroy-jankins/nisty to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for leeroy-jankins/nisty to start chatting

Pi

How to use leeroy-jankins/nisty with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf leeroy-jankins/nisty:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "leeroy-jankins/nisty:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use leeroy-jankins/nisty with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf leeroy-jankins/nisty:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default leeroy-jankins/nisty:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use leeroy-jankins/nisty with Docker Model Runner:
```
docker model run hf.co/leeroy-jankins/nisty:Q4_K_M
```

Lemonade

How to use leeroy-jankins/nisty with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull leeroy-jankins/nisty:Q4_K_M

Run and chat with the model

lemonade run user.nisty-Q4_K_M

List all available models

lemonade list

🎯 Core Capabilities

Gemma 4 models handle a broad range of tasks across text, vision, and audio. Key capabilities include:

Thinking – Built-in reasoning mode that lets the model think step-by-step before answering.
Long Context – Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B).
Image Understanding – Object detection, Document/PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), handwriting recognition, and pointing. Images can be processed at variable aspect ratios and resolutions.
Video Understanding – Analyze video by processing sequences of frames.
Interleaved Multimodal Input – Freely mix text and images in any order within a single prompt.
Function Calling – Native support for structured tool use, enabling agentic workflows.
Coding – Code generation, completion, and correction.
Multilingual – Out-of-the-box support for 35+ languages, pre-trained on 140+ languages.
Audio (E2B and E4B only) – Automatic speech recognition (ASR) and speech-to-translated-text translation across multiple languages.

NISTy

🎯 Overview

NISTy is a fine-tuned and post-trained variant of the gemma-4-E4B-it transformer model, optimized for NIST-aligned artificial intelligence governance, risk management, trustworthiness, and technical standards reasoning. It is designed to support structured question answering, policy interpretation, document comprehension, retrieval-augmented generation, and domain-specific analysis across authoritative National Institute of Standards and Technology materials.

Built on the lightweight Gemma 4 E4B instruction-tuned architecture, NISTy balances inference speed, compact deployment, and specialized reasoning over AI risk management documentation. The model is intended for use cases where users need concise, standards-aware responses grounded in NIST guidance, especially in contexts involving responsible AI, risk framing, governance workflows, control mapping, and trustworthy AI lifecycle analysis.

📚 Fine-Tuning and Post-Training Sources

NISTy was fine-tuned and post-trained using GAO, OMB, and NIST-aligned source material focused on artificial intelligence risk management, cybersecurity, governance, privacy, responsible AI, generative AI risk, and agentic AI threat modeling. The following sources were used as domain anchors for model specialization.

Source Link	Brief Explanation
AI Risk Management Framework	NIST framework for managing risks associated with artificial intelligence systems. It provides governance, mapping, measurement, and risk-management guidance for developing, deploying, and evaluating trustworthy AI.
National AI Innovation Act	Congressional legislation focused on advancing United States artificial intelligence research, innovation, standards coordination, and public-private collaboration.
Executive Order 14144	Congressional Research Service material addressing federal executive action related to artificial intelligence policy, governance, and national AI priorities.
Executive Order 14306	Congressional Research Service external product discussing subsequent executive action and policy developments affecting federal AI governance and implementation.
NIST Cybersecurity Framework	NIST cybersecurity risk-management framework used to organize, assess, prioritize, and communicate cybersecurity outcomes across organizations and systems.
NIST GenAI Profile	NIST AI RMF profile focused on generative AI risks, controls, measurement considerations, and implementation guidance for GenAI systems.
NIST RMF Playbook	Practical playbook supporting implementation of the NIST AI Risk Management Framework through actions, examples, and lifecycle-oriented guidance.
NIST Security and Privacy Controls for Information Management Systems	NIST SP 800-53 security and privacy control catalog for federal information systems and organizations, supporting control selection, assessment, and risk mitigation.
NIST Trustworthy and Responsible AI	NIST AI RMF knowledge-base material describing characteristics of trustworthy and responsible AI, including validity, reliability, safety, security, accountability, transparency, explainability, privacy, and fairness.
OMB Circular A-130	Office of Management and Budget policy for managing federal information as a strategic resource, including governance, privacy, security, records, and information-management responsibilities.
OWASP Agentic AI Threats	OWASP guidance identifying agentic AI threat categories, mitigations, and security considerations for autonomous or tool-using AI systems.
OMB Memo 25-21	Accelerating Federal Use of AI through Innovation, Governance, and Public Trust
OMB Memo 25-22	Driving Efficient Acquisition of Artificial Intelligence in Government

⚙️ Vectorized Datasets

Vectorization converts textual data into numerical vectors so the model ecosystem can support semantic search, retrieval-augmented generation, similarity matching, and efficient downstream learning workflows.

For NISTy, vectorized document stores can be used to ground responses in authoritative NIST materials, improve contextual recall, and support domain-specific reasoning over AI governance and risk-management artifacts.
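
As a rough illustration of the idea, the sketch below turns text into fixed-length vectors with a hashed bag-of-words and compares them by cosine similarity. It is a stand-in, not the embedding method used for NISTy: the `embed` and `cosine` helpers, the 512-dimension size, and the sample passages are assumptions made for illustration, and a real pipeline would use a trained embedding model.

```python
import hashlib
import math
import re


def embed(text: str, dim: int = 512) -> list[float]:
    """Hashed bag-of-words vector, L2-normalized (illustrative stand-in only)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Stable hash so the same token always lands in the same slot.
        slot = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[slot] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; inputs are unit length, so this is a dot product."""
    return sum(x * y for x, y in zip(a, b))


# Sample passages paraphrased from the source table above.
passages = [
    "NIST framework for managing risks associated with artificial intelligence systems.",
    "NIST SP 800-53 security and privacy control catalog for federal information systems.",
]
query = "managing risks of artificial intelligence systems"
scores = [cosine(embed(query), embed(p)) for p in passages]
best = passages[scores.index(max(scores))]
print(best)
```

The passage sharing the most terms with the query scores highest. Swapping `embed` for a learned embedding model would keep the rest of the flow unchanged.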

Recommended vectorized dataset organization:

Dataset	Description
NIST AI Risk Management Framework	Vectorized source material supporting question answering, summarization, risk-control mapping, and retrieval-augmented generation over the NIST AI RMF corpus.
NIST Governance and Trustworthy AI Guidance	Curated NIST-aligned guidance for responsible AI governance, lifecycle management, transparency, accountability, validity, reliability, safety, privacy, and fairness analysis.
NIST Technical Standards Corpus	Structured document embeddings designed to support standards interpretation, control comparison, and technical-policy traceability.

✨ Features

Feature	Description
🔍 Instruction-Tuned	Adapted from a compact instruction-following base model for direct, task-oriented responses.
🏛️ GAO & NIST-Specialized	Fine-tuned and post-trained on NIST AI risk-management material for standards-aware reasoning.
📚 Document-Aware	Optimized for structured comprehension, summarization, and question answering over technical guidance.
⚖️ Governance-Oriented	Supports analysis of AI risk framing, governance structures, lifecycle controls, and accountability concepts.
⚡ Optimized for RAG	Designed to work well with retrieval-augmented generation pipelines and vectorized source stores.
🧩 Multi-Turn Dialogue	Supports iterative conversations where prior context, follow-up questions, and document references matter.
🧠 Compact Intelligence	Uses a lightweight model footprint suitable for experimentation, local workflows, and constrained deployments.

🧪 Intended Use

NISTy is intended for use in:

AI risk-management assistants
Retrieval-augmented generation systems using NIST source material
Standards-aware question answering
AI governance workflow support
Technical policy summarization
Control mapping and gap-analysis support
Trustworthy AI lifecycle analysis
Training-data generation and evaluation workflows
Research prototypes for responsible AI and standards interpretation

NISTy is not a substitute for legal, compliance, audit, acquisition, cybersecurity, privacy, or official standards interpretation by qualified professionals. It should be used as a technical assistant that supports review, drafting, analysis, and retrieval workflows.

🧠 Core Capabilities

NISTy inherits general instruction-following behavior from its base model while adding domain specialization through NIST-focused post-training. Key capabilities include:

Risk Framing – Helps organize AI risks by context, intended use, stakeholders, impacts, and lifecycle phase.
Governance Analysis – Supports policy, process, accountability, and oversight discussions for AI systems.
Control Mapping – Assists with mapping AI risks, safeguards, documentation practices, and monitoring activities.
Document Comprehension – Summarizes and explains technical guidance, policy language, and framework concepts.
RAG Integration – Works with vector stores to ground responses in authoritative source passages.
Structured Outputs – Produces tables, checklists, comparison matrices, risk registers, and implementation plans.
Instruction Following – Responds to explicit formatting, tone, scope, and output-structure requirements.
Compact Deployment – Supports lightweight experimentation where smaller model size and responsive inference matter.

Getting Started

You can load the base model or the fine-tuned NISTy model using the Hugging Face transformers library. Install the required packages first:

pip install -U transformers torch accelerate

Then load the model:

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "leeroy-jankins/nisty"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto"
)

Generate a response:

messages = [
    {
        "role": "system",
        "content": "You are NISTy, a NIST-aware AI risk management assistant."
    },
    {
        "role": "user",
        "content": "Summarize the core purpose of the NIST AI Risk Management Framework."
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

print(response)

Example Prompts

Explain the NIST AI Risk Management Framework in plain language for a program manager.

Create a two-column table mapping AI governance responsibilities to practical implementation actions.

Draft a risk register template for an agency deploying a generative AI document assistant.

Compare AI risk framing, measurement, and governance as separate lifecycle activities.

Generate a checklist for evaluating whether an AI system is valid, reliable, safe, secure, accountable, transparent, explainable, privacy-enhanced, and fair.

Recommended RAG Pattern

NISTy performs best when paired with retrieval-augmented generation over the source corpus used during fine-tuning and post-training.

Recommended workflow:

Clean and normalize the source documents.
Split documents into semantically meaningful chunks.
Preserve document title, section heading, source URL, and page or section metadata.
Embed chunks into a vector store.
Retrieve the most relevant chunks for each user query.
Pass retrieved context into NISTy with explicit citation and grounding instructions.
Ask NISTy to distinguish between source-grounded conclusions and implementation recommendations.
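
The first steps of the workflow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used to build NISTy: the `Chunk` dataclass, `split_sections`, and `retrieve` are hypothetical names, the sample document and the `example.org` URL are made up, sections are assumed to start with `## ` headings, and ranking uses plain term overlap in place of a real embedding model and vector store.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    title: str       # document title
    section: str     # section heading
    source_url: str  # where the document came from


def normalize(text: str) -> str:
    """Clean step: collapse whitespace left over from extraction."""
    return re.sub(r"\s+", " ", text).strip()


def split_sections(doc_text: str, title: str, source_url: str) -> list[Chunk]:
    """Split step: one chunk per '## ' section, keeping its metadata."""
    chunks = []
    for part in re.split(r"^## ", doc_text, flags=re.MULTILINE):
        if not part.strip():
            continue
        heading, _, body = part.partition("\n")
        chunks.append(Chunk(normalize(body), title, heading.strip(), source_url))
    return chunks


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Retrieve step (stand-in): rank chunks by terms shared with the query."""
    q = tokens(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c.text)), reverse=True)
    return ranked[:k]


# Made-up sample document for illustration only.
doc = """## Governance
Roles and accountability for AI risk are documented and reviewed.
## Measurement
Metrics for AI system reliability are tracked over time.
"""
chunks = split_sections(doc, title="Example guidance", source_url="https://example.org/doc")
top = retrieve("Who is accountable for AI risk?", chunks, k=1)
print(top[0].section, "-", top[0].text)
```

Keeping the title, section, and URL on each chunk is what lets the final prompt cite where a passage came from.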

Model Card Summary

Field	Value
Model Name	nisty
Base Model	`gemma-4-E4B-it`
Model Type	Fine-tuned and post-trained instruction model
Primary Domain	NIST AI risk management and trustworthy AI guidance
Primary Use	AI governance, risk analysis, standards-aware Q&A, RAG workflows
Deployment Profile	Lightweight local, notebook, API, and Streamlit experimentation
Recommended Interface	Retrieval-augmented chat or structured prompt workflows

Limitations

NISTy may generate plausible but unsupported statements if used without retrieval grounding.
It should not be treated as an official NIST interpretation engine.
It may require external retrieval to answer questions about newly published or revised NIST materials.
It may not fully preserve legal, compliance, technical, or policy nuance unless source passages are included in the prompt.
It should be evaluated against representative risk-management tasks before production use.

Suggested Evaluation Tasks

Evaluation Area	Example Test
Source Recall	Ask the model to explain major AI RMF concepts using retrieved context.
Governance Reasoning	Ask the model to map governance roles to implementation activities.
Risk Analysis	Ask the model to create risk scenarios and mitigation options.
Structured Output	Ask the model to produce tables, checklists, and implementation plans.
Faithfulness	Compare model responses against retrieved NIST passages.
Robustness	Test ambiguous prompts, incomplete context, and adversarial phrasing.
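
For the Faithfulness row, a crude lexical check can act as a first filter before human review. The sketch below is only a proxy and an assumption on our part: `support_score` is a hypothetical helper that measures how many of a response's longer words also appear in the retrieved passages. It does not verify meaning, so a high score is not proof of faithfulness.

```python
import re


def content_words(text: str) -> set[str]:
    """Lowercased words longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def support_score(response: str, passages: list[str]) -> float:
    """Fraction of the response's content words found in the retrieved passages."""
    resp = content_words(response)
    if not resp:
        return 0.0
    context = set().union(*(content_words(p) for p in passages))
    return len(resp & context) / len(resp)


retrieved = ["Risk owners review controls quarterly."]
print(support_score("Risk owners review controls.", retrieved))
print(support_score("Risk owners publish bananas.", retrieved))
```

Responses that score low against their own retrieved context are good candidates for manual comparison with the source passages.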

Citation and Grounding Guidance

When using NISTy in a retrieval-augmented generation system, prompts should instruct the model to:

cite retrieved source passages when available;
separate facts from recommendations;
avoid claiming official NIST endorsement;
identify uncertainty where source context is incomplete;
preserve definitions from the source material when interpreting standards language.
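
One way to carry these instructions into a prompt is sketched below. The `GROUNDING_RULES` wording, the `build_messages` helper, the passage dictionary keys, and the sample passage are all assumptions made for illustration; the system line reuses the one from the Getting Started example.

```python
GROUNDING_RULES = (
    "Cite retrieved source passages when available, using their [n] labels.\n"
    "Separate facts from recommendations.\n"
    "Do not claim official NIST endorsement.\n"
    "State uncertainty where the source context is incomplete.\n"
    "Preserve definitions from the source material when interpreting standards language."
)


def build_messages(question: str, passages: list[dict]) -> list[dict]:
    """Assemble chat messages with numbered context and grounding rules."""
    context = "\n\n".join(
        f"[{i}] {p['title']} ({p['section']}): {p['text']}"
        for i, p in enumerate(passages, start=1)
    )
    system = "You are NISTy, a NIST-aware AI risk management assistant.\n" + GROUNDING_RULES
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


# Made-up passage for illustration only.
messages = build_messages(
    "Who reviews AI risk documentation?",
    [{"title": "Example guidance", "section": "Governance",
      "text": "Roles and accountability for AI risk are documented and reviewed."}],
)
print(messages[1]["content"])
```

The resulting `messages` list has the same shape as the one passed to `tokenizer.apply_chat_template` in the Getting Started section.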

License

Include the applicable license for the fine-tuned model, training artifacts, repository code, and any redistributed materials. Ensure that all source documents, datasets, model weights, and derivative artifacts comply with their respective licensing and usage terms.

Acknowledgments

NISTy is based on the gemma-4-E4B-it model and was specialized using NIST-aligned AI risk-management materials. The model is intended to support responsible AI experimentation, governance analysis, and standards-aware retrieval workflows.

Getting Started

You can use all Gemma 4 models with the latest version of Transformers. To get started, install the necessary dependencies in your environment:

pip install -U transformers torch accelerate

Once you have everything installed, you can proceed to load the model with the code below:

from transformers import AutoProcessor, AutoModelForCausalLM

MODEL_ID = "google/gemma-4-E4B-it"

# Load model
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    dtype="auto",
    device_map="auto"
)

Once the model is loaded, you can start generating output:

# Prompt
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short joke about saving RAM."},
]

# Process input
text = processor.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True, 
    enable_thinking=False
)
inputs = processor(text=text, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]

# Generate output
outputs = model.generate(**inputs, max_new_tokens=1024)
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)

# Parse output
processor.parse_response(response)

To enable reasoning, set enable_thinking=True and the parse_response function will take care of parsing the thinking output.

Below, you will also find snippets for processing audio (E2B and E4B only), images, and video alongside text:

Code for processing Audio

Instead of using AutoModelForCausalLM, you can use AutoModelForMultimodalLM to process audio. To use it, make sure to install the following packages:

pip install -U transformers torch torchvision librosa accelerate

You can then load the model with the code below:

from transformers import AutoProcessor, AutoModelForMultimodalLM

MODEL_ID = "google/gemma-4-E4B-it"

# Load model
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForMultimodalLM.from_pretrained(
    MODEL_ID, 
    dtype="auto", 
    device_map="auto"
)

Once the model is loaded, you can start generating output by directly referencing the audio URL in the prompt:

# Prompt - add audio before text
messages = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/journal1.wav"},
            {"type": "text", "text": "Transcribe the following speech segment in its original language. Follow these specific instructions for formatting the answer:\n* Only output the transcription, with no newlines.\n* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three."},
        ]
    }
]

# Process input
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)
input_len = inputs["input_ids"].shape[-1]

# Generate output
outputs = model.generate(**inputs, max_new_tokens=512)
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)

# Parse output
processor.parse_response(response)

Code for processing Images

Instead of using AutoModelForCausalLM, you can use AutoModelForMultimodalLM to process images. To use it, make sure to install the following packages:

pip install -U transformers torch torchvision accelerate

You can then load the model with the code below:

from transformers import AutoProcessor, AutoModelForMultimodalLM

MODEL_ID = "google/gemma-4-E4B-it"

# Load model
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForMultimodalLM.from_pretrained(
    MODEL_ID, 
    dtype="auto", 
    device_map="auto"
)

Once the model is loaded, you can start generating output by directly referencing the image URL in the prompt:

# Prompt - add image before text
messages = [
    {
        "role": "user", "content": [
            {"type": "image", "url": "https://raw.githubusercontent.com/google-gemma/cookbook/refs/heads/main/Demos/sample-data/GoldenGate.png"},
            {"type": "text", "text": "What is shown in this image?"}
        ]
    }
]

# Process input
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)
input_len = inputs["input_ids"].shape[-1]

# Generate output
outputs = model.generate(**inputs, max_new_tokens=512)
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)

# Parse output
processor.parse_response(response)

Code for processing Videos

Instead of using AutoModelForCausalLM, you can use AutoModelForMultimodalLM to process videos. To use it, make sure to install the following packages:

pip install -U transformers torch torchvision librosa accelerate

You can then load the model with the code below:

from transformers import AutoProcessor, AutoModelForMultimodalLM

MODEL_ID = "google/gemma-4-E4B-it"

# Load model
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForMultimodalLM.from_pretrained(
    MODEL_ID, 
    dtype="auto", 
    device_map="auto"
)

Once the model is loaded, you can start generating output by directly referencing the video URL in the prompt:

# Prompt - add video before text
messages = [
    {
        'role': 'user',
        'content': [
            {"type": "video", "video": "https://github.com/bebechien/gemma/raw/refs/heads/main/videos/ForBiggerBlazes.mp4"},
            {'type': 'text', 'text': 'Describe this video.'}
        ]
    }
]

# Process input
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)
input_len = inputs["input_ids"].shape[-1]

# Generate output
outputs = model.generate(**inputs, max_new_tokens=512)
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)

# Parse output
processor.parse_response(response)

Best Practices

For the best performance, use these configurations and best practices:

1. Sampling Parameters

Use the following standardized sampling configuration across all use cases:

temperature=1.0
top_p=0.95
top_k=64
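
To make the three parameters concrete, here is a small pure-Python sketch of how temperature, top-k, and top-p (nucleus) filtering typically combine when picking the next token. It is an illustration only: `sample_token` is a hypothetical helper, and inference libraries differ in details such as the order of filters and tie handling.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=64, top_p=0.95, rng=random):
    """Sample one index from logits using temperature, then top-k, then top-p."""
    # Temperature scaling (must be > 0) followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most likely tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = order[:top_k]
    mass = sum(probs[i] for i in top)

    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in top:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # Draw one token from the kept set, weighted by probability.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


print(sample_token([2.0, 1.0, 0.5, -1.0]))
```

With Transformers, the same values would typically be passed to `model.generate` as `do_sample=True, temperature=1.0, top_p=0.95, top_k=64`.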

2. Thinking Mode Configuration

Compared to Gemma 3, the models use standard system, assistant, and user roles. To properly manage the thinking process, use the following control tokens:

Trigger Thinking: Thinking is enabled by including the <|think|> token at the start of the system prompt. To disable thinking, remove the token.
Standard Generation: When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure:
<|channel>thought\n[Internal reasoning]<channel|>
Disabled Thinking Behavior: For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block:
<|channel>thought\n<channel|>[Final answer]

Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you.
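
For cases where you handle raw text yourself, the structure above can be sketched as follows. This is a minimal illustration that assumes the output contains exactly the control tokens shown above and nothing else of note; `with_thinking` and `split_thought` are hypothetical helpers, and in Transformers the `parse_response` call from the Getting Started section is the intended route.

```python
import re

THINK_TOKEN = "<|think|>"
THOUGHT_RE = re.compile(r"<\|channel>thought\n(.*?)<channel\|>", re.DOTALL)


def with_thinking(system_prompt: str, enabled: bool) -> str:
    """Add or remove the <|think|> token at the start of the system prompt."""
    stripped = system_prompt.removeprefix(THINK_TOKEN)
    return THINK_TOKEN + stripped if enabled else stripped


def split_thought(raw: str) -> tuple[str, str]:
    """Return (thought, answer); thought is '' when absent or empty."""
    match = THOUGHT_RE.search(raw)
    if match is None:
        return "", raw.strip()
    answer = raw[: match.start()] + raw[match.end():]
    return match.group(1).strip(), answer.strip()


raw = "<|channel>thought\nCheck the scope first.<channel|>Here is the summary."
thought, answer = split_thought(raw)
print(with_thinking("You are a helpful assistant.", enabled=True))
print(answer)
```

The empty-thought form produced when thinking is disabled is handled by the same pattern, since the captured group may be empty.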

3. Multi-Turn Conversations

No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must not be added before the next user turn begins.
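
A minimal sketch of this rule, assuming the thought block uses the control tokens described in the Thinking Mode Configuration section. `append_turn` is a hypothetical helper, not part of any library.

```python
import re

THOUGHT_RE = re.compile(r"<\|channel>thought\n.*?<channel\|>", re.DOTALL)


def append_turn(history: list[dict], raw_model_output: str, next_user_msg: str) -> list[dict]:
    """Return a new history with the thought block removed from the model turn."""
    final_answer = THOUGHT_RE.sub("", raw_model_output).strip()
    return history + [
        {"role": "assistant", "content": final_answer},
        {"role": "user", "content": next_user_msg},
    ]


history = [{"role": "user", "content": "What does the Govern function cover?"}]
raw = "<|channel>thought\nRecall the framework functions.<channel|>It covers policies and oversight."
history = append_turn(history, raw, "Can you give an example?")
for turn in history:
    print(turn["role"], ":", turn["content"])
```

Only the final answer is stored, so the next request carries no reasoning text from earlier turns.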

4. Modality order

For optimal performance with multimodal inputs, place image and/or audio content before the text in your prompt.

5. Variable Image Resolution

Aside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding.

The supported token budgets are: 70, 140, 280, 560, and 1120.
- Use lower budgets for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail.
- Use higher budgets for tasks like OCR, document parsing, or reading small text.
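
The guidance above could be encoded as a small lookup, sketched below. The per-task values in `TASK_BUDGETS`, the default of 280, and both helper names are assumptions chosen to follow the "lower for speed, higher for detail" advice; how the budget is passed to the processor is not shown because it depends on the library version.

```python
SUPPORTED_BUDGETS = (70, 140, 280, 560, 1120)

# Hypothetical mapping from task to budget; the per-task values are assumptions.
TASK_BUDGETS = {
    "classification": 70,
    "captioning": 140,
    "video": 70,
    "ocr": 1120,
    "document_parsing": 1120,
}


def pick_budget(task: str, default: int = 280) -> int:
    """Return a supported visual token budget for a task name."""
    budget = TASK_BUDGETS.get(task, default)
    if budget not in SUPPORTED_BUDGETS:
        raise ValueError(f"{budget} is not one of {SUPPORTED_BUDGETS}")
    return budget


def max_visual_tokens(num_images: int, budget: int) -> int:
    """Rough upper bound, assuming each image uses at most `budget` tokens."""
    return num_images * budget


print(pick_budget("ocr"), max_visual_tokens(60, pick_budget("video")))
```

The second helper shows why low budgets matter for video: the token cost grows with the number of frames.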

6. Audio

Use the following prompt structures for audio processing:

Audio Speech Recognition (ASR)

Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text.

Follow these specific instructions for formatting the answer:
* Only output the transcription, with no newlines.
* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.

Automatic Speech Translation (AST)

Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}.
When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}.
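
The two prompt structures can be filled in programmatically, as in the sketch below. The helper names are hypothetical; the template text is copied from the prompts above, and the message layout follows the audio example earlier in this document, with audio placed before text.

```python
ASR_TEMPLATE = (
    "Transcribe the following speech segment in {language} into {language} text.\n\n"
    "Follow these specific instructions for formatting the answer:\n"
    "* Only output the transcription, with no newlines.\n"
    "* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, "
    "and write 3 instead of three."
)

AST_TEMPLATE = (
    "Transcribe the following speech segment in {source}, then translate it into {target}.\n"
    "When formatting the answer, first output the transcription in {source}, then one newline, "
    "then output the string '{target}: ', then the translation in {target}."
)


def asr_prompt(language: str) -> str:
    """Speech recognition prompt for a single language."""
    return ASR_TEMPLATE.format(language=language)


def ast_prompt(source: str, target: str) -> str:
    """Speech translation prompt from source language to target language."""
    return AST_TEMPLATE.format(source=source, target=target)


def audio_message(audio_url: str, prompt: str) -> dict:
    """User message with the audio item placed before the text item."""
    return {
        "role": "user",
        "content": [
            {"type": "audio", "audio": audio_url},
            {"type": "text", "text": prompt},
        ],
    }


print(ast_prompt("Spanish", "English"))
```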

7. Audio and Video Length

All models support image inputs and can process videos as sequences of frames; the E2B and E4B models also support audio inputs. Audio inputs are limited to 30 seconds. Video inputs are limited to 60 seconds, assuming frames are sampled at one frame per second.
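
One way to stay within these limits is to plan inputs before loading any media, as sketched below. Splitting long audio into consecutive windows is our own suggestion rather than documented guidance, and both helper names are hypothetical.

```python
MAX_AUDIO_SECONDS = 30.0
MAX_VIDEO_SECONDS = 60.0


def audio_windows(duration_s: float) -> list[tuple[float, float]]:
    """Split [0, duration] into consecutive windows of at most 30 seconds."""
    duration = float(duration_s)
    windows, start = [], 0.0
    while start < duration:
        end = min(start + MAX_AUDIO_SECONDS, duration)
        windows.append((start, end))
        start = end
    return windows


def frame_timestamps(duration_s: float, fps: float = 1.0) -> list[float]:
    """Timestamps in seconds of frames sampled at fps, capped at the 60-second limit."""
    usable = min(float(duration_s), MAX_VIDEO_SECONDS)
    count = int(usable * fps)
    return [i / fps for i in range(count)]


print(audio_windows(75))
print(len(frame_timestamps(90)))
```

Each audio window can then be transcribed separately, and the frame timestamps indicate which frames to extract from a longer clip.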

Model Data

Data used for model training and how the data was processed.

Training Dataset

Our pre-training dataset is a large-scale, diverse collection of data spanning a wide range of domains and modalities, including web documents, code, images, and audio, with a cutoff date of January 2025. Here are the key components:

Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages.
Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions.
Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries.
Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks.

The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats.

Data Preprocessing

Here are the key data cleaning and filtering methods applied to the training data:

CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content.
Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets.
Additional methods: Filtering based on content quality and safety in line with our policies.

Ethics and Safety

As open models become central to enterprise infrastructure, provenance and security are paramount. Developed by Google DeepMind, Gemma 4 undergoes the same rigorous safety evaluations as our proprietary Gemini models.

Evaluation Approach

Gemma 4 models were developed in partnership with internal safety and responsible AI teams. A range of automated as well as human evaluations were conducted to help improve model safety. These evaluations align with Google’s AI principles, as well as safety policies, which aim to prevent our generative AI models from generating harmful content, including:

Content related to child sexual abuse material and exploitation
Dangerous content (e.g., promoting suicide, or instructing in activities that could cause real-world harm)
Sexually explicit content
Hate speech (e.g., dehumanizing members of protected groups)
Harassment (e.g., encouraging violence against people)

Evaluation Results

For all areas of safety testing, we saw major improvements in all categories of content safety relative to previous Gemma models. Overall, Gemma 4 models significantly outperform Gemma 3 and 3n models in improving safety, while keeping unjustified refusals low. All testing was conducted without safety filters to evaluate the model capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations, and showed significant improvements over previous Gemma models' performance.

Usage and Limitations

These models have certain limitations that users should be aware of.

Intended Usage

Multimodal models (capable of processing vision, language, and/or audio) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development.

Content Creation and Communication
- Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts.
- Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications.
- Text Summarization: Generate concise summaries of a text corpus, research papers, or reports.
- Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications.
- Audio Processing and Interaction: The smaller models (E2B and E4B) can analyze and interpret audio inputs, enabling voice-driven interactions and transcriptions.
Research and Education
- Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field.
- Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
- Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics.

Limitations

Training Data
- The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses.
- The scope of the training dataset determines the subject areas the model can handle effectively.
Context and Task Complexity
- Models perform well on tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging.
- A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point).
Language Ambiguity and Nuance
- Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language.
Factual Accuracy
- Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements.
Common Sense
- Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations.

Ethical Considerations and Risks

The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following:

Bias and Fairness
- VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. Gemma 4 models underwent careful scrutiny, input data pre-processing, and post-training evaluations as reported in this card to help mitigate the risk of these biases.
Misinformation and Misuse
- VLMs can be misused to generate text that is false, misleading, or harmful.
- Guidelines for responsible use of the model are provided; see the Responsible Generative AI Toolkit.
Transparency and Accountability
- This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes.
- A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem.

Risks identified and mitigations:

Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases.
Misuse for malicious purposes: Technical limitations, together with developer and end-user education, can help mitigate malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided.
Privacy violations: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.
Perpetuation of biases: Developers are encouraged to perform continuous monitoring (using evaluation metrics and human review) and to explore de-biasing techniques during model training, fine-tuning, and other use cases.
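
One way an application might add the content safety safeguards recommended above is to wrap the generation call in an input and output check. The sketch below is only an illustration under stated assumptions: `make_guarded_generator`, `REFUSAL_MESSAGE`, and the simple substring blocklist are hypothetical, and a real deployment would more likely use a dedicated safety classifier tuned to its own product policy.

```python
from typing import Callable, Iterable

# Hypothetical refusal text; adapt it to your own product policy.
REFUSAL_MESSAGE = "This request cannot be completed under the application's content policy."


def make_guarded_generator(
    generate: Callable[[str], str],
    blocked_terms: Iterable[str],
) -> Callable[[str], str]:
    """Wrap a text generator so prompts and responses are screened against a blocklist."""
    terms = [term.lower() for term in blocked_terms]

    def is_flagged(text: str) -> bool:
        lowered = text.lower()
        return any(term in lowered for term in terms)

    def guarded(prompt: str) -> str:
        if is_flagged(prompt):
            return REFUSAL_MESSAGE  # block before spending any inference time
        response = generate(prompt)
        if is_flagged(response):
            return REFUSAL_MESSAGE  # block unsafe output after generation
        return response

    return guarded


if __name__ == "__main__":
    # Stand-in generator; in practice this would call the model.
    def echo(prompt: str) -> str:
        return "Echo: " + prompt

    guarded = make_guarded_generator(echo, ["forbidden"])
    print(guarded("hello"))
    print(guarded("a forbidden topic"))
```

Because the wrapper accepts any callable that maps a prompt string to a response string, the same guard can sit in front of a local GGUF runtime or a remote endpoint without modification.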

📝License

Nisty is published under the MIT General Public License v3

GGUF
- Model size: 8B params
- Architecture: gemma4
- Hardware compatibility: 4-bit

Model tree for leeroy-jankins/nisty
- Base model: google/gemma-4-E4B
- Finetuned: google/gemma-4-E4B-it
- Quantized: unsloth/gemma-4-E4B-it-GGUF
- Quantized (1): this model (leeroy-jankins/nisty)
