Instructions for using terra-cognita-ai/ResAI_Image-to-Text_Round-1 with libraries and local apps.

Libraries

How to use terra-cognita-ai/ResAI_Image-to-Text_Round-1 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="terra-cognita-ai/ResAI_Image-to-Text_Round-1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("terra-cognita-ai/ResAI_Image-to-Text_Round-1")
model = AutoModelForImageTextToText.from_pretrained("terra-cognita-ai/ResAI_Image-to-Text_Round-1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens and drop special tokens such as end-of-turn markers.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Local Apps

vLLM

How to use terra-cognita-ai/ResAI_Image-to-Text_Round-1 with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "terra-cognita-ai/ResAI_Image-to-Text_Round-1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "terra-cognita-ai/ResAI_Image-to-Text_Round-1",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
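The curl request above can also be sent from Python. A minimal sketch using only the standard library, assuming the server is running on vLLM's default port 8000 and returns a standard OpenAI-style body (`choices[0].message.content`); the helper names are hypothetical. For the SGLang server described below, pass `base_url="http://localhost:30000"`.

```python
import json
import urllib.request

# Assumption: vLLM's default port; use "http://localhost:30000" for the SGLang server.
BASE_URL = "http://localhost:8000"
MODEL_ID = "terra-cognita-ai/ResAI_Image-to-Text_Round-1"


def build_chat_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """OpenAI-style chat body: one user turn holding a text part and an image_url part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def build_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """POST request to the OpenAI-compatible /v1/chat/completions endpoint."""
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the first choice's text (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_request(
    build_chat_payload(
        "Describe this image in one sentence.",
        "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    )
)
# print(send(request))  # uncomment once the server is up
```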

Use Docker (Docker Model Runner)

```shell
docker model run hf.co/terra-cognita-ai/ResAI_Image-to-Text_Round-1
```

SGLang

How to use terra-cognita-ai/ResAI_Image-to-Text_Round-1 with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "terra-cognita-ai/ResAI_Image-to-Text_Round-1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "terra-cognita-ai/ResAI_Image-to-Text_Round-1",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "terra-cognita-ai/ResAI_Image-to-Text_Round-1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as above (it listens on port 30000).
```


You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

gemma-4-E4B-it · W4A16 (llmcompressor, observer-only)

Int4 weight-only quantization of google/gemma-4-E4B-it, produced offline with llmcompressor oneshot + QuantizationModifier.

Only LM-stack Linear weights are packed to int4. Vision tower, audio tower, the Per-Layer Embedding (PLE) plumbing, vision/audio projectors, and lm_head are kept at bf16.

Saved in compressed-tensors pack-quantized format. Loads in vLLM via the CompressedTensorsWNA16 loader bound to MarlinLinearKernel.
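In a pack-quantized layout, several int4 values share one 32-bit integer (consistent with the I32 tensor type listed for this checkpoint). A minimal pure-Python sketch of that packing idea, eight signed int4 values per int32; the +8 offset to unsigned nibbles and the low-nibble-first order are assumptions for illustration and may not match compressed-tensors' exact on-disk layout.

```python
def pack_int4_to_int32(values: list[int]) -> list[int]:
    """Pack signed int4 values (range [-8, 7]) eight at a time into signed 32-bit words.

    Each value is offset by +8 to an unsigned nibble in [0, 15]; value i of a group
    lands in bits 4*i .. 4*i+3 (assumed low-nibble-first order).
    """
    if len(values) % 8:
        raise ValueError("length must be a multiple of 8")
    words = []
    for start in range(0, len(values), 8):
        word = 0
        for i, v in enumerate(values[start:start + 8]):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} is outside the int4 range")
            word |= (v + 8) << (4 * i)
        # Reinterpret the unsigned 32-bit pattern as a signed int32, as an int32 tensor stores it.
        words.append(word - (1 << 32) if word >= (1 << 31) else word)
    return words


def unpack_int32_to_int4(words: list[int]) -> list[int]:
    """Invert pack_int4_to_int32: split each word into eight signed int4 values."""
    values = []
    for word in words:
        word &= 0xFFFFFFFF  # back to the unsigned bit pattern
        for i in range(8):
            values.append(((word >> (4 * i)) & 0xF) - 8)
    return values
```

Eight weights per int32 is where the 4x density over bf16 comes from; per-group scales are stored separately.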

Weights footprint at load: ≈9.5 GiB (vs ~16 GiB bf16 baseline, −41%).
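A back-of-the-envelope check of these numbers: (16 - 9.5) / 16 is about 40.6%, matching the quoted -41%. The sketch also estimates bytes per int4 weight and the share of the bf16 bytes that would have to sit in quantized Linears to land at 9.5 GiB. It assumes group-wise quantization with group size 128 and one bf16 scale per group, neither of which this card states, and it ignores packing metadata.

```python
BF16_BYTES = 2.0
GROUP_SIZE = 128  # assumption: the group size is not stated in this card


def int4_bytes_per_weight(group_size: int = GROUP_SIZE, scale_bytes: float = 2.0) -> float:
    """Half a byte per packed int4 weight plus one bf16 scale amortised over each group."""
    return 0.5 + scale_bytes / group_size


def reduction(before_gib: float, after_gib: float) -> float:
    """Fractional size reduction, e.g. 0.41 for -41%."""
    return (before_gib - after_gib) / before_gib


def quantized_share(before_gib: float, after_gib: float, group_size: int = GROUP_SIZE) -> float:
    """Fraction of bf16 bytes that must be in quantized Linears to explain after_gib.

    Solves before * (1 - s) + before * s * r = after for s, where r is the
    int4-to-bf16 byte ratio per weight.
    """
    r = int4_bytes_per_weight(group_size) / BF16_BYTES
    return (before_gib - after_gib) / (before_gib * (1 - r))


print(f"reduction: {reduction(16, 9.5):.1%}")                      # 40.6%
print(f"int4 bytes/weight: {int4_bytes_per_weight():.4f}")         # 0.5156
print(f"implied quantized share: {quantized_share(16, 9.5):.0%}")  # 55%
```

Under these assumptions roughly 45% of the bf16 bytes stay unquantized, which is plausible given that the PLE tables, towers, projectors, and lm_head are kept in bf16.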

Why observer-only

Gemma-4 E-variants combine Per-Layer Embeddings (per_layer_input_gate, per_layer_projection) with KV-sharing across decoder layers. Both GPTQModifier and AWQModifier rely on the llmcompressor sequential pipeline, which calls torch.fx.symbolic_trace on the model. The PLE + KV-sharing topology makes that trace fail with torch.fx.proxy.TraceError, and the run aborts with no clean recovery.

QuantizationModifier skips the sequential pipeline entirely: it computes observer-only scales from calibration activations and quantizes the weights in place. There is no Hessian and no AWQ smoothing, only statistics from the calibration forward passes. This makes it a working PTQ path for the Gemma-4 E-variants.
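To make "observer-only" concrete: an observer records a statistic (here, the absolute maximum of each weight group), a scale is derived from it, and every weight is rounded to the nearest int4 level, with no Hessian-based error correction (as in GPTQ) and no activation-aware rescaling (as in AWQ). The sketch below is a generic symmetric, group-wise round-to-nearest quantizer in pure Python. The group size of 128 and the absmax/7 scale are assumptions, not necessarily the formula compressed-tensors uses, and this toy observer looks only at the weights rather than running calibration forwards.

```python
QMIN, QMAX = -8, 7  # signed int4 range


def quantize_group(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric round-to-nearest for one group using an absmax observer."""
    absmax = max(abs(w) for w in weights)  # the observed statistic
    scale = absmax / QMAX if absmax > 0 else 1.0  # assumption: largest |w| maps to +7
    q = [min(QMAX, max(QMIN, round(w / scale))) for w in weights]
    return q, scale


def quantize_row(row: list[float], group_size: int = 128) -> list[tuple[list[int], float]]:
    """Quantize one weight row group by group; each group gets its own scale."""
    return [quantize_group(row[i:i + group_size]) for i in range(0, len(row), group_size)]


def dequantize(groups: list[tuple[list[int], float]]) -> list[float]:
    """Approximate reconstruction w_hat = q * scale."""
    return [q * scale for qs, scale in groups for q in qs]


row = [0.01 * ((i * 37) % 101 - 50) for i in range(256)]
groups = quantize_row(row)
recon = dequantize(groups)
print(max(abs(a - b) for a, b in zip(row, recon)))  # worst-case rounding error
```

Each reconstructed weight is within half a scale step of the original, which is the entire error budget of round-to-nearest; GPTQ and AWQ exist to shrink the impact of that error, which is what this path gives up.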

What is quantized

259 Linear modules across the language stack:

42 × (q_proj, o_proj, gate_proj, up_proj, down_proj) = 210
24 × (k_proj, v_proj) = 48 (KV-sharing collapses 18 of the 42 layers)
1 × model.language_model.per_layer_model_projection = 1

Every other module, including the vision tower, audio tower, PLE plumbing, projectors, and lm_head listed above, stays in bf16.
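The count above can be reproduced by enumerating language-stack Linear names and filtering them through an ignore list. Everything here is hypothetical: the module paths (modelled on Hugging Face Gemma naming), the choice of the last 18 layers as the KV-shared ones, and the ignore patterns (following the llmcompressor convention that `re:`-prefixed entries are regular expressions). A real ignore list would also cover the vision and audio towers and projectors, which this sketch does not enumerate.

```python
import re

NUM_LAYERS = 42
NUM_KV_SHARED = 18  # per the card: KV-sharing collapses 18 of 42 layers
PREFIX = "model.language_model.layers"  # hypothetical module path


def language_linear_names() -> list[str]:
    names = []
    for i in range(NUM_LAYERS):
        # Assumption: the last 18 layers are KV-shared and have no k/v projections;
        # which layers they are does not change the count.
        has_own_kv = i < NUM_LAYERS - NUM_KV_SHARED
        attn = ["q_proj", "k_proj", "v_proj", "o_proj"] if has_own_kv else ["q_proj", "o_proj"]
        names += [f"{PREFIX}.{i}.self_attn.{p}" for p in attn]
        names += [f"{PREFIX}.{i}.mlp.{p}" for p in ("gate_proj", "up_proj", "down_proj")]
        # Per-layer PLE plumbing, kept in bf16 by the ignore list below.
        names += [f"{PREFIX}.{i}.per_layer_input_gate", f"{PREFIX}.{i}.per_layer_projection"]
    names.append("model.language_model.per_layer_model_projection")
    names.append("lm_head")
    return names


# Hypothetical ignore list: plain entries match exactly, "re:" entries are
# regular expressions matched from the start of the module name.
IGNORE = ["lm_head", r"re:.*per_layer_input_gate$", r"re:.*per_layer_projection$"]


def is_ignored(name: str, ignore: list[str] = IGNORE) -> bool:
    for pattern in ignore:
        if pattern.startswith("re:"):
            if re.match(pattern[3:], name):
                return True
        elif name == pattern:
            return True
    return False


quantized = [n for n in language_linear_names() if not is_ignored(n)]
print(len(quantized))  # 259
```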

Calibration

| Field | Value |
|---|---|
| Dataset | `garage-bAInd/Open-Platypus` |
| Split | `train` |
| Samples | 256 |
| Max sequence length | 2048 tokens |
| Chat template | Applied (single-turn user message per row) |
| Modality | Text only |
| Seed | 42 |
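A stdlib-only sketch of how these calibration settings translate into sample preparation. It assumes each Open-Platypus row's `instruction` field becomes the single user message (the card does not say which fields were used), and it uses `random.Random(42)` as a stand-in for the seeded `datasets` shuffle-and-select that llmcompressor examples typically use, so it will not pick the same 256 rows. Chat-template tokenization is left out; `truncate` only illustrates the 2048-token cap.

```python
import random

NUM_SAMPLES = 256
SEED = 42
MAX_SEQ_LEN = 2048


def row_to_messages(row: dict) -> list[dict]:
    """Turn one Open-Platypus-style row into a single-turn user message.

    Assumption: only the 'instruction' field is used as the prompt.
    """
    return [{"role": "user", "content": row["instruction"]}]


def select_calibration_rows(rows: list[dict], n: int = NUM_SAMPLES, seed: int = SEED) -> list[dict]:
    """Deterministically pick n rows via a seeded shuffle of the row indices."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    return [rows[i] for i in indices[:n]]


def truncate(token_ids: list[int], max_len: int = MAX_SEQ_LEN) -> list[int]:
    """Clip a tokenized sample to the calibration max sequence length."""
    return token_ids[:max_len]
```

Fixing the seed makes the selection repeatable across runs, which is what the Seed row in the table is for.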

Inference

vLLM

No `quantization=` argument is needed: vLLM's compressed-tensors loader auto-detects the scheme from config.json and binds to MarlinLinearKernel:

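```python
from vllm import LLM, SamplingParams
llm = LLM(model="terra-cognita-ai/ResAI_Image-to-Text_Round-1", dtype="bfloat16")
out = llm.generate(["Explain int4 weight-only quantization in one sentence."], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```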

Reproduce

A quant_recipe.json is written alongside the safetensors files. It records the git SHA, the full scheme dict, the ignore patterns, and the calibration block, which makes it useful for reproducibility audits.
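For a reproducibility audit, two recipe files can be compared key by key. A minimal sketch; the key names in the usage example (`git_sha`, `scheme`, `ignore`) are hypothetical, so inspect the actual quant_recipe.json for its real layout.

```python
import json
from pathlib import Path


def load_recipe(path: str) -> dict:
    """Read a quant_recipe.json-style file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def diff_recipes(a: dict, b: dict, prefix: str = "") -> list[str]:
    """Return dotted paths of keys missing from one side or whose values differ."""
    diffs = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}.{key}" if prefix else key
        if key not in a or key not in b:
            diffs.append(path)
        elif isinstance(a[key], dict) and isinstance(b[key], dict):
            diffs.extend(diff_recipes(a[key], b[key], path))  # recurse into nested blocks
        elif a[key] != b[key]:
            diffs.append(path)
    return diffs


# Usage with hypothetical keys:
run_a = {"git_sha": "abc123", "scheme": {"num_bits": 4}, "ignore": ["lm_head"]}
run_b = {"git_sha": "def456", "scheme": {"num_bits": 4}, "ignore": ["lm_head"]}
print(diff_recipes(run_a, run_b))  # ['git_sha']
```

A diff limited to the git SHA means the same scheme, ignore list, and calibration settings were used, so any behavioural difference points at the code revision.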

License

Inherits the Gemma License from the base model. By using this checkpoint you agree to the Gemma Terms of Use.

Acknowledgements

vllm-project/llm-compressor: the oneshot + QuantizationModifier machinery.
neuralmagic/compressed-tensors: the checkpoint format that vLLM maps onto its Marlin / GPTQ-Marlin int4 kernels at inference.


Safetensors
Model size: 8B params
Tensor types: I64, I32, BF16

Model tree for terra-cognita-ai/ResAI_Image-to-Text_Round-1
Base model: google/gemma-4-E4B
Fine-tuned: google/gemma-4-E4B-it
Quantized: this model (one of 201 quantized variants)