Libraries

How to use terra-cognita-ai/ResAI_Image-to-Text_final with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="terra-cognita-ai/ResAI_Image-to-Text_final")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("terra-cognita-ai/ResAI_Image-to-Text_final")
model = AutoModelForMultimodalLM.from_pretrained("terra-cognita-ai/ResAI_Image-to-Text_final")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use terra-cognita-ai/ResAI_Image-to-Text_final with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "terra-cognita-ai/ResAI_Image-to-Text_final"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "terra-cognita-ai/ResAI_Image-to-Text_final",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/terra-cognita-ai/ResAI_Image-to-Text_final

SGLang

How to use terra-cognita-ai/ResAI_Image-to-Text_final with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "terra-cognita-ai/ResAI_Image-to-Text_final" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "terra-cognita-ai/ResAI_Image-to-Text_final",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "terra-cognita-ai/ResAI_Image-to-Text_final" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "terra-cognita-ai/ResAI_Image-to-Text_final",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use terra-cognita-ai/ResAI_Image-to-Text_final with Docker Model Runner:
```
docker model run hf.co/terra-cognita-ai/ResAI_Image-to-Text_final
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

gemma-4-E4B-it · pruned 20% → distilled → W4A16

A compute- and energy-optimized variant of google/gemma-4-E4B-it, built by a three-stage pipeline:

Structural MLP prune (−20%): the intermediate dimension of the LM-stack gate_proj/up_proj/down_proj layers is reduced by 20% using a calibrated importance criterion. The vision/audio towers and attention are untouched.
Knowledge distillation: the pruned LM is recovered toward the unpruned bf16 teacher (Phase 1: forward-KL plus hidden-state matching; Phase 2: on-policy GKD/JSD for brevity). The vision/audio towers are frozen. This restores both capability and output brevity to approximately teacher level.
W4A16 quantization: int4 weight-only quantization via llmcompressor oneshot + QuantizationModifier (observer-only; no Hessian/AWQ). Activations stay bf16.
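
To make the first stage concrete, here is a minimal sketch of what structural pruning of a gated MLP means mechanically: removing intermediate neuron j drops row j of gate_proj and up_proj and column j of down_proj. The importance score below (summed absolute weight) is only a stand-in chosen for illustration. It is not the calibrated criterion used for this checkpoint, and the function name prune_gated_mlp and the toy shapes are hypothetical.

```python
import random


def prune_gated_mlp(gate, up, down, prune_ratio=0.20):
    """Drop the least important intermediate neurons of a gated MLP.

    gate, up: [intermediate][hidden] nested lists (one row per neuron).
    down:     [hidden][intermediate] nested lists (one column per neuron).
    Returns the three pruned matrices and the kept neuron indices.
    """
    intermediate = len(gate)
    n_keep = intermediate - round(intermediate * prune_ratio)

    # Stand-in importance (assumption): total absolute weight touching neuron j.
    def importance(j):
        return (
            sum(abs(w) for w in gate[j])
            + sum(abs(w) for w in up[j])
            + sum(abs(row[j]) for row in down)
        )

    ranked = sorted(range(intermediate), key=importance, reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original neuron order

    gate_p = [gate[j] for j in keep]                   # drop rows
    up_p = [up[j] for j in keep]                       # drop rows
    down_p = [[row[j] for j in keep] for row in down]  # drop columns
    return gate_p, up_p, down_p, keep


# Toy example: hidden size 4, intermediate size 10, prune 20% -> keep 8.
rng = random.Random(0)
hidden, intermediate = 4, 10
gate = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(intermediate)]
up = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(intermediate)]
down = [[rng.uniform(-1, 1) for _ in range(intermediate)] for _ in range(hidden)]

# Zero out neurons 3 and 7 so they are the least important ones.
for j in (3, 7):
    gate[j] = [0.0] * hidden
    up[j] = [0.0] * hidden
    for row in down:
        row[j] = 0.0

gate_p, up_p, down_p, kept = prune_gated_mlp(gate, up, down)
print(len(gate), "->", len(gate_p), "intermediate neurons; kept", kept)
```

Because all three projections are sliced with the same index set, the pruned block stays a valid gated MLP with a smaller intermediate dimension, which is why the later stages can treat it as an ordinary dense model.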

Saved in compressed-tensors pack-quantized format — loads in HF Transformers (Marlin / GPTQ-Marlin kernels, run_compressed=True) and in vLLM via the CompressedTensorsWNA16 loader.

The checkpoint's quant_recipe.json carries the full base → prune → distill → quant source_lineage together with the calibration datasets used in each step.

Quantization recipe

QuantizationModifier(
    config_groups={
        "group_0": {
            "targets": ["Linear"],
            "weights": {
                "num_bits": 4,
                "type": "int",
                "symmetric": True,
                "strategy": "group",
                "group_size": 128,
                "observer": "minmax",
                "actorder": None,
                "dynamic": False,
            },
            "input_activations": None,
            "output_activations": None,
        }
    },
    ignore=[
        "re:.*vision_tower.*",          # ViT encoder + patch embedder
        "re:.*audio_tower.*",           # audio layers + subsample + output_proj
        "re:.*per_layer_input_gate.*",  # PLE input gates
        "re:.*per_layer_projection.*",  # PLE projections
        "re:.*embed_vision.*",          # vision embedding_projection
        "re:.*embed_audio.*",           # audio embedding_projection
        "lm_head",
    ],
)
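
The weight settings in the recipe above (4-bit, symmetric, group strategy, group size 128, min-max observer) can be illustrated with a small pure-Python sketch. It assumes one scale per group of 128 consecutive weights, computed as max|w| / 7, and no zero-point. The exact scale formula and storage layout inside compressed-tensors may differ, and the function names are hypothetical.

```python
import random


def quantize_group(weights, num_bits=4):
    """Symmetric min-max quantization of one group -> (int codes, scale)."""
    qmax = 2 ** (num_bits - 1) - 1  # 7 for int4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # assumed scale formula
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def quantize_row(row, group_size=128, num_bits=4):
    """Split a weight row into groups and quantize each group independently."""
    groups = [row[i:i + group_size] for i in range(0, len(row), group_size)]
    return [quantize_group(g, num_bits) for g in groups]


def dequantize_row(quantized):
    """Reconstruct approximate weights: code * scale, group by group."""
    return [c * scale for codes, scale in quantized for c in codes]


# Toy row of 256 weights -> two groups of 128, each with its own scale.
rng = random.Random(0)
row = [rng.gauss(0.0, 0.02) for _ in range(256)]
quantized = quantize_row(row)
restored = dequantize_row(quantized)
max_err = max(abs(a - b) for a, b in zip(row, restored))

# Storage cost per weight, assuming one 16-bit scale per 128-weight group.
bits_per_weight = 4 + 16 / 128

print("groups:", len(quantized))
print("max abs error:", max_err)
print("bits per weight:", bits_per_weight)
```

The rounding error of each weight is bounded by half the scale of its group, so smaller groups track local weight ranges more closely at the cost of storing more scales.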

Only LM-stack Linear weights are packed to int4. The vision tower, audio tower, Per-Layer Embedding (PLE) plumbing, vision/audio projectors, and lm_head stay bf16.
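
As a rough illustration of how the ignore list separates quantized from unquantized modules, the sketch below applies the same patterns to a few module names. It assumes that entries prefixed with re: are matched as regular expressions against the full module name and that plain entries must equal the name exactly. The module names are hypothetical examples, not read from the checkpoint.

```python
import re

IGNORE = [
    "re:.*vision_tower.*",
    "re:.*audio_tower.*",
    "re:.*per_layer_input_gate.*",
    "re:.*per_layer_projection.*",
    "re:.*embed_vision.*",
    "re:.*embed_audio.*",
    "lm_head",
]


def is_ignored(module_name, ignore=IGNORE):
    """Return True if the module would be skipped by the ignore list."""
    for pattern in ignore:
        if pattern.startswith("re:"):
            # Assumed semantics: regex matched from the start of the name.
            if re.match(pattern[3:], module_name):
                return True
        elif pattern == module_name:
            return True
    return False


# Hypothetical module names, for illustration only.
examples = [
    "model.language_model.layers.0.mlp.gate_proj",
    "model.language_model.layers.0.per_layer_input_gate",
    "model.vision_tower.encoder.layers.0.self_attn.q_proj",
    "lm_head",
]
for name in examples:
    print(name, "->", "stays bf16" if is_ignored(name) else "packed to int4")
```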

Inference

No quantization= argument is needed: vLLM auto-detects compressed-tensors from config.json and binds to MarlinLinearKernel.

To serve:

vllm serve terra-cognita-ai/ResAI_Image-to-Text_final --config vllm_config.yaml

The vllm_config.yaml file is included in the root directory of the model repository.
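
Once the server is running, it can also be queried from Python through the OpenAI-compatible chat completions endpoint. The sketch below uses only the standard library and mirrors the curl request shown in the vLLM section. It assumes the server listens on localhost:8000 as in that example (the port may differ if vllm_config.yaml overrides it), and the helper names build_chat_request and ask are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "terra-cognita-ai/ResAI_Image-to-Text_final"


def build_chat_request(prompt, image_url, base_url="http://localhost:8000"):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, image_url, base_url="http://localhost:8000"):
    """Send the request and return the text of the first choice."""
    request = build_chat_request(prompt, image_url, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the request does not contact the server; ask(...) does.
request = build_chat_request(
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
print(request.get_method(), request.full_url)
```

With a server running, calling ask with the same two arguments returns the generated description as a string.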

License

Inherits the Gemma License from the base model. By using this checkpoint you agree to the Gemma Terms of Use.

Acknowledgements

vllm-project/llm-compressor — oneshot + QuantizationModifier.
neuralmagic/compressed-tensors — Marlin int4 kernels.
optipfair — the calibrated maw_hybrid pruning criterion.

Safetensors

Model size: 7B params
Tensor types: I64, I32, BF16

Model tree for terra-cognita-ai/ResAI_Image-to-Text_final

Base model: google/gemma-4-E4B
Finetuned: google/gemma-4-E4B-it
Quantized: this model