Instructions to use terra-cognita-ai/ResAI_Image-to-Text_final with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use terra-cognita-ai/ResAI_Image-to-Text_final with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="terra-cognita-ai/ResAI_Image-to-Text_final")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("terra-cognita-ai/ResAI_Image-to-Text_final")
model = AutoModelForMultimodalLM.from_pretrained("terra-cognita-ai/ResAI_Image-to-Text_final")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use terra-cognita-ai/ResAI_Image-to-Text_final with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "terra-cognita-ai/ResAI_Image-to-Text_final"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "terra-cognita-ai/ResAI_Image-to-Text_final",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
Use Docker
docker model run hf.co/terra-cognita-ai/ResAI_Image-to-Text_final
- SGLang
How to use terra-cognita-ai/ResAI_Image-to-Text_final with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "terra-cognita-ai/ResAI_Image-to-Text_final" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "terra-cognita-ai/ResAI_Image-to-Text_final",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "terra-cognita-ai/ResAI_Image-to-Text_final" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "terra-cognita-ai/ResAI_Image-to-Text_final",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
- Docker Model Runner
How to use terra-cognita-ai/ResAI_Image-to-Text_final with Docker Model Runner:
docker model run hf.co/terra-cognita-ai/ResAI_Image-to-Text_final
gemma-4-E4B-it · pruned 20% → distilled → W4A16
A compute- and energy-optimized variant of google/gemma-4-E4B-it,
built by a three-stage pipeline:
- Structural MLP prune (−20%): the LM-stack gate_proj/up_proj/down_proj intermediate dimension is reduced by 20% using a calibrated importance criterion. Vision/audio towers and attention are untouched.
- Knowledge distillation: the pruned LM is recovered toward the unpruned bf16 teacher (Phase 1: forward-KL plus hidden-state matching; Phase 2: on-policy GKD/JSD for brevity). Vision/audio towers are frozen. This restores both capability and output brevity to approximately teacher level.
- W4A16 quantization: int4 weight-only quantization via llmcompressor oneshot + QuantizationModifier (observer-only; no Hessian/AWQ). Activations stay bf16.
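The structural prune in the first stage can be pictured with a small pure-Python sketch. It is illustrative only: the helper name `prune_gated_mlp` is hypothetical, and a plain weight-magnitude score stands in for the calibrated criterion the actual pipeline uses. The point it shows is that removing an intermediate channel means dropping the matching row of gate_proj and up_proj and the matching column of down_proj together.

```python
# Illustrative sketch only. The magnitude score below is an assumption;
# the real pipeline uses a calibrated importance criterion.

def prune_gated_mlp(gate, up, down, keep_ratio=0.8):
    """Drop the lowest-scoring intermediate channels of a gated MLP.

    gate, up: lists of shape [intermediate][hidden] (one row per channel).
    down:     list of shape [hidden][intermediate] (one column per channel).
    Returns the three pruned matrices and the kept channel indices.
    """
    intermediate = len(gate)
    n_keep = max(1, round(intermediate * keep_ratio))

    # Hypothetical importance: total absolute weight touching channel i.
    def score(i):
        return (sum(abs(w) for w in gate[i])
                + sum(abs(w) for w in up[i])
                + sum(abs(row[i]) for row in down))

    ranked = sorted(range(intermediate), key=score, reverse=True)
    keep = sorted(ranked[:n_keep])  # keep the original channel order

    pruned_gate = [gate[i] for i in keep]                 # fewer rows
    pruned_up = [up[i] for i in keep]                     # fewer rows
    pruned_down = [[row[i] for i in keep] for row in down]  # fewer columns
    return pruned_gate, pruned_up, pruned_down, keep


# Toy example: 5 intermediate channels, hidden size 2, keep 80%.
gate = [[1.0, 1.0], [0.1, 0.1], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
up = [[1.0, 1.0], [0.1, 0.1], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
down = [[1.0, 0.1, 2.0, 3.0, 4.0], [1.0, 0.1, 2.0, 3.0, 4.0]]
g, u, d, kept = prune_gated_mlp(gate, up, down)
print(kept, len(g), len(d[0]))
```

Because the hidden dimension is unchanged, the pruned block still plugs into the same residual stream; only the MLP's inner width shrinks.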
Saved in compressed-tensors pack-quantized format — loads in HF Transformers
(Marlin / GPTQ-Marlin kernels, run_compressed=True) and in vLLM via the
CompressedTensorsWNA16 loader.
The checkpoint's quant_recipe.json carries the full
base → prune → distill → quant source_lineage together with the calibration datasets
used in each step.
Quantization recipe
from llmcompressor.modifiers.quantization import QuantizationModifier

QuantizationModifier(
    config_groups={
        "group_0": {
            "targets": ["Linear"],
            "weights": {
                "num_bits": 4,
                "type": "int",
                "symmetric": True,
                "strategy": "group",
                "group_size": 128,
                "observer": "minmax",
                "actorder": None,
                "dynamic": False,
            },
            "input_activations": None,
            "output_activations": None,
        }
    },
    ignore=[
        "re:.*vision_tower.*",           # ViT encoder + patch embedder
        "re:.*audio_tower.*",            # audio layers + subsample + output_proj
        "re:.*per_layer_input_gate.*",   # PLE input gates
        "re:.*per_layer_projection.*",   # PLE projections
        "re:.*embed_vision.*",           # vision embedding_projection
        "re:.*embed_audio.*",            # audio embedding_projection
        "lm_head",
    ],
)
Only LM-stack Linear weights are packed to int4. The vision tower, audio tower,
Per-Layer Embedding (PLE) plumbing, vision/audio projectors, and lm_head stay
bf16.
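To make the recipe's weight settings concrete (4 bits, symmetric, group strategy, group size 128, minmax observer), here is a minimal pure-Python sketch of group-wise symmetric int4 quantization. It is not the llmcompressor implementation: the function names are hypothetical, the scale is taken as max|w| / 7 for simplicity (the library's exact scale convention may differ), and the bits-per-weight estimate assumes one 16-bit scale per group.

```python
def quantize_group_int4(weights, group_size=128):
    """Symmetric per-group int4 quantization of a flat list of weights.

    Each group gets one scale derived from its max absolute value
    (a min/max-style observer), and codes are clipped to [-8, 7].
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Simplifying assumption: map max |w| to code 7.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_group_int4(codes, scales, group_size=128):
    """Reconstruct approximate weights from int4 codes and group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def bits_per_weight(group_size=128, weight_bits=4, scale_bits=16):
    """Storage cost per weight, assuming one 16-bit scale per group."""
    return weight_bits + scale_bits / group_size


# Toy example with a small group size so two groups are visible.
w = [0.5, -1.0, 0.25, 0.0, 2.0, -0.1, 0.3, 1.5]
codes, scales = quantize_group_int4(w, group_size=4)
w_hat = dequantize_group_int4(codes, scales, group_size=4)
print(codes, scales)
print(bits_per_weight())  # 4 + 16/128
```

Under these assumptions, the per-group scale adds 0.125 bits per weight at group size 128, which is why smaller groups trade storage for accuracy.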
Inference
No quantization= argument is needed: vLLM auto-detects the compressed-tensors format from
config.json and binds to MarlinLinearKernel.
To serve:
vllm serve terra-cognita-ai/ResAI_Image-to-Text_final --config vllm_config.yaml
The vllm_config.yaml file is included in the root directory of the model repository.
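Once the server is running, it can be queried from Python through its OpenAI-compatible chat endpoint. The sketch below uses only the standard library. The helper names are hypothetical, and the base URL assumes vLLM's default port 8000 on localhost; adjust it if vllm_config.yaml sets a different host or port.

```python
import json
import urllib.request

MODEL_ID = "terra-cognita-ai/ResAI_Image-to-Text_final"


def build_chat_payload(prompt, image_url, model=MODEL_ID, max_tokens=64):
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the reply text.

    base_url is an assumption (vLLM's default port); change it to match
    your deployment.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_payload(
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# Requires a running server:
# print(send_chat_request(payload))
```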
License
Inherits the Gemma License from the base model. By using this checkpoint you agree to the Gemma Terms of Use.
Acknowledgements
- vllm-project/llm-compressor: oneshot + QuantizationModifier.
- neuralmagic/compressed-tensors: Marlin int4 kernels.
- optipfair: the calibrated maw_hybrid pruning criterion.