Handling Unknown Source Languages in Translation Templates
Hi All,
I am using this model for a translation task and observed some behavior related to the chat template and source_lang_code handling.
When the source language is known, I can specify it explicitly, and the model works correctly.
Example:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "source_lang_code": "hi",  # Source language (Hindi)
                "target_lang_code": "en",  # Target language (English)
                "text": "नमस्ते, आप कैसे हैं?",
            }
        ],
    }
]
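The same structure can be produced by a small helper, which keeps both language codes explicit at every call site. This is only a minimal sketch: `build_translation_messages` is a hypothetical name, not part of Transformers or the model's tooling.

```python
def build_translation_messages(text, source_lang_code, target_lang_code):
    """Build a single-turn message list in the shape shown above."""
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "source_lang_code": source_lang_code,
                    "target_lang_code": target_lang_code,
                    "text": text,
                }
            ],
        }
    ]


# Hindi to English, same values as the example above.
messages = build_translation_messages("नमस्ते, आप कैसे हैं?", "hi", "en")
```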
However, the issue occurs when the source language is unknown.
Case 1: Removing source_lang_code or using "auto"
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "source_lang_code": "auto",  # or remove this line
                "target_lang_code": "en",
                "text": "नमस्ते, आप कैसे हैं?",
            }
        ],
    }
]
In this case, I receive a dictionary/key-related error.
Case 2: Setting the source language to "en" by default
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "source_lang_code": "en",
                "target_lang_code": "en",
                "text": "नमस्ते, आप कैसे हैं?",
            }
        ],
    }
]
Interestingly, the model still generates a proper response even though the actual source language is Hindi.
My Observation
It seems that:
- the model may already be capable of automatically detecting the source language internally,
- but the template currently requires source_lang_code to be explicitly provided,
- and "auto" is not yet supported.
Suggestion
It would be very useful if the template supported:
"source_lang_code": "auto"
or handled missing source_lang_code gracefully by automatically detecting the language.
This would improve usability for real-world multilingual translation pipelines where the source language may not always be known beforehand.
Please let me know if this behavior is expected, a bug, or if support for "auto" is planned in future updates.
Thank you.
Hi @sandeep1401
Thank you for the detailed feedback. This is expected behavior at the chat template layer. The KeyError you see in Case 1 happens because the current template parser strictly requires a supported language code to format the underlying prompt. It is not a bug in the model itself.
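A caller can catch this situation before the template is applied and get a clearer error message. The sketch below is illustrative only: `SUPPORTED_LANG_CODES` is a placeholder subset rather than the template's actual list, and `check_lang_codes` is a hypothetical helper, not a Transformers API.

```python
# Placeholder subset for illustration only; the authoritative list is
# whatever the model's chat template accepts.
SUPPORTED_LANG_CODES = {"en", "hi", "fr", "de", "es"}


def check_lang_codes(messages, supported=SUPPORTED_LANG_CODES):
    """Raise ValueError if a text part has a missing or unsupported language code."""
    for message in messages:
        for part in message["content"]:
            if part.get("type") != "text":
                continue  # only text parts carry language codes in this sketch
            for key in ("source_lang_code", "target_lang_code"):
                code = part.get(key)
                if code is None:
                    raise ValueError(f"{key} is missing")
                if code not in supported:
                    raise ValueError(f"{key}={code!r} is not a supported language code")
```

Calling `check_lang_codes(messages)` on the Case 1 input (with `"auto"` or with the key removed) raises a `ValueError` that names the offending field, instead of failing inside the template.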
Could you also briefly explain Case 2?
To clarify what's happening in Case 2: the chat template validates that source_lang_code is a recognized ISO code. Because "en" is valid, it passes that check cleanly, which is why you don't see an error. As for why the translation still worked: the input text is unambiguously Hindi, and target_lang_code: "en" remains a clear directive. The model acts on what it can determine from the input and produces the correct output.
However, relying on this is not intended usage. The chat template is explicitly designed around correct, explicit language codes to ensure consistent and deterministic behavior. We recommend always passing the correct source_lang_code when known.
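When the source language is genuinely unknown, one possible workaround is to guess it before building the messages and then pass the guessed code explicitly. The sketch below is not language identification in the proper sense: it only looks at the dominant Unicode script, using Python's standard `unicodedata` module. `SCRIPT_TO_LANG` and `guess_source_lang_code` are hypothetical names, and the script-to-language mapping is an assumption (Devanagari, for example, is also used for Marathi and Nepali). A dedicated language identification library would be more reliable.

```python
import unicodedata

# Hypothetical mapping from a Unicode script keyword to a default language code.
# A script is not a language, so treat the result as a rough guess.
SCRIPT_TO_LANG = {
    "DEVANAGARI": "hi",
    "LATIN": "en",
    "CYRILLIC": "ru",
    "ARABIC": "ar",
}


def guess_source_lang_code(text, default=None):
    """Guess a language code from the most frequent script among the letters in text."""
    counts = {}
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces, combining marks
        name = unicodedata.name(ch, "")
        for script, code in SCRIPT_TO_LANG.items():
            if name.startswith(script):
                counts[code] = counts.get(code, 0) + 1
                break
    if not counts:
        return default
    return max(counts, key=counts.get)


source = guess_source_lang_code("नमस्ते, आप कैसे हैं?")
```

The guessed value can then be used as source_lang_code in place of "auto". If the function returns the default (no recognized letters), it is safer to ask the user or skip the item than to send an arbitrary code.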
Hi @sandeep1401, I wrote some code in my repo to add a new language. It is still a work in progress, but you can use it. See https://github.com/MNIKIEMA/translategemma-finetune.
In my case, the loss is decreasing, but the translation quality is still poor. Any tips are welcome, @sandeep1401.
Thanks for the suggestion.