Model Card for Gemma-3-270m-it Food Extraction
A small, task-specific language model fine-tuned to extract food and drink items from arbitrary text (such as image captions) and return them in the compact TOON (Token-Oriented Object Notation) structured format.
Model Details
Model Description
This model is a Supervised Fine-Tuned (SFT) version of google/gemma-3-270m-it,
trained to perform a single, narrow task: given a piece of free-form text,
classify whether it mentions food/drink and, if so, extract the specific items
as structured data. The motivation was to produce a cheap, locally-runnable
model that can filter or label large volumes of image captions without paying
for a frontier API or sending data off-device.
- Developed by: Michael (personal learning project)
- Model type: Decoder-only causal language model (Gemma 3 architecture)
- Language(s) (NLP): English
- License: Gemma Terms of Use
- Finetuned from model:
google/gemma-3-270m-it
Model Sources
- Demo: Gradio app — see "How to Get Started" below
- Tutorials this work follows:
Uses
Direct Use
The model takes a free-form English sentence and returns a TOON-encoded object describing whether the text mentions food or drink, and what specifically is mentioned. Typical inputs are image captions or short descriptive snippets, e.g.:
"For breakfast I had eggs, bacon and toast and a glass of orange juice."
Downstream Use
Suitable as a lightweight filter or labeller in a larger pipeline — for example, filtering a large dataset of image captions down to only those that reference food, before passing the filtered set to a more expensive downstream system (e.g. a recommendation engine or food-tracking app).
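A minimal sketch of that filtering step, assuming a hypothetical `mentions_food` callable that wraps the model call and returns a boolean. The `keyword_stub` function below is only a stand-in so the example runs without the model; it is not how the model decides.

```python
from typing import Callable, Iterable, Iterator


def filter_food_captions(
    captions: Iterable[str],
    mentions_food: Callable[[str], bool],
) -> Iterator[str]:
    """Yield only the captions that the classifier flags as food/drink related."""
    for caption in captions:
        if mentions_food(caption):
            yield caption


def keyword_stub(caption: str) -> bool:
    """Stand-in for the model call (hypothetical); replace with real inference."""
    return any(word in caption.lower() for word in ("sandwich", "juice", "toast"))


captions = [
    "a tuna melt sandwich on a plate",
    "a red bicycle leaning against a wall",
    "a glass of orange juice next to toast",
]
kept = list(filter_food_captions(captions, keyword_stub))
print(kept)
```

Because the function is a generator, a large caption dataset can be streamed through it without being held in memory all at once.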
Out-of-Scope Use
- General-purpose chat or open-ended question answering. The model has been specialised for one extraction task and will perform worse than the base Gemma instruction model on broad conversational tasks.
- Languages other than English. Training data is English-only.
- Safety-critical decisions (e.g. medical or allergen advice). Output should not be relied on for nutritional, dietary, or allergen information.
- Inputs much longer than the training sequence length (512 tokens).
Bias, Risks, and Limitations
- Dataset specificity. Training data (mrdbourke/FoodExtract-1k, ~1,420 rows) skews toward Western foods and image-caption-style phrasing. The model will likely under-perform on cuisines, dishes, or phrasings that are underrepresented in that distribution.
- Mild overfitting. Training and validation loss curves diverge in the later epochs. For this narrow extraction task that is acceptable (and arguably desirable, since we want consistent structured output), but it does mean the model is unlikely to generalise gracefully to tasks outside food/drink extraction.
- Label noise inheritance. Targets were generated by gpt-oss-120b, so any systematic errors in the teacher labels will be inherited by this model.
- Format brittleness. Outputs use TOON formatting. Downstream consumers must parse TOON; malformed outputs are possible and should be guarded against.
Recommendations
Validate outputs with a parser before downstream use, and treat the model as a filter/heuristic rather than a source of truth for nutritional or allergen information.
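One way to apply this recommendation is to wrap whatever decoder you use in a guard that turns parse failures into `None`. The sketch below is an assumption about how such a guard could look: `safe_parse` is a hypothetical helper, and `json.loads` is used only as a stand-in parser so the example runs without extra packages. In practice you would pass the decode function of your TOON library instead.

```python
import json
from typing import Any, Callable, Optional


def safe_parse(raw: str, parser: Callable[[str], Any]) -> Optional[Any]:
    """Return the parsed object, or None if the parser rejects the text."""
    try:
        return parser(raw.strip())
    except Exception:
        # Malformed generations are dropped rather than crashing the pipeline.
        return None


# json.loads is only a stand-in parser for this sketch (assumption).
good = safe_parse('{"food": ["eggs"]}', json.loads)
bad = safe_parse("food: [eggs", json.loads)
print(good, bad)
```

Rows that come back as `None` can be logged, retried, or excluded, depending on how strict the downstream consumer needs to be.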
How to Get Started with the Model
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

MODEL = "makiisthebes/gemma-3-270M-Instruct-FoodExtract"

# Load the tokenizer and the fine-tuned weights.
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    dtype="auto",
    device_map="auto",
    attn_implementation="sdpa",
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Wrap the input text in the chat template the model was trained with.
messages = [{
    "role": "user",
    "content": "a photo of a person's lunch with a tuna, cheese and capers melt sandwich",
}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# By default, generated_text contains the prompt followed by the completion.
output = pipe(prompt, max_new_tokens=256)
print(output[0]["generated_text"])
A Gradio demo is also provided in the training notebook for interactive use.
Training Details
Training Data
- Source: mrdbourke/FoodExtract-1k
- Size: ~1,420 rows total
- Schema: each row contains a sequence field (input text) and a gpt-oss-120b-label / gpt-oss-120b-label-condensed field (structured food/drink annotations produced by gpt-oss-120b)
- Splits used: 80% train / 10% validation / 10% test, produced via two train_test_split calls with shuffle=False to preserve order
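The two-step, order-preserving split can be illustrated with plain index arithmetic. This is a sketch of the idea only: `ordered_80_10_10` is a hypothetical helper, the row count of 1,420 is taken from the approximate figure above, and the exact `test_size` arguments passed to `train_test_split` in the training notebook are not stated here.

```python
def ordered_80_10_10(rows):
    """Split a sequence into train/validation/test without shuffling."""
    n = len(rows)
    n_train = n * 8 // 10        # first cut: 80% train, 20% held out
    held_out = rows[n_train:]
    n_val = len(held_out) // 2   # second cut: half validation, half test
    return rows[:n_train], held_out[:n_val], held_out[n_val:]


rows = list(range(1420))  # stand-in for the ~1,420 dataset rows
train, val, test = ordered_80_10_10(rows)
print(len(train), len(val), len(test))  # 1136 142 142
```

Because nothing is shuffled, the test split is simply the last tenth of the dataset, so any ordering in the source data carries over into the splits.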
Training Procedure
Preprocessing
Each sample is converted into a chat-style conversation:
{
    "messages": [
        {"role": "user", "content": sample["sequence"]},
        {"role": "assistant", "content": sample["gpt-oss-120b-label-condensed"]},
    ]
}
The tokenizer's built-in Gemma chat template (<start_of_turn>user … <end_of_turn> <start_of_turn>model …) is then applied during training so that the model
learns to produce the structured response after the model turn marker.
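The conversion and the resulting turn layout can be sketched as follows. `to_conversation` and `render_gemma_turns` are hypothetical helpers written for illustration, and the label text `LABEL` is a placeholder rather than a real annotation. The hand-rolled renderer only approximates the layout for inspection; the tokenizer's own chat template (which also handles special tokens such as the beginning-of-sequence marker) remains authoritative.

```python
def to_conversation(sample: dict) -> dict:
    """Map one dataset row to the user/assistant message pair shown above."""
    return {
        "messages": [
            {"role": "user", "content": sample["sequence"]},
            {"role": "assistant", "content": sample["gpt-oss-120b-label-condensed"]},
        ]
    }


def render_gemma_turns(messages: list) -> str:
    """Approximate the Gemma turn layout by hand, for inspection only."""
    role_names = {"user": "user", "assistant": "model"}
    parts = []
    for message in messages:
        parts.append(
            f"<start_of_turn>{role_names[message['role']]}\n"
            f"{message['content']}<end_of_turn>\n"
        )
    return "".join(parts)


row = {
    "sequence": "a plate of eggs and toast",
    "gpt-oss-120b-label-condensed": "LABEL",  # placeholder label text
}
conversation = to_conversation(row)
rendered = render_gemma_turns(conversation["messages"])
print(rendered)
```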
Targets use TOON instead of raw JSON. TOON encodes the same structure with substantially fewer tokens, which both reduces training cost and shortens generation at inference time.
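To give a rough sense of the size difference, the sketch below encodes one record both ways and compares character counts. Everything here is an assumption for illustration: the field names are hypothetical (the real label schema is not reproduced in this card), `encode_flat_toon` is a simplified encoder that handles only flat dictionaries of scalars and scalar lists, and character count is only a proxy for token count. For real use, rely on the toon-format package.

```python
import json


def _scalar(value) -> str:
    """Render a scalar the way TOON writes it (lowercase booleans, bare strings)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def encode_flat_toon(obj: dict) -> str:
    """Encode a flat dict of scalars and scalar lists in a TOON-like layout."""
    lines = []
    for key, value in obj.items():
        if isinstance(value, list):
            if value:
                joined = ",".join(_scalar(v) for v in value)
                lines.append(f"{key}[{len(value)}]: {joined}")
            else:
                lines.append(f"{key}[0]:")
        else:
            lines.append(f"{key}: {_scalar(value)}")
    return "\n".join(lines)


# Hypothetical record; field names are illustrative, not the dataset schema.
record = {
    "is_food_or_drink": True,
    "food_items": ["eggs", "bacon", "toast"],
    "drink_items": ["orange juice"],
}
toon_text = encode_flat_toon(record)
json_text = json.dumps(record)
print(toon_text)
print(len(toon_text), "characters as TOON vs", len(json_text), "as JSON")
```

The saving comes mostly from dropping quotes, braces, and brackets, which repeat on every item in JSON.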
Training Hyperparameters
Configured via trl.SFTConfig and trained with trl.SFTTrainer:
| Parameter | Value |
|---|---|
| Base model | google/gemma-3-270m-it |
| Epochs | 3 |
| Per-device train batch size | 16 |
| Per-device eval batch size | 16 |
| Learning rate | 5e-5 |
| LR scheduler | constant |
| Optimizer | adamw_torch_fused |
| Max sequence length | 512 |
| Packing | False |
| Gradient checkpointing | True |
| Save / eval strategy | per epoch |
| Attention implementation | sdpa |
| Logging | trackio |
- Training regime: model weights loaded with dtype="auto" (bf16 on supported hardware) via AutoModelForCausalLM.from_pretrained(...).
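The table above can be written as keyword arguments for `trl.SFTConfig`. This is a sketch, not the exact configuration from the training notebook: the argument names are believed to match recent TRL and Transformers releases but may differ in yours (for example, the sequence-length argument has been called `max_seq_length` in older TRL versions, and `eval_strategy` was previously `evaluation_strategy`). The step estimate assumes a single device, no gradient accumulation, and a training split of 1,136 rows (80% of ~1,420).

```python
import math

# Keyword arguments mirroring the hyperparameter table (names are assumptions
# based on recent trl.SFTConfig / transformers.TrainingArguments releases).
sft_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "learning_rate": 5e-5,
    "lr_scheduler_type": "constant",
    "optim": "adamw_torch_fused",
    "max_length": 512,  # older TRL releases call this max_seq_length
    "packing": False,
    "gradient_checkpointing": True,
    "save_strategy": "epoch",
    "eval_strategy": "epoch",
    "report_to": "trackio",
}

# With trl installed, the config would be built roughly like this:
# from trl import SFTConfig
# config = SFTConfig(output_dir="checkpoints", **sft_kwargs)  # output_dir is hypothetical

train_rows = 1136  # assumed: 80% of ~1,420 rows
steps_per_epoch = math.ceil(train_rows / sft_kwargs["per_device_train_batch_size"])
total_steps = steps_per_epoch * sft_kwargs["num_train_epochs"]
print(steps_per_epoch, total_steps)  # 71 213
```

At roughly 213 optimizer steps in total, the run is short, which is consistent with the small dataset and model size.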
Evaluation
Testing Data, Factors & Metrics
Testing Data
The held-out 10% test split of mrdbourke/FoodExtract-1k, which was not seen
during training or hyperparameter selection.
Factors
Evaluated qualitatively across:
- food-only vs. drink-only vs. mixed inputs
- short captions vs. longer descriptive sentences
- inputs containing no food/drink (negative cases)
Metrics
Evaluation currently relies on human review of generated outputs against the
ground-truth labels, comparing the fine-tuned model side by side with the base
gemma-3-270m-it. An LLM-as-judge approach is planned as a follow-up.
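As a possible complement to human review (not something this card reports results for), extracted items can be compared to the reference labels as sets. The sketch below is an assumption about how such a check might look: `item_scores` is a hypothetical helper, and it expects the item lists to have already been parsed out of the TOON output.

```python
def item_scores(predicted, reference):
    """Precision, recall and F1 over case-insensitive sets of item names."""
    pred = {p.strip().lower() for p in predicted}
    ref = {r.strip().lower() for r in reference}
    if not pred and not ref:
        return 1.0, 1.0, 1.0  # both empty: a correct negative case
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# The model found two of the three reference items and invented none.
precision, recall, f1 = item_scores(["Eggs", "toast"], ["eggs", "bacon", "toast"])
print(precision, recall, f1)
```

Exact set matching is strict: "orange juice" and "juice" count as different items, so scores from a check like this are a lower bound on what a human reviewer would accept.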
Results
Qualitatively, the fine-tuned model:
- Produces well-formed TOON outputs matching the training schema, where the base instruction model produces free-form prose and ignores the expected format.
- Correctly identifies food/drink presence in most short captions.
- Shows mild overfitting in the loss curve (training loss continues to fall while validation loss flattens), which is treated as acceptable for this narrow structured-output task.
Summary
For its size (~270M parameters), the model produces consistent, parseable structured output for food/drink extraction, which the base instruction model does not do reliably without heavier prompting.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: 1 × NVIDIA DGX Spark (see Compute Infrastructure)
- Hours used: Not reported
- Cloud Provider: Not reported
- Compute Region: Not reported
- Carbon Emitted: Not reported
Technical Specifications
Model Architecture and Objective
Decoder-only transformer (Gemma 3, 270M parameters), trained with a causal language modelling objective on chat-formatted (user → assistant) examples where the assistant turn contains the TOON-encoded food/drink extraction.
Compute Infrastructure
Hardware
1 × NVIDIA DGX Spark
Software
- transformers
- trl (SFTTrainer, SFTConfig)
- datasets
- torch (CUDA)
- trackio for run logging
- toon-format for compact structured outputs
- gradio for the interactive demo
Citation
Base model:
@misc{gemma3_2025,
title = {Gemma 3},
author = {Google},
year = {2025},
url = {https://huggingface.co/google/gemma-3-270m-it}
}
More Information
Built as a learning exercise following Daniel Bourke's two-part fine-tuning tutorial, with adaptations including the use of TOON instead of JSON for structured outputs.
Model Card Authors
Michael Peres
Model Card Contact