
Libraries

How to use makiisthebes/gemma-3-270M-Instruct-FoodExtract with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="makiisthebes/gemma-3-270M-Instruct-FoodExtract")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("makiisthebes/gemma-3-270M-Instruct-FoodExtract")
model = AutoModelForCausalLM.from_pretrained("makiisthebes/gemma-3-270M-Instruct-FoodExtract")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use makiisthebes/gemma-3-270M-Instruct-FoodExtract with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "makiisthebes/gemma-3-270M-Instruct-FoodExtract"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "makiisthebes/gemma-3-270M-Instruct-FoodExtract",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/makiisthebes/gemma-3-270M-Instruct-FoodExtract

SGLang

How to use makiisthebes/gemma-3-270M-Instruct-FoodExtract with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "makiisthebes/gemma-3-270M-Instruct-FoodExtract" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "makiisthebes/gemma-3-270M-Instruct-FoodExtract",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "makiisthebes/gemma-3-270M-Instruct-FoodExtract" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "makiisthebes/gemma-3-270M-Instruct-FoodExtract",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use makiisthebes/gemma-3-270M-Instruct-FoodExtract with Docker Model Runner:
```
docker model run hf.co/makiisthebes/gemma-3-270M-Instruct-FoodExtract
```

Model Card for Gemma-3-270m-it Food Extraction

A small, task-specific language model fine-tuned to extract food and drink items from arbitrary text (such as image captions) and return them in a compact TOON structured format.

Model Details

Model Description

This model is a Supervised Fine-Tuned (SFT) version of google/gemma-3-270m-it, trained to perform a single, narrow task: given a piece of free-form text, classify whether it mentions food/drink and, if so, extract the specific items as structured data. The motivation was to produce a cheap, locally-runnable model that can filter or label large volumes of image captions without paying for a frontier API or sending data off-device.

Developed by: Michael (personal learning project)
Model type: Decoder-only causal language model (Gemma 3 architecture)
Language(s) (NLP): English
License: Gemma Terms of Use
Finetuned from model: google/gemma-3-270m-it

Model Sources

Demo: Gradio app — see "How to Get Started" below
Tutorials this work follows:
- Daniel Bourke — Fine-tuning Part 1
- Daniel Bourke — Fine-tuning Part 2

Uses

Direct Use

The model takes a free-form English sentence and returns a TOON-encoded object describing whether the text mentions food or drink, and what specifically is mentioned. Typical inputs are image captions or short descriptive snippets, e.g.:

"For breakfast I had eggs, bacon and toast and a glass of orange juice."

Downstream Use

Suitable as a lightweight filter or labeller in a larger pipeline — for example, filtering a large dataset of image captions down to only those that reference food, before passing the filtered set to a more expensive downstream system (e.g. a recommendation engine or food-tracking app).
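
As a rough sketch of that filtering step, the helper below keeps only the captions that an extractor flags as food-related. The `extract` callable, the `is_food_or_drink` key, and the keyword stub are hypothetical stand-ins, not part of this model's documented interface; in practice `extract` would wrap a call to the model plus a TOON parser.

```python
from typing import Callable, Iterable


def filter_food_captions(
    captions: Iterable[str],
    extract: Callable[[str], dict],
) -> list[str]:
    """Keep only captions whose extraction result is flagged as food/drink.

    `extract` is a hypothetical callable returning a parsed dict with a
    boolean `is_food_or_drink` key (an assumed field name, not the model's
    documented schema).
    """
    kept = []
    for caption in captions:
        result = extract(caption)
        if result.get("is_food_or_drink", False):
            kept.append(caption)
    return kept


def keyword_stub(caption: str) -> dict:
    """Stand-in extractor for demonstration only; replace with a model call."""
    words = {"pizza", "coffee", "sandwich"}
    return {"is_food_or_drink": any(w in caption.lower() for w in words)}


captions = ["a dog running on a beach", "a slice of pizza on a plate"]
food_captions = filter_food_captions(captions, keyword_stub)
print(food_captions)
```

Passing the extractor in as a callable keeps the filter independent of how the model is served (Transformers, vLLM, or SGLang).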

Out-of-Scope Use

General-purpose chat or open-ended question answering. The model has been specialised for one extraction task and will perform worse than the base Gemma instruction model on broad conversational tasks.
Languages other than English. Training data is English-only.
Safety-critical decisions (e.g. medical or allergen advice). Output should not be relied on for nutritional, dietary, or allergen information.
Inputs much longer than the training sequence length (512 tokens).

Bias, Risks, and Limitations

Dataset specificity. Training data (mrdbourke/FoodExtract-1k, ~1,420 rows) skews toward Western foods and image-caption-style phrasing. The model will likely under-perform on cuisines, dishes, or phrasings that are underrepresented in that distribution.
Mild overfitting. Training and validation loss curves diverge in the later epochs. For this narrow extraction task that is acceptable — and arguably desirable, since we want consistent structured output — but it does mean the model is unlikely to generalise gracefully to tasks outside food/drink extraction.
Label noise inheritance. Targets were generated by gpt-oss-120b, so any systematic errors in the teacher labels will be inherited by this model.
Format brittleness. Outputs use TOON formatting. Downstream consumers must parse TOON; malformed outputs are possible and should be guarded against.

Recommendations

Validate outputs with a parser before downstream use, and treat the model as a filter/heuristic rather than a source of truth for nutritional or allergen information.
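
One way to guard against malformed outputs is sketched below. It assumes the model emits a flat TOON object made only of `key: value` lines and inline arrays of the form `key[N]: a,b,c`; the field names in the example are hypothetical, and nested objects, tabular arrays, and quoted values containing commas are deliberately not handled. A full TOON parser would be the more robust choice where one is available.

```python
import re

# Matches flat TOON lines: "key: value" or "key[N]: a,b,c".
_LINE = re.compile(
    r"^(?P<key>[A-Za-z_][\w.]*)(?:\[(?P<count>\d+)\])?:\s*(?P<value>.*)$"
)


def parse_flat_toon(text: str) -> dict:
    """Parse a flat TOON object; raise ValueError on anything unexpected."""
    result = {}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"not a TOON key/value line: {raw!r}")
        key, count, value = match["key"], match["count"], match["value"]
        if count is None:
            result[key] = value  # scalar, kept as a string
            continue
        items = [item.strip() for item in value.split(",")] if value else []
        if len(items) != int(count):
            raise ValueError(f"{key}: declared {count} items, found {len(items)}")
        result[key] = items
    return result


def safe_parse(text: str):
    """Return the parsed dict, or None when the output is malformed."""
    try:
        return parse_flat_toon(text)
    except ValueError:
        return None


# Hypothetical outputs; the field names are illustrative only.
good = "is_food_or_drink: true\nfood_items[2]: eggs,bacon"
bad = "food_items[3]: eggs,bacon"  # declared length does not match
print(safe_parse(good))
print(safe_parse(bad))
```

Checking the declared array length against the actual item count catches truncated generations, which is one of the cheaper integrity checks TOON makes possible.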

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

MODEL = "makiisthebes/gemma-3-270M-Instruct-FoodExtract"  # Hugging Face repo ID of this model

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    dtype="auto",
    device_map="auto",
    attn_implementation="sdpa",
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [{
    "role": "user",
    "content": "a photo of a person's lunch with a tuna, cheese and capers melt sandwich",
}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
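# return_full_text=False returns only the newly generated text, not the prompt
output = pipe(prompt, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])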

A Gradio demo is also provided in the training notebook for interactive use.

Training Details

Training Data

Source: mrdbourke/FoodExtract-1k
Size: ~1,420 rows total
Schema: each row contains a sequence field (input text) plus gpt-oss-120b-label and gpt-oss-120b-label-condensed fields (structured food/drink annotations produced by gpt-oss-120b); the condensed field is used as the training target
Splits used: 80% train / 10% validation / 10% test, produced via two train_test_split calls with shuffle=False to preserve order
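
The ordered 80/10/10 split can be illustrated with plain slicing. This is a sketch of the arithmetic only, not the actual `train_test_split` calls used in training; the library's rounding may differ by a row, and the row count of 1,420 is the approximate figure quoted above.

```python
def ordered_split(rows, train_pct=80, val_pct=10):
    """Split rows into train/val/test by position, without shuffling."""
    n = len(rows)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


rows = list(range(1420))  # placeholder for the ~1,420 dataset rows
train, val, test = ordered_split(rows)
print(len(train), len(val), len(test))  # 1136 142 142
```

Because nothing is shuffled, the validation and test sets are simply the last rows of the dataset, so any ordering in the source data carries over into the splits.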

Training Procedure

Preprocessing

Each sample is converted into a chat-style conversation:

{
    "messages": [
        {"role": "user", "content": sample["sequence"]},
        {"role": "assistant", "content": sample["gpt-oss-120b-label-condensed"]},
    ]
}

The tokenizer's built-in Gemma chat template (<start_of_turn>user … <end_of_turn> <start_of_turn>model …) is then applied during training so that the model learns to produce the structured response after the model turn marker.
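
To make the turn markers concrete, the function below is a hand-written approximation of how a user/assistant pair is rendered. It is a sketch for illustration only: the tokenizer's own `apply_chat_template` remains the authoritative implementation, and the assistant label shown here is a hypothetical TOON string rather than a real dataset row.

```python
def render_gemma_chat(messages, add_generation_prompt=False):
    """Approximate the Gemma chat template for user/assistant turns."""
    role_names = {"user": "user", "assistant": "model"}
    text = "<bos>"
    for message in messages:
        role = role_names[message["role"]]
        content = message["content"].strip()
        text += f"<start_of_turn>{role}\n{content}<end_of_turn>\n"
    if add_generation_prompt:
        # Leave the model turn open so generation continues from here.
        text += "<start_of_turn>model\n"
    return text


example = {
    "messages": [
        {"role": "user", "content": "For breakfast I had eggs and toast."},
        # Hypothetical label; field names are illustrative only.
        {"role": "assistant", "content": "food_items[2]: eggs,toast"},
    ]
}
rendered = render_gemma_chat(example["messages"])
print(rendered)
```

At inference time only the user turn is rendered, with `add_generation_prompt=True`, so that the model completes the open model turn.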

Targets use TOON instead of raw JSON. TOON encodes the same structure with substantially fewer tokens, which both reduces training cost and shortens generation at inference time.
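
A small comparison gives a feel for the saving. The record and its field names are hypothetical, the encoder covers only flat objects with string lists, and character count is used as a rough proxy; actual token counts depend on the tokenizer.

```python
import json

# Hypothetical record; field names are illustrative, not the dataset schema.
record = {
    "is_food_or_drink": True,
    "food_items": ["eggs", "bacon", "toast"],
    "drink_items": ["orange juice"],
}


def encode_flat_toon(obj: dict) -> str:
    """Encode a flat dict of scalars and string lists as TOON-style text."""
    lines = []
    for key, value in obj.items():
        if isinstance(value, list):
            # Inline array: declared length in brackets, comma-separated items.
            lines.append(f"{key}[{len(value)}]: {','.join(value)}")
        elif isinstance(value, bool):
            lines.append(f"{key}: {'true' if value else 'false'}")
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)


as_json = json.dumps(record)
as_toon = encode_flat_toon(record)
print(as_toon)
print(len(as_json), "characters as JSON;", len(as_toon), "characters as TOON")
```

Most of the difference comes from dropping the quotes, braces, and brackets that JSON repeats for every key and item.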

Training Hyperparameters

Configured via trl.SFTConfig and trained with trl.SFTTrainer:

| Parameter | Value |
| --- | --- |
| Base model | `google/gemma-3-270m-it` |
| Epochs | 3 |
| Per-device train batch size | 16 |
| Per-device eval batch size | 16 |
| Learning rate | 5e-5 |
| LR scheduler | constant |
| Optimizer | `adamw_torch_fused` |
| Max sequence length | 512 |
| Packing | `False` |
| Gradient checkpointing | `True` |
| Save / eval strategy | per epoch |
| Attention implementation | `sdpa` |
| Logging | `trackio` |

Training regime: model weights loaded with dtype="auto" (bf16 on supported hardware) via AutoModelForCausalLM.from_pretrained(...).
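
The settings above can be collected as keyword arguments, and the implied number of optimizer steps worked out from them. This is a sketch under stated assumptions: the keys follow common `trl.SFTConfig` argument names, which vary between library versions (for example, the sequence-length argument has been renamed over time), and the step count assumes 1,136 training rows, a single device, and no gradient accumulation.

```python
import math

# Values copied from the hyperparameter table. Key names are assumed to
# match trl.SFTConfig arguments; verify them against the installed version.
sft_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "learning_rate": 5e-5,
    "lr_scheduler_type": "constant",
    "optim": "adamw_torch_fused",
    "max_length": 512,
    "packing": False,
    "gradient_checkpointing": True,
    "save_strategy": "epoch",
    "eval_strategy": "epoch",
    "report_to": "trackio",
}

# Assumptions: 1,136 training rows (80% of ~1,420), one device,
# and no gradient accumulation.
train_rows = 1136
batch_size = sft_kwargs["per_device_train_batch_size"]
steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * sft_kwargs["num_train_epochs"]
print(steps_per_epoch, total_steps)  # 71 213
```

With only a couple of hundred optimizer steps in total, per-epoch evaluation yields just three validation points, which is worth keeping in mind when reading the loss curves.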

Evaluation

Testing Data, Factors & Metrics

Testing Data

The held-out 10% test split of mrdbourke/FoodExtract-1k, which was not seen during training or hyperparameter selection.

Factors

Evaluated qualitatively across:

food-only vs. drink-only vs. mixed inputs
short captions vs. longer descriptive sentences
inputs containing no food/drink (negative cases)

Metrics

Evaluation currently relies on human review of generated outputs against the ground-truth labels, with the fine-tuned model compared side by side with the base gemma-3-270m-it. An LLM-as-judge approach is planned as a follow-up.
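
As one possible automated complement to manual review, item-level precision, recall, and F1 could be computed between extracted and reference items. The function below is a hypothetical sketch and was not used to produce any result reported here; it assumes both sides have already been parsed into lists of item names.

```python
def item_f1(predicted, expected):
    """Set-based precision, recall and F1 over case-insensitive item names."""
    pred = {item.strip().lower() for item in predicted}
    gold = {item.strip().lower() for item in expected}
    if not pred and not gold:
        return 1.0, 1.0, 1.0  # both empty: a correct negative case
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Two of three predicted items match the reference list.
precision, recall, f1 = item_f1(["eggs", "bacon", "tea"], ["Eggs", "bacon", "toast"])
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Exact string matching is strict: near-synonyms such as "OJ" and "orange juice" count as misses, so scores from a metric like this would understate quality relative to human judgement.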

Results

Qualitatively, the fine-tuned model:

Produces well-formed TOON outputs matching the training schema, where the base instruction model produces free-form prose and ignores the expected format.
Correctly identifies food/drink presence in most short captions.
Shows mild overfitting in the loss curve (training loss continues to fall while validation loss flattens), which is treated as acceptable for this narrow structured-output task.

Summary

For its size (~270M parameters), the model produces consistent, parseable structured output for food/drink extraction, which the base instruction model does not do reliably without heavier prompting.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: 1 × NVIDIA DGX Spark (see Compute Infrastructure)
Hours used: (add)
Cloud Provider: (add or "local" if trained on-prem)
Compute Region: (add)
Carbon Emitted: (add)

Technical Specifications

Model Architecture and Objective

Decoder-only transformer (Gemma 3, 270M parameters), trained with a causal language modelling objective on chat-formatted (user → assistant) examples where the assistant turn contains the TOON-encoded food/drink extraction.

Compute Infrastructure

Hardware

1 × NVIDIA DGX Spark

Software

transformers
trl (SFTTrainer, SFTConfig)
datasets
torch (CUDA)
trackio for run logging
toon-format for compact structured outputs
gradio for the interactive demo

Citation

Base model:

@misc{gemma3_2025,
  title  = {Gemma 3},
  author = {Google},
  year   = {2025},
  url    = {https://huggingface.co/google/gemma-3-270m-it}
}

More Information

Built as a learning exercise following Daniel Bourke's two-part fine-tuning tutorial, with adaptations including the use of TOON instead of JSON for structured outputs.