gemma3-4b-algebra4-merged: Gemma 3 4B fine-tuned on Linear Algebra
Merged model (base + LoRA) in BF16, intended for use with transformers, further fine-tuning, or conversion to other formats. For direct use on a phone, prefer gemma3-4b-algebra4-gguf.
What it is
The result of applying PEFT's merge_and_unload() to google/gemma-3-4b-it plus the LoRA adapter trained on algebra4-mix. The weights are in BF16, ready for inference via transformers or for requantization into another format (AWQ, GPTQ, EXL2, GGUF, LiteRT).
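As a rough illustration of what a LoRA merge does to a single weight matrix, the sketch below folds the low-rank update into the base weights, W' = W + (alpha / r) · B · A, and checks that the merged matrix gives the same output as the base plus adapter path. The matrix values, the tiny rank r=2, and alpha=4 are illustrative assumptions; the card only states r=16 for the real adapter and does not give alpha.

```python
# Conceptual sketch of a LoRA merge on one tiny weight matrix (pure Python).
# All values, r=2 and alpha=4 are illustrative assumptions, not the real config.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight."""
    return add(w, scale(matmul(lora_b, lora_a), alpha / r))

W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # base weight, shape (out=2, in=3)
A = [[0.5, 0.0, 1.0], [0.0, 1.0, 0.0]]  # LoRA A, shape (r=2, in=3)
B = [[1.0, 0.0], [0.0, 2.0]]            # LoRA B, shape (out=2, r=2)
alpha, r = 4, 2
x = [[1.0], [2.0], [3.0]]               # input column vector

merged = merge_lora(W, A, B, alpha, r)

# Adapter kept separate: base output plus the scaled low-rank path
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), alpha / r))
# Adapter merged: a single matrix multiply
y_merged = matmul(merged, x)

print(merged)
print(y_adapter, y_merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged repo can be loaded with plain transformers.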
Usage with transformers
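import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

mid = "felipecmarins/gemma3-4b-algebra4-merged"
tok = AutoTokenizer.from_pretrained(mid)
# Load the merged weights in BF16; device_map="auto" places them on the available device(s)
model = AutoModelForCausalLM.from_pretrained(mid, torch_dtype=torch.bfloat16, device_map="auto")

# Render the chat template as text (it already includes the BOS token)
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Encontre os autovalores de A = [[2,1],[1,2]]."}],
    tokenize=False, add_generation_prompt=True,
)
# add_special_tokens=False avoids adding a second BOS token when tokenizing the rendered prompt
ids = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
# Greedy decoding; print only the newly generated tokens
out = model.generate(**ids, max_new_tokens=400, do_sample=False)
print(tok.decode(out[0][ids.input_ids.shape[1]:], skip_special_tokens=True))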
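To sanity-check the model's answer to the example prompt ("find the eigenvalues of A = [[2,1],[1,2]]"), the short sketch below computes the eigenvalues of a 2×2 matrix from its trace and determinant. The helper name eigenvalues_2x2 is hypothetical and not part of this repo, and the sketch only handles real eigenvalues.

```python
import math

def eigenvalues_2x2(m):
    """Real eigenvalues of a 2x2 matrix via lambda^2 - trace*lambda + det = 0."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues; this sketch handles real ones only")
    root = math.sqrt(disc)
    return ((trace - root) / 2, (trace + root) / 2)

A = [[2, 1], [1, 2]]
print(eigenvalues_2x2(A))  # (1.0, 3.0)
```

For this matrix the characteristic polynomial is λ² − 4λ + 3, so a correct answer from the model should report the eigenvalues 1 and 3.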
How it was produced
| Step | Command · result |
|---|---|
| 1. QLoRA training | transformers 5.9 + peft 0.19 + trl 1.4 + bnb 0.49 on an L4 24 GB; NF4 + LoRA r=16 |
| 2. LoRA merge | peft.merge_and_unload() on CPU (device_map="cpu", low_cpu_mem_usage=True) |
| 3. Save | save_pretrained(..., safe_serialization=True, max_shard_size="5GB") → 2 shards |
Final size on disk: 8.1 GB (model-00001-of-00002 = 4.7 GB · model-00002-of-00002 = 3.4 GB).
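The shard sizes above can be cross-checked with a little arithmetic. The sketch below sums the two shards, confirms each fits under the 5 GB shard limit, and derives a rough parameter count. It assumes 1 GB = 10⁹ bytes and that essentially all bytes are BF16 weights at 2 bytes each, so the parameter figure is only an estimate.

```python
# Shard sizes as reported in the card (GB)
shards_gb = {
    "model-00001-of-00002": 4.7,
    "model-00002-of-00002": 3.4,
}
MAX_SHARD_GB = 5.0      # from max_shard_size="5GB"
BYTES_PER_BF16 = 2      # BF16 stores each weight in 2 bytes

total_gb = round(sum(shards_gb.values()), 1)
all_within_limit = all(size <= MAX_SHARD_GB for size in shards_gb.values())

# Assumption: 1 GB = 1e9 bytes and file overhead is negligible
approx_params_billion = total_gb / BYTES_PER_BF16

print(f"total: {total_gb} GB, shards within limit: {all_within_limit}")
print(f"approx. parameters: {approx_params_billion:.2f} B")
```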
Training summary
- Mix: 50k samples (of 495k available) from algebra4-mix
- Hyperparameters: lr=1e-4 cosine · warmup 3% · batch 1 × grad acc 32 · max_seq 1024 · 1 epoch
- Hardware: NVIDIA L4 24 GB in Mumbai (asia-south1-b), the only zone with L4 stock at the time because of global demand
- Duration: ~14 h · final train_loss = 0.74
- Cost: ~US$ 12 (free GCP credits)
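The hyperparameters above imply an effective batch size and an approximate step count, sketched below together with a cosine schedule with warmup. This is a minimal sketch under stated assumptions: a single GPU, no sequence packing, linear warmup, and decay to zero. The step counts the actual trainer logged may differ.

```python
import math

NUM_SAMPLES = 50_000      # from the card
PER_DEVICE_BATCH = 1      # from the card
GRAD_ACCUM = 32           # from the card
PEAK_LR = 1e-4            # from the card
WARMUP_RATIO = 0.03       # from the card

# Assumption: one GPU, so effective batch = batch * gradient accumulation
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# Assumption: one optimizer step per effective batch, last partial batch kept
total_steps = math.ceil(NUM_SAMPLES / effective_batch)
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the run takes roughly 1,563 optimizer steps, with the learning rate peaking at 1e-4 after about 47 warmup steps.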
Related components
| Repo | Contents |
|---|---|
| felipecmarins/gemma3-4b-algebra4-merged | This repo: BF16 merged |
| felipecmarins/gemma3-4b-algebra4-lora | Adapters only (65 MB), for those who prefer to apply them dynamically at runtime |
| felipecmarins/gemma3-4b-algebra4-gguf | Q4_0 (2.3 GB), for phones / mobile |
License
Gemma Terms of Use.