Text Generation
Transformers
Safetensors
English
gemma3_text
gemma-3
binary-classification
medical
plain-vs-technical
causal-lm
conversational
text-generation-inference
gemma3-1b-plaintech-combined-20250921-021021
Fine-tune of Gemma-3 270M (IT) that classifies text as 0 = plain (lay language) or 1 = technical (medical jargon).
Note: this is a causal LM trained to answer with a single digit (0/1). Use the Gemma chat template and, optionally, restrict generation to {0,1}.
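Restricting the output to the two digit tokens turns next-token prediction into a two-way softmax, which reduces to a sigmoid of the logit difference. A minimal sketch in plain Python, assuming `logit_zero` and `logit_one` are the raw next-token logits for "0" and "1" (the function and parameter names are illustrative, not part of the model or of Transformers):

```python
import math

def prob_technical(logit_zero: float, logit_one: float) -> float:
    """Probability of class 1 when only the tokens "0" and "1" are allowed.

    A softmax over two logits equals a sigmoid of their difference.
    """
    diff = logit_one - logit_zero
    # Numerically stable sigmoid: never call exp() on a large positive value.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)

print(prob_technical(0.0, 0.0))  # equal logits give 0.5
```

Only the difference between the two logits matters, so any probability mass the model assigns to other tokens is discarded by the restriction.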
Data
- The project's original dataset (train/val/test).
- External sample cochrane_sample (90% added to train, 10% to val).
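The card does not say how the 90/10 split of the external sample was drawn. One possible way to do it is a seeded shuffle followed by a cut. In this sketch, `split_external_sample`, the seed, and the placeholder rows are all assumptions for illustration:

```python
import random

def split_external_sample(rows, train_fraction=0.9, seed=0):
    """Shuffle rows reproducibly and split them into (train, val) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for cochrane_sample: (text, label) pairs.
rows = [(f"example {i}", i % 2) for i in range(100)]
train_extra, val_extra = split_external_sample(rows)
print(len(train_extra), len(val_extra))  # 90 10
```

A stratified split by label would be a reasonable alternative if the external sample is imbalanced.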
Metrics
| split | accuracy | precision | recall | f1 | roc_auc |
|---|---|---|---|---|---|
| val | 0.9282 | 0.8708 | 0.9873 | 0.9254 | 0.9532 |
| test | 0.8450 | 0.7431 | 0.9727 | 0.8425 | 0.8718 |
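As a quick sanity check, F1 is the harmonic mean of precision and recall, so the reported F1 values can be recomputed from the other two columns. This sketch uses only the numbers reported above; `f1_score` here is a local helper, not a library call:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported f1) copied from the reported metrics.
reported = {
    "val": (0.8708, 0.9873, 0.9254),
    "test": (0.7431, 0.9727, 0.8425),
}
for split, (p, r, f1) in reported.items():
    # Recomputed value next to the reported one.
    print(split, round(f1_score(p, r), 4), f1)
```

Both recomputed values agree with the reported F1 to four decimal places.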
Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation.logits_process import LogitsProcessor, LogitsProcessorList

repo_id = "Cristian11212/gemma3-1b-plaintech-combined-20250921-021021"
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
# Half precision on GPU; default dtype on CPU.
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16 if torch.cuda.is_available() else None
)
model.eval().to("cuda" if torch.cuda.is_available() else "cpu")

SYS = "You are a binary classifier. Reply with a single digit: 0 (plain) or 1 (technical). No extra text."

def build_prompt(text):
    # Wrap the input in the Gemma chat template, ending with the generation prompt.
    msgs = [{"role": "system", "content": SYS}, {"role": "user", "content": text}]
    return tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)

# Token ids of the two allowed answers.
zero_id = tokenizer.encode("0", add_special_tokens=False)[0]
one_id = tokenizer.encode("1", add_special_tokens=False)[0]

class OnlyAllowTokens(LogitsProcessor):
    """Set the score of every token outside `allowed` to -inf."""
    def __init__(self, allowed, device):
        self.allowed = torch.tensor(allowed, device=device)

    def __call__(self, input_ids, scores):
        mask = torch.full_like(scores, float("-inf"))
        mask[:, self.allowed] = 0.0
        return scores + mask

lp = LogitsProcessorList([OnlyAllowTokens([zero_id, one_id], model.device)])

text = "urinary incontinence is the inability to willingly control bladder voiding..."
enc = tokenizer([build_prompt(text)], return_tensors="pt").to(model.device)
out = model.generate(**enc, max_new_tokens=1, do_sample=False, output_scores=True,
                     return_dict_in_generate=True, logits_processor=lp)
# The returned scores are already restricted to {0, 1}, so this is a two-token softmax.
probs = torch.softmax(out.scores[0].float(), dim=-1)
p1 = probs[0, one_id].item()
pred = 1 if p1 >= 0.5 else 0
print("pred:", pred, "prob_technical:", p1)
```
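The 0.5 cutoff on `p1` used above is a choice, not a requirement. Since the reported recall is higher than the reported precision on both splits, it may be worth checking other thresholds on validation data. The sketch below shows one way to measure that trade-off. The `(p1, label)` pairs are made up for illustration and are not model outputs, and `precision_recall_at` is a hypothetical helper:

```python
def precision_recall_at(threshold, scored):
    """Precision and recall for class 1 when predicting 1 iff p1 >= threshold.

    `scored` is an iterable of (p1, true_label) pairs.
    """
    tp = fp = fn = 0
    for p1, label in scored:
        pred = 1 if p1 >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up (p1, label) pairs for illustration only.
scored = [(0.95, 1), (0.80, 1), (0.70, 0), (0.60, 1), (0.55, 0), (0.20, 0)]
for t in (0.5, 0.75):
    print(t, precision_recall_at(t, scored))
```

On this toy data, raising the threshold from 0.5 to 0.75 increases precision and lowers recall. Whether the same holds for the real model has to be measured on its own validation predictions.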