quiz-main-gemma-merged ⚡: buzz-quiz (早押しクイズ) answering model
The answering model of a two-model Japanese competitive buzz-quiz (早押しクイズ) system.
Given a partial question (the prefix read so far at buzz time), it reasons inside `<think>…</think>` and then emits a short answer.
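Because the completion mixes reasoning and answer, a caller usually needs to separate them. A minimal sketch, assuming the literal `<think>` and `</think>` tags survive decoding and at most one think block appears; `split_think` is a hypothetical helper, not part of this repo.

```python
import re

# Matches one <think>...</think> block; DOTALL lets the reasoning span several lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer). Hypothetical helper."""
    match = THINK_RE.search(text)
    if match is not None:
        # Everything after </think> is the answer.
        return match.group(1).strip(), text[match.end():].strip()
    if "<think>" in text:
        # Reasoning was cut off (e.g. max_new_tokens hit) before </think>: no answer yet.
        return text.split("<think>", 1)[1].strip(), ""
    # No think block at all: treat the whole text as the answer.
    return "", text.strip()


if __name__ == "__main__":
    print(split_think("<think>アメリカの首都を問う問題。</think>ワシントンD.C."))
```

Returning an empty answer for a truncated think block lets the caller treat "ran out of budget" differently from "answered".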
- 🕹️ Live demo (HF Space): https://huggingface.co/spaces/build-small-hackathon/quiz-buzzer-ai
- 💻 Code (GitHub): https://github.com/YUGOROU/quiz-ai
- 🔔 Buzz-timing companion model: YUGOROU/quiz-buzz-reg-1.2bjp-merged
Role in the system
| Component | Model | Job |
|---|---|---|
| 🔔 Buzz | YUGOROU/quiz-buzz-reg-1.2bjp-merged (LFM2.5-1.2B + regression head) | Reads the question character by character and buzzes when confidence ≥ θ (~9 ms/char). |
| 🧠 Answer (this model) | gemma-4-26B-A4B SFT | From the partial question at buzz time, reasons in `<think>…</think>`, then answers. |
Total ≈ 27.2B parameters (1.2B + 26B, ≤ 32B); built for the HF Build Small Hackathon.
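The buzz rule in the table (read character by character, buzz once confidence reaches θ) can be sketched as follows. This is an illustration under assumptions: `confidence` stands in for the regression head of the buzz model, whose actual interface is not documented in this card, and `toy_confidence` plus the θ value are arbitrary examples.

```python
from typing import Callable, Optional


def find_buzz_position(
    question: str,
    confidence: Callable[[str], float],
    theta: float,
) -> Optional[int]:
    """Return how many characters were read when the buzz fires, or None.

    `confidence` is a hypothetical scorer mapping the prefix read so far to a
    confidence value; the buzz fires at the first prefix with confidence >= theta.
    """
    for n in range(1, len(question) + 1):
        if confidence(question[:n]) >= theta:
            return n
    return None  # never confident enough before the question ends


def toy_confidence(prefix: str) -> float:
    # Toy stand-in scorer: confidence grows linearly with prefix length.
    return len(prefix) / 20


if __name__ == "__main__":
    q = "日本の首都は東京ですが、アメリカの首都は"
    pos = find_buzz_position(q, toy_confidence, theta=0.8)
    if pos is not None:
        print(pos, q[:pos])  # this prefix is what the answer model would receive
```

Raising θ delays the buzz, which trades speed for the higher accuracy reported under Training.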
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo = "YUGOROU/quiz-main-gemma-merged"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

# Partial question at buzz time ("Japan's capital is Tokyo, but America's capital is")
prefix = "日本の首都は東京ですが、アメリカの首都は"
# Prompt format: "Buzz quiz (as of character N):" followed by the prefix read so far
msgs = [{"role": "user", "content": f"早押しクイズ({len(prefix)}文字目時点):\n{prefix}"}]
ids = tok.apply_chat_template(msgs, enable_thinking=True, add_generation_prompt=True, return_tensors="pt").to(model.device)

out = model.generate(
    ids,
    max_new_tokens=320,
    do_sample=False,
    eos_token_id=[1, 106],  # gemma-4 closes the turn with <turn|>=106, not only <eos>=1
)
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
# Example output: <think> … </think>ワシントンD.C. (Washington, D.C.)
```
Important: gemma-4 ends an assistant turn with `<turn|>` (id 106). If you stop only on `<eos>` (id 1), the model keeps hallucinating new turns, so always include 106 in your stop set (vLLM: `--stop-token-ids 1 106`). The `<think>` reasoning is required: disabling it causes accuracy to collapse.
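To illustrate the stop-set rule, here is a minimal sketch that truncates a raw generated id sequence at the first stop token, using the ids stated above (`<eos>` = 1, `<turn|>` = 106). `truncate_at_stop` is a hypothetical helper; when generation is already run with both ids in the stop set it is unnecessary, so treat it as a safety net for outputs produced without the full stop set.

```python
from typing import Iterable, List

# Stop ids from this card: <eos> = 1, <turn|> = 106 (gemma-4 end of turn).
STOP_IDS = frozenset({1, 106})


def truncate_at_stop(token_ids: Iterable[int], stop_ids: frozenset = STOP_IDS) -> List[int]:
    """Keep tokens up to (not including) the first stop id.

    Anything after <turn|> would be a hallucinated extra turn, so it is dropped.
    """
    kept: List[int] = []
    for tok in token_ids:
        if tok in stop_ids:
            break
        kept.append(tok)
    return kept
```

With the Usage snippet, one could pass `out[0][ids.shape[1]:].tolist()` and decode the result.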
Training
- Base: `unsloth/gemma-4-26B-A4B` (MoE, 26B total / 4B active), with the `gemma-4-thinking` chat template.
- SFT (Unsloth bf16 LoRA, merged to 16-bit) on a quiz-grammar corpus built from AI王 / JAQKET: the user turn is the partial question at the statistically decidable buzz position (S-buzz); the assistant turn is `<think>{reasoning}</think>{answer}`, with an adaptive think budget set by question difficulty.
- Accuracy: ≈ 76% on full-question QA; ≈ 62–74% at the buzz position, depending on threshold θ (a later buzz gives higher accuracy).
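One training pair in the format described above might be assembled as in this sketch. Assumptions not confirmed by this card: the user turn reuses the prompt template from the Usage example, the S-buzz position is already known as a character count, and `build_sft_example` is a hypothetical name. The adaptive think budget is not modeled here.

```python
from typing import Dict, List

# Assumed to match the Usage prompt: "Buzz quiz (as of character N):" + prefix.
PROMPT_TEMPLATE = "早押しクイズ({n}文字目時点):\n{prefix}"


def build_sft_example(question: str, buzz_pos: int, reasoning: str, answer: str) -> List[Dict[str, str]]:
    """Build one chat-format SFT example (hypothetical helper).

    buzz_pos: number of characters revealed at the precomputed S-buzz position;
    the user turn only sees that prefix.
    """
    if not 1 <= buzz_pos <= len(question):
        raise ValueError("buzz_pos must be within the question length")
    prefix = question[:buzz_pos]
    user = PROMPT_TEMPLATE.format(n=len(prefix), prefix=prefix)
    assistant = f"<think>{reasoning}</think>{answer}"
    return [
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]
```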
Attribution & license
This model is a fine-tune of Google Gemma 4, which Google releases under the Apache License 2.0. The model weights are therefore distributed under Apache 2.0.
Training data is derived from AI王 (Project AIO) / JAQKET. Quiz questions © abc/EQIDEN実行委員会 / 株式会社キュービック / クイズ法人カプリティオ. Non-commercial research use only. The dataset is not redistributed; only the model weights and inference code are released.