🎮 Gemma 4 Vision Game Bot

A fully local, vision-only AI game bot. It sees the screen, thinks via a local LLM, and controls mouse/keyboard — zero memory reading, zero cloud APIs.

🏗️ Architecture

  Browser (localhost:7860)        llama-server (localhost:8080)
  ┌─────────────────────┐        ┌──────────────────────────┐
  │   Gradio GUI        │        │   Gemma 4 (GGUF)         │
  │   Start/Stop/Pause  │        │   Screenshot → JSON      │
  │   Live Screenshot   │◄──────►│   decision               │
  │   Live Logs         │  HTTP  │                          │
  │   Stats Dashboard   │        │   + mmproj (vision)      │
  └─────────────────────┘        └──────────────────────────┘
           │                                │
           ▼                                ▼
  ┌─────────────────────┐        ┌──────────────────────────┐
  │   Screen Capture    │        │   Action Executor        │
  │   mss / PyAutoGUI   │        │   xdotool / PyAutoGUI    │
  │   Window-only mode  │        │   Safety bounds          │
  └─────────────────────┘        │   Human-like delays      │
                                 └──────────────────────────┘

⚡ Quick Start (Ubuntu)

git clone https://huggingface.co/belal611/gemma4-vision-gamebot
cd gemma4-vision-gamebot
chmod +x setup_gamebot.sh && ./setup_gamebot.sh

# Terminal 1: Start model server
cd ~/game-bot && ./start_server.sh

# Terminal 2: Start GUI
cd ~/game-bot && ./start_gui.sh
# → Opens http://127.0.0.1:7860

💻 Hardware Tiers — From Weakest to Strongest

Every tier includes vision (image understanding). The mmproj file (~940 MB for E2B/E4B, ~1.1 GB for A4B/31B) is always required and is always FP16 — it cannot be quantized.

🟢 Tier 1 — Ultra-Low (4 GB RAM)


Target hardware	Raspberry Pi 4/5 (4 GB), old netbooks, thin clients
Model	Gemma 4 E2B — IQ2_M
GGUF size	2.4 GB
Total RAM needed	~3.8 GB (model + mmproj + overhead)
Speed (4 cores)	~1–3 tok/s → 20–40s per decision
Quality	⭐⭐ — Degraded but functional for simple tasks
Download	`huggingface-cli download bartowski/google_gemma-4-E2B-it-GGUF google_gemma-4-E2B-it-IQ2_M.gguf mmproj-google_gemma-4-E2B-it-f16.gguf --local-dir ./models/`

🟡 Tier 2 — Budget (8 GB RAM)


Target hardware	Old desktops (i3/i5 Gen2–4), basic laptops, 8 GB mini PCs
Model	Gemma 4 E2B — Q4_K_M ⭐ Recommended
GGUF size	3.2 GB
Total RAM needed	~5.5 GB
Speed (4 cores)	~2–5 tok/s → 8–20s per decision
Quality	⭐⭐⭐ — Good for strategy/idle games
Verdict	Best value. Handles Clash of Clans perfectly.
Download	`huggingface-cli download bartowski/google_gemma-4-E2B-it-GGUF google_gemma-4-E2B-it-Q4_K_M.gguf mmproj-google_gemma-4-E2B-it-f16.gguf --local-dir ./models/`

Also viable at this tier:

Quant	Size	Notes
E2B Q3_K_M	3.0 GB	Slightly worse, saves 200 MB
E2B Q5_K_M	3.4 GB	Slightly better, costs 200 MB
E2B Q8_0	4.6 GB	Near-lossless, tight fit at 8 GB

🔵 Tier 3 — Mainstream (16 GB RAM)


Target hardware	Modern desktops/laptops, HP Z230 (the original target), M1 MacBook Air
Model	Gemma 4 E4B — Q4_K_M
GGUF size	5.0 GB
Total RAM needed	~7.5 GB
Speed (8 cores)	~3–7 tok/s → 5–12s per decision
Quality	⭐⭐⭐⭐ — Noticeably smarter than E2B
Verdict	Sweet spot. Runs the game + model + GUI comfortably.
Download	`huggingface-cli download bartowski/google_gemma-4-E4B-it-GGUF google_gemma-4-E4B-it-Q4_K_M.gguf mmproj-google_gemma-4-E4B-it-f16.gguf --local-dir ./models/`

Also viable at this tier:

Quant	Size	Notes
E4B Q3_K_M	4.6 GB	Saves 400 MB RAM
E4B Q6_K	5.9 GB	High quality
E4B Q8_0	7.5 GB	Near-lossless, ~10 GB total
E2B Q8_0	4.6 GB	If you want faster speed over smarts

🟣 Tier 4 — Power User (32 GB RAM)


Target hardware	Gaming PCs, workstations, 32 GB laptops, M2/M3 MacBook Pro
Model	Gemma 4 26B-A4B — Q4_K_M (MoE, only 4B active params)
GGUF size	15.9 GB
Total RAM needed	~19 GB
Speed (8 cores)	~2–5 tok/s → 8–18s per decision
Quality	⭐⭐⭐⭐⭐ — Dramatically better reasoning, spatial understanding
Why MoE?	26B total params but only 4B active per token — fast like a 4B, smart like a 26B
Download	`huggingface-cli download bartowski/google_gemma-4-26B-A4B-it-GGUF google_gemma-4-26B-A4B-it-Q4_K_M.gguf mmproj-google_gemma-4-26B-A4B-it-f16.gguf --local-dir ./models/`

Also viable at this tier:

Quant	Size	Notes
A4B IQ3_M	12.4 GB	Fits tight 32 GB with game running
A4B Q3_K_M	12.1 GB	Good balance
A4B Q6_K	21.3 GB	Premium quality, ~24 GB total

🔴 Tier 5 — Enthusiast (64 GB+ RAM or GPU)


Target hardware	64 GB workstations, M4 Max, or any NVIDIA GPU (8+ GB VRAM)
Model	Gemma 4 31B — Q4_K_M (dense, full 31B)
GGUF size	18.3 GB
Total RAM needed	~22 GB
Speed (CPU 16 cores)	~1–3 tok/s → 12–30s per decision
Speed (RTX 3060 12GB)	~15–25 tok/s → 2–5s per decision 🚀
Quality	⭐⭐⭐⭐⭐+ — Best available. Near-GPT-4o-mini level vision
Download	`huggingface-cli download bartowski/google_gemma-4-31B-it-GGUF google_gemma-4-31B-it-Q4_K_M.gguf mmproj-google_gemma-4-31B-it-f16.gguf --local-dir ./models/`

GPU offloading (any tier with NVIDIA GPU):

# Offload layers to GPU for huge speedup
./llama-server -m model.gguf --mmproj mmproj.gguf -ngl 99 ...

📊 Summary Table

Tier	RAM	Model	GGUF	Total RAM	Speed	Quality
🟢 Ultra-Low	4 GB	E2B IQ2_M	2.4 GB	~3.8 GB	20–40s	⭐⭐
🟡 Budget	8 GB	E2B Q4_K_M	3.2 GB	~5.5 GB	8–20s	⭐⭐⭐
🔵 Mainstream	16 GB	E4B Q4_K_M	5.0 GB	~7.5 GB	5–12s	⭐⭐⭐⭐
🟣 Power	32 GB	A4B Q4_K_M	15.9 GB	~19 GB	8–18s	⭐⭐⭐⭐⭐
🔴 Enthusiast	64 GB+	31B Q4_K_M	18.3 GB	~22 GB	2–30s*	⭐⭐⭐⭐⭐+

*GPU offloading dramatically reduces latency

🛡️ Safety & Robustness Features (v2)

Feature	Description
Safety Bounds	Coordinates clamped to 0–896. Window-only mode prevents clicking outside game.
Robust JSON Parser	Multi-stage: full text → bracket extraction → candidate testing. Never crashes on bad output.
Structured Output	GBNF grammar forces llama.cpp to generate valid JSON only — 90% fewer parse errors.
Motion Detection	Compares screenshots via image hashing. Skips LLM call if screen unchanged — saves CPU.
Error Recovery	Detects stuck screens (3x identical) and repeated actions (4x same). Auto-presses Escape.
Window-Only Capture	Optional: captures only the game window via xdotool, ignoring notifications/desktop.
Thread Crash Protection	Bot thread wrapped in try/except — crashes are logged, not silent.
Human-Like Behavior	Random ±3px offset on clicks, random delays between actions.

🎯 Supported Games & Tasks

Clash of Clans

Task	Description
`collect_resources`	Click full gold mines & elixir collectors
`train_army`	Navigate to barracks, train troops
`attack_farm`	Find weak base, deploy troops
`upgrade_buildings`	Use idle builders on priority upgrades
`donate_troops`	Fulfill clan donation requests
`clear_obstacles`	Remove trees, rocks, gem boxes
`daily_routine`	All of the above in sequence

Silkroad Online

Task	Description
`auto_hunt`	Kill monsters, loot drops
`quest`	Follow quest markers, talk to NPCs
`trade`	Buy/sell at market
`level_up`	Grind XP, use potions

📁 Files

File	Description
⭐ `game_bot_gui.py`	Full GUI control panel (1271 lines) — Gradio web interface
`game_bot.py`	CLI version for advanced users
`setup_gamebot.sh`	One-script installer (builds llama.cpp + downloads model)
`gui_mockup.png`	Visual preview of the GUI

🔧 Changing the Model

Edit start_server.sh and point to your chosen GGUF files:

./llama-server \
    -m ~/models/YOUR_MODEL.gguf \
    --mmproj ~/models/YOUR_MMPROJ.gguf \
    --host 127.0.0.1 --port 8080 \
    --ctx-size 2048 -t $(nproc) --temp 0.1

Add -ngl 99 if you have an NVIDIA GPU.

📜 License

Apache 2.0 — Model (Gemma 4) and code.

🔗 Model Sources

Model	Source
E2B GGUF	bartowski/google_gemma-4-E2B-it-GGUF
E4B GGUF	bartowski/google_gemma-4-E4B-it-GGUF
A4B GGUF	bartowski/google_gemma-4-26B-A4B-it-GGUF
31B GGUF	bartowski/google_gemma-4-31B-it-GGUF

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support