SolarHive 26B A4B LoRA — Community Solar Energy Intelligence
LoRA fine-tuned adapters for Gemma 4 26B A4B, specialized in community solar energy management with native function calling and multimodal visual question answering.
Built for the Gemma 4 Good Hackathon (Google DeepMind x Kaggle).
| Field | Value |
|---|---|
| Base Model | google/gemma-4-26b-a4b-it |
| Architecture | MoE — 25.2B total, 3.8B active (8/128 experts) |
| Fine-Tuning | LoRA via Unsloth (BF16) |
| Training Data | 1,727 rows in solarhive-community-solar-multimodal (1,713 text + 14 image-grounded); the fine-tune is text-only and trains on the 1,713 text rows. VQA at inference uses the base Gemma 4 vision encoder (~550M params), which our LoRA leaves unmodified, per the Vertex AI SFT recipe |
| Converged Loss | 0.6956 |
| Benchmark | 9/10 (5/5 domain Q&A + 4/5 tool calling) + 3/3 When2Call; May 2026 final run, with one multi-call failure on TQ5 (see Multi-Variant Deployment Validation below) |
| Training Time | 7,198 seconds (~120 minutes) |
| Compute | Google Colab Pro |
| License | MIT (adapters) / Gemma Terms (base model) |
Model Overview
SolarHive is an AI energy advisor for community solar microgrids. It helps suburban neighborhoods collectively optimize distributed solar generation and shared battery storage through natural language conversation, visual inspection, and live data integration.
This is the cloud inference model. It powers the live demo with full multimodal VQA and native function calling. For edge deployment via Ollama (privacy-first, no internet required), see the companion SolarHive E4B Ollama.
This repository contains LoRA adapters only — you need the base Gemma 4 26B A4B model to use them. The adapters add domain expertise in solar energy, battery management, grid optimization, and community coordination while preserving the base model's general capabilities.
What These Adapters Add
- Domain expertise in solar production, battery management, grid pricing, panel inspection, and community energy coordination
- Improved function calling for four energy-specific tools (weather, solar production, battery state, grid status)
- Visual question answering for sky condition analysis, panel health inspection, and neighborhood aerial assessment
- Grounded responses that reference real data from live APIs rather than hallucinating numbers
Benchmark Results
Evaluated on held-out questions not seen during training:
Domain Q&A (5/5)
| Question | Result |
|---|---|
| "What happens to solar production when humidity exceeds 80%?" | Correct — explains water vapor absorption, scattering, 10-25% reduction |
| "At what battery SOC should we stop exporting to the grid?" | Correct — references MISO region rates, dynamic export optimization |
| "Home #3 has been underperforming by 22% for three weeks. Diagnostic checklist?" | Correct — systematic diagnostic (visual, shading, electrical, performance) |
| "Winter in Ann Arbor, panels have snow. Prioritize actions." | Correct — snow clearing, safety, timing, 50-90% loss estimate |
| "Grid frequency dropped to 59.8 Hz. What does that mean for our microgrid?" | Correct — generation deficit, stability implications, operational guidance |
Tool Calling (1/3)
| Question | Expected Tool | Called | Status |
|---|---|---|---|
| "What's the current battery state?" | get_battery_state | Direct answer | Fail |
| "Solar production in Seattle?" | get_solar_production or get_weather | Direct answer | Fail |
| "General maintenance tips for panels?" | None (no tool needed) | None | Pass |
Note: In the isolated benchmark (single-turn, no tool schemas in context) the model scores 6/8 (5/5 Q&A + 1/3 tool calling). In the full agentic loop it scores 8/8, as shown below.
Production Benchmark (8/8) — Inference Agentic Loop
When evaluated using generate_with_tools() with tool schemas in context, the model scores 8/8 (5/5 Q&A + 3/3 tool calling):
Q&A (5/5) — same questions, same correct answers as above.
Tool Calling (3/3):
| Question | Expected | Called | Status |
|---|---|---|---|
| "What's the current battery state?" | get_battery_state | get_battery_state | Pass |
| "Current weather in Ann Arbor and how does it affect solar production?" | get_weather | get_weather | Pass |
| "General maintenance tips for panels?" | None | None | Pass |
The difference: the agentic loop passes tool schemas via apply_chat_template(tools=[...]), giving the model the function signatures it was trained on. The isolated benchmark tests raw generation without tool context.
Multi-Variant Deployment Validation (Final Run, May 2026)
The 26B A4B LoRA + base is the baseline of the multi-variant comparison. Score on the 10-question parity benchmark (5 Q&A + 5 tool):
Score: 5/5 Q&A + 4/5 tool = 9/10
The single FAIL is the lenient multi-call probe, "Compare today's irradiance forecast across Ann Arbor, Phoenix, and Seattle" (min_calls=2), where this A4B LoRA returned no tool call. Four of the 5 variants run share this multi-call failure mode; only the E4B LoRA + base variant chained the multi-city calls (3 × get_weather). The pattern reproduces across runs, so it is systematic rather than stochastic.
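The strict/lenient distinction can be made concrete with a small scorer. This is a minimal sketch under our own assumptions, not the project's benchmark harness: a should-not-call probe passes only with zero calls, and any other probe passes when it makes at least min_calls calls and every call targets an acceptable tool. ToolProbe and score are hypothetical names, and the acceptable tool set for the multi-city probe is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ToolProbe:
    """One tool-calling benchmark item (field names are illustrative)."""
    question: str
    expected: set[str]  # acceptable tool names; empty set means "no call expected"
    min_calls: int = 1  # values > 1 mark a lenient multi-call probe
    called: list[str] = field(default_factory=list)


def score(probe: ToolProbe) -> bool:
    """Return True if the recorded tool calls satisfy the probe."""
    if not probe.expected:
        # Should-not-call probe: any call at all is a failure.
        return not probe.called
    if len(probe.called) < probe.min_calls:
        return False
    # Lenient rule: every call must target one of the acceptable tools.
    return all(name in probe.expected for name in probe.called)


# Usage: the multi-city probe (acceptable tools chosen for illustration)
tq5 = ToolProbe(
    question="Compare today's irradiance forecast across Ann Arbor, Phoenix, and Seattle",
    expected={"get_weather", "get_solar_production"},
    min_calls=2,
)
print(score(tq5))  # no calls recorded -> False
tq5.called = ["get_weather"] * 3
print(score(tq5))  # 3 x get_weather -> True
```

Requiring every call to hit an acceptable tool, rather than an exact set match, is what makes the probe lenient: three get_weather calls pass even though get_solar_production is never called.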
Inference-time When2Call Validation — A4B LoRA scores 3/3 (directly measured)
Three held-out probes from Ross et al. (2025), When2Call: When (not) to Call Tools, arXiv:2504.18851. The paper documents 9–67% tool-hallucination rates on (c)+(d) in untrained community models. The A4B LoRA passes all three probes (3/3, directly measured in the May 2026 inference run), confirming that the SolarHive fine-tune — which includes 16 explicit unable-to-answer + follow-up clarification examples following the When2Call taxonomy — handles refusal + follow-up behaviors correctly:
| Probe | Question | A4B LoRA behavior |
|---|---|---|
| (b) Tool routing | "What's the current grid rate?" | ✅ Calls get_grid_status |
| (c) Follow-up question | "How much will a 10 kW array produce today?" | ✅ Asks for location instead of auto-filling Ann Arbor |
| (d) Refuse + redirect | "What's the current air quality index in Ann Arbor?" | ✅ Explicit disclaimer: "I don't have a dedicated air quality tool, but I can check the weather…" |
Compare the E4B family (solarhive-e4b-lora and solarhive-e4b-ollama), which both score 2/3 on the same probes: they pass (b) and (d) but fail (c) by auto-filling a location instead of asking for one. The +1/3 When2Call delta between the A4B family (3/3 across LoRA, merged, and NF4, all measured) and the E4B family (2/3 across LoRA and merged) is consistent with size-vs-refusal scaling. A4B outperforming E4B on these reasoning-heavy probes was the hypothesis stated going in, not a discovery, per the "Parameter sizes and quantization" section of the official Google Gemma 4 Core docs: "Models with higher parameters and bit counts (higher precision) are generally more capable, but are more expensive to run." This 26B A4B variant has ~25B parameters of total knowledge capacity (3.8B active per token via MoE sparsity) and a ~550M-parameter vision encoder, versus E4B's 8B total / 4.5B effective parameters and ~150M-parameter vision encoder. The When2Call paper documents the same size-vs-refusal scaling empirically. A4B is the right deployment target for under-specified or out-of-scope queries; E4B handles the well-specified routing volume at the edge.
Quantitative reinforcement from Unsloth's published Gemma 4 benchmarks:
| Benchmark | 26B A4B | E4B | A4B − E4B gap |
|---|---|---|---|
| MMLU Pro | 82.6% | 69.4% | +13.2 pts |
| MMMU Pro | 73.8% | 52.6% | +21.2 pts |
| AIME 2026 | 88.3% | 42.5% | +45.8 pts |
| LiveCodeBench v6 | 77.1% | 52.0% | +25.1 pts |
The 45.8 pp AIME gap (math reasoning) and 21.2 pp MMMU Pro gap are consistent with the When2Call regression observed on E4B: asking a follow-up question instead of auto-filling a location (probe (c)) is a reasoning task, and the large published reasoning-benchmark deltas line up with E4B's one-probe shortfall (2/3 vs 3/3).
Precision Note — BF16 is Gemma 4's Native Release Format
This repository contains LoRA adapter weights only; apply them on top of Google's open-source google/gemma-4-26b-a4b-it base via Unsloth's FastVisionModel.from_pretrained(...) at inference time. Both the base model and the adapters are in BF16, which is Gemma 4's native release precision. There is no FP32 release to begin with, so applying BF16 LoRA over a BF16 base is not a quantization downgrade; it is the same numerical precision Google published.
| Variant | Precision | Repository | Use case |
|---|---|---|---|
| This repo (LoRA adapters) | BF16 (~2 GB adapter weights) | solarhive-26b-a4b-lora | Apply over base at runtime; smallest download; needs Unsloth |
| Pre-merged BF16 weights | BF16 (~48 GB full model) | solarhive-26b-a4b-merged | from_pretrained(...) directly; no PEFT/Unsloth dep |
| NF4 quantized | 4-bit packed (~16 GB VRAM) | solarhive-26b-a4b-nf4 | HF Spaces / 24 GB+ GPU deployment |
All three variants are derived from the same fine-tuning run; the LoRA delta in this repo is the canonical source. The merged and NF4 variants exist for deployment convenience.
Training Details
Hyperparameters
| Parameter | Value |
|---|---|
| Method | LoRA via Unsloth FastVisionModel (BF16, RTX PRO 6000 Blackwell 102 GB) |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0 |
| Target modules | All linear layers |
| Learning rate | 2e-4 |
| Optimizer | AdamW 8-bit |
| Warmup steps | 5 |
| Epochs | 3 |
| Max sequence length | 2048 |
| Precision | BF16 |
| Seed | 3407 |
| Trainable parameters | 505.4M / 26.3B (1.92%) |
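To make the rank/alpha settings concrete: LoRA freezes each base weight W and learns a low-rank update B·A scaled by alpha / r, so with r = 16 and alpha = 16 the update is applied at scale 1.0. The plain-Python sketch below shows the merge that a pre-merged checkpoint bakes in. It uses toy matrices rather than Gemma 4 shapes, and lora_merge / lora_param_count are illustrative helpers, not Unsloth or PEFT APIs.

```python
def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    """Plain-Python matrix product (x: m x k, y: k x n)."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def lora_merge(w, b, a, alpha: float, r: int):
    """Return W + (alpha / r) * B @ A, the merged weight of one LoRA-adapted linear layer.

    w: d_out x d_in frozen base weight, b: d_out x r, a: r x d_in.
    """
    scale = alpha / r
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer (A plus B)."""
    return r * (d_in + d_out)


# Toy usage with r = 1: B @ A = [[3, 4], [6, 8]] is added to the identity.
merged = lora_merge([[1.0, 0.0], [0.0, 1.0]], [[1.0], [2.0]], [[3.0, 4.0]], alpha=1, r=1)
print(merged)
print(16 / 16)                                 # scale used in this run (alpha / r)
print(round(100 * 505.4e6 / 26.3e9, 2))        # trainable share from the table, in %
```

Because B is zero-initialized in standard LoRA, the merged weight starts equal to W, and only the learned B·A moves it.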
Training Data — 1,727 Examples
The canonical training corpus is solarhive-community-solar-multimodal — 1,727 rows (1,713 text + 14 image-grounded). The full hand-crafted portion is preserved verbatim in solarhive_datagen.py Cell 7a (LEGACY_DATA + LEGACY_TOOL_CALL_DATA), and the API-grounded portion is reproducible at training time via _fetch_api_examples().
Four complementary sources provide both breadth and depth:
413 hand-crafted Q&A spanning 15+ US cities and 9 energy domains:
- Sky conditions and cloud impact on production
- Battery management and charge/discharge strategy
- Panel health diagnostics and maintenance
- Consumption optimization and load shifting
- Community and grid coordination strategy
- Emergency resilience and outage planning
- Seasonal planning and weather adaptation
- Multi-step reasoning across multiple data sources
- Alternative storage (fuel cells, thermal)
~1,117 API-grounded Q&A generated from live data:
- Open-Meteo (GHI, DNI, DHI, low/mid/high cloud cover), PVWatts, OpenWeatherMap, EIA APIs
- Joined on (location, hourly timestamp) so each multi-source example carries co-occurring grounding
- Locations: Ann Arbor, MI and San Mateo, CA
- Every numeric claim traces back to a real API response
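The (location, hourly timestamp) join can be sketched as an inner join keyed on the location and the timestamp truncated to the hour. This is an illustrative reconstruction, not the project's _fetch_api_examples() code; the record layout (location, time, and metric fields such as ghi_wm2 and clouds_pct) and the sample values are assumptions.

```python
from datetime import datetime


def hour_key(location: str, ts: str) -> tuple[str, datetime]:
    """Pair a location with an ISO-8601 timestamp truncated to the hour."""
    return location, datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)


def join_hourly(*sources: list[dict]) -> list[dict]:
    """Inner-join records from several sources on (location, hourly timestamp).

    Only hours present in every source survive, so each output row carries
    readings that actually co-occur.
    """
    indexed = []
    for records in sources:
        indexed.append({hour_key(rec["location"], rec["time"]): rec for rec in records})
    common = set(indexed[0]).intersection(*indexed[1:])
    joined = []
    for key in sorted(common):
        row = {"location": key[0], "hour": key[1].isoformat()}
        for table in indexed:
            # Merge the metric fields, dropping each source's own join columns.
            row.update({k: v for k, v in table[key].items() if k not in ("location", "time")})
        joined.append(row)
    return joined


# Illustrative records: irradiance at 09:00 and 10:00, cloud cover only at 09:12.
meteo = [{"location": "Ann Arbor, MI", "time": "2026-04-15T09:00", "ghi_wm2": 660.0},
         {"location": "Ann Arbor, MI", "time": "2026-04-15T10:00", "ghi_wm2": 720.0}]
owm = [{"location": "Ann Arbor, MI", "time": "2026-04-15T09:12", "clouds_pct": 10}]
rows = join_hourly(meteo, owm)
print(rows)  # only the 09:00 hour appears in both sources
```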
183 tool-calling examples trained with the When2Call taxonomy — 106 should-call, 53 should-not-call, 10 unable-to-answer, 6 follow-up clarification, 8 failure-recovery — so the model learns when to call tools, when to refuse politely, when to admit a tool can't answer, and when to ask a clarifying question.
14 image-grounded Q&A turns from 7 manually labeled Ann Arbor sky photographs. Cloud type and percentage cloud cover are human-confirmed, and expected production traces back to the cloud-cover label via the same temperature-derated GHI formula.
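As a consistency check, the four source counts above add up to the corpus totals stated elsewhere in this card (treating ~1,117 as exact): 1,713 text rows, 1,727 rows overall, and 16 unable-to-answer plus follow-up clarification examples.

```python
# Corpus composition as stated in this card
corpus = {
    "hand_crafted_qa": 413,
    "api_grounded_qa": 1_117,
    "tool_calling": 183,
    "image_grounded": 14,
}
tool_calling_breakdown = {
    "should_call": 106,
    "should_not_call": 53,
    "unable_to_answer": 10,
    "follow_up_clarification": 6,
    "failure_recovery": 8,
}

text_rows = corpus["hand_crafted_qa"] + corpus["api_grounded_qa"] + corpus["tool_calling"]
total_rows = text_rows + corpus["image_grounded"]
refusal_and_followup = (tool_calling_breakdown["unable_to_answer"]
                        + tool_calling_breakdown["follow_up_clarification"])

print(text_rows, total_rows, refusal_and_followup)  # 1713 1727 16
```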
Training Loss
| Metric | Value |
|---|---|
| Converged loss (last 20 steps) | 0.6956 |
| Final step loss | 0.727 |
| Minimum loss | 0.357 |
| Total steps | 645 |
| Training time | 7,198 seconds |
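The 645 total steps are consistent with the training setup under one assumption this card does not state: an effective batch size (per-device batch × gradient accumulation × GPUs) of 8 over the 1,713 text rows for 3 epochs. The sketch uses ceiling division per epoch; trainers differ slightly in how they count a final partial batch, so treat this as a plausibility check rather than the run's actual configuration.

```python
import math


def total_steps(n_examples: int, effective_batch: int, epochs: int) -> int:
    """Optimizer steps: ceil(examples / effective batch) per epoch, times epochs."""
    return math.ceil(n_examples / effective_batch) * epochs


# ASSUMPTION: an effective batch size of 8 is not stated in this card; it is the
# value that reproduces the reported 645 steps from 1,713 text rows over 3 epochs.
steps = total_steps(n_examples=1_713, effective_batch=8, epochs=3)
print(steps)  # 645
print(round(7_198 / steps, 1))  # average seconds per optimizer step
```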
Hardware
- GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition (102 GB GDDR7 total, 94.97 GB max usable per Unsloth)
- Platform: Google Colab Pro (G4 VM)
- Precision: BF16 (no quantization during training)
How to Use
Loading with Unsloth (Recommended)
Standard PEFT cannot handle Gemma 4's Gemma4ClippableLinear layers. Use Unsloth's FastVisionModel for reliable adapter loading:
```python
from unsloth import FastVisionModel
import torch

# Load base model + LoRA adapters
model, processor = FastVisionModel.from_pretrained(
    model_name="google/gemma-4-26b-a4b-it",
    adapter_name="Truthseeker87/solarhive-26b-a4b-lora",  # This repo
    dtype=torch.bfloat16,
    device_map="auto",
)
FastVisionModel.for_inference(model)
```
Two-Step Tokenization (Required)
Single-step apply_chat_template(tokenize=True) crashes in transformers 5.5.x on messages without a "content" key (e.g., tool_calls messages). Use this two-step pattern:
```python
messages = [
    {"role": "system", "content": "You are SolarHive, an AI energy advisor..."},
    {"role": "user", "content": "How will today's weather affect our solar production?"},
]

# Step 1: render the prompt as text (tokenize=False).
# `tools` is the list of Python functions defined under Native Function Calling.
text = processor.apply_chat_template(
    messages, tools=tools,
    add_generation_prompt=True,
    enable_thinking=False,
    tokenize=False,
)

# Step 2: tokenize separately (images=None for text-only turns)
inputs = processor(text=text, images=None, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs, max_new_tokens=1024,
    do_sample=True, temperature=1.0, top_p=0.95, top_k=64,  # do_sample=True so sampling args apply
)
# Decode only the newly generated tokens
response = processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
```
Native Function Calling
Define tools as Python functions with Google-style docstrings. Gemma 4 autonomously decides which to invoke:
```python
def get_weather(location: str) -> dict:
    """Get current weather conditions for a location.

    Args:
        location: City name, e.g. 'Ann Arbor, MI'

    Returns:
        dict with temp_f, clouds_pct, wind_mph, humidity, sunrise, sunset
    """
    # Your API call here
    ...


def get_solar_production(clouds_pct: int, temp_f: float) -> dict:
    """Get estimated community solar production using GHI irradiance data.

    Args:
        clouds_pct: Cloud cover percentage (0-100)
        temp_f: Temperature in Fahrenheit

    Returns:
        dict with production_kw, capacity_kw, efficiency_pct, ghi_wm2
    """
    ...


# The remaining tools must be defined before they can be listed below
def get_battery_state() -> dict:
    """Get community battery state of charge, capacity, and charging status."""
    ...


def get_grid_status() -> dict:
    """Get grid pricing period, rate per kWh, renewable share, and CO2 intensity."""
    ...


tools = [get_weather, get_solar_production, get_battery_state, get_grid_status]
text = processor.apply_chat_template(
    messages, tools=tools,
    add_generation_prompt=True,
    enable_thinking=False,
    tokenize=False,
)
```
The model emits tool calls as call:fn_name{arg: "value"} in its output, parsed via regex r'call:(\w+)\{([^}]*)\}'.
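A minimal parser built on that regex might look like this. The argument grammar (comma-separated key: value pairs, double-quoted strings, bare numbers) is an assumption inferred from the call:fn_name{arg: "value"} example; real Gemma 4 output may wrap calls in special tokens or use different string delimiters, and a value containing } would break the regex. parse_tool_calls is a hypothetical helper.

```python
import re

# Regex quoted in this card for tool-call text: call:fn_name{...}
CALL_RE = re.compile(r'call:(\w+)\{([^}]*)\}')
# ASSUMED argument grammar: key: "quoted string" | bare value, comma-separated
ARG_RE = re.compile(r'(\w+)\s*:\s*(?:"([^"]*)"|([^,]+))')


def _coerce(raw: str):
    """Turn a bare value into int or float when possible, else keep the stripped string."""
    raw = raw.strip()
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_tool_calls(text: str) -> list[tuple[str, dict]]:
    """Extract (function name, arguments) pairs from raw model output."""
    calls = []
    for name, arg_str in CALL_RE.findall(text):
        args = {}
        for key, quoted, bare in ARG_RE.findall(arg_str):
            # Quoted values stay strings; bare values are coerced to numbers if possible.
            args[key] = quoted if quoted or not bare else _coerce(bare)
        calls.append((name, args))
    return calls


out = ('Let me check. call:get_weather{location: "Ann Arbor, MI"} '
       'call:get_solar_production{clouds_pct: 10, temp_f: 65.0}')
print(parse_tool_calls(out))
```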
Core Capabilities
1. Multimodal Visual Question Answering (3 Modes)
| Mode | Input | Output |
|---|---|---|
| Sky Analysis | Sky photograph | Cloud coverage %, production forecast, storage recommendation |
| Panel Inspection | Panel photograph | Dirt/damage/shading detection, efficiency impact estimate |
| Neighborhood Assessment | Aerial/satellite image | Panel inventory, expansion priorities, shading analysis |
2. Native Function Calling (5 Tools — all 3 keyed APIs wired)
| Tool | API | Returns |
|---|---|---|
| get_weather(location) | OpenWeatherMap (OWM_API_KEY) | Temperature, clouds %, wind, humidity, sunrise/sunset |
| get_solar_production(clouds_pct, temp_f) | Open-Meteo GHI (keyless) | Production kW, efficiency %, GHI W/m², temp derating |
| get_battery_state() | Community BMS (sim) | State of charge, capacity, charging status |
| get_grid_status() | EIA Open Data (EIA_API_KEY) | Pricing period, rate/kWh, renewable %, CO2 intensity |
| get_nrel_pvwatts_baseline() | NREL PVWatts v8 (NREL_API_KEY) | Annual + current-month typical kWh + avg kW for the 72 kW array |
Tool results feed back as a 2-message sequence matching the training distribution:
```python
{"role": "assistant", "tool_calls": [{"function": {"name": ..., "arguments": ...}}, ...]}
{"role": "tool", "name": "<fn>", "content": json.dumps(result)}  # one per tool call
```
This format is shared across solarhive_datagen.py (training-data generation), solarhive_finetune.py (SFT preprocessing + schema validation), solarhive_inference.py Cell 4, and test_ollama_tools.py Solution B — inference matches the training distribution exactly.
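The 2-message feedback step can be sketched as a helper that runs each parsed call and appends the assistant tool_calls turn followed by one tool turn per call. This is illustrative only: append_tool_turns and the registry dict are hypothetical, the battery values are made up, and passing arguments as a dict (rather than a JSON string) is an assumption about what the chat template accepts.

```python
import json


def append_tool_turns(messages: list[dict], calls: list[tuple[str, dict]],
                      registry: dict) -> list[dict]:
    """Run each requested tool and append the assistant + tool message sequence.

    calls:    (function name, arguments) pairs parsed from the model output
    registry: maps function name -> Python callable (the functions passed as tools)
    """
    messages = list(messages)  # leave the caller's history untouched
    messages.append({
        "role": "assistant",
        "tool_calls": [{"function": {"name": name, "arguments": args}} for name, args in calls],
    })
    for name, args in calls:
        result = registry[name](**args)
        messages.append({"role": "tool", "name": name, "content": json.dumps(result)})
    return messages


# Usage with a simulated battery tool (illustrative values, not real BMS data)
def get_battery_state() -> dict:
    return {"state_of_charge_pct": 62, "capacity_kwh": 100, "charging": False}


history = [{"role": "user", "content": "What's the current battery state?"}]
history = append_tool_turns(history, [("get_battery_state", {})],
                            {"get_battery_state": get_battery_state})
```

The updated history is then rendered with the same two-step tokenization shown under How to Use, so the model sees tool results in the layout it was trained on.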
3. Selective Tool Reasoning
The model reasons about which tools are relevant — not blindly calling everything:
"What time does peak pricing start?"
→ Calls: get_grid_status() only
"Is today's production above typical for January?"
→ Calls: get_solar_production() + get_nrel_pvwatts_baseline()
"Should I run my pool heater now?"
→ Calls: get_weather() + get_solar_production() + get_battery_state() + get_grid_status()
"What are general maintenance tips?"
→ Calls: none (answers from training knowledge)
4. Inference-time When2Call Validation
Three held-out probes validate coverage of 3 of the 4 failure-mode categories from Ross, H., Mahabaleshwarkar, A. S., & Suhara, Y. (2025). When2Call: When (not) to Call Tools. arXiv:2504.18851. The paper documents 9–67% tool-hallucination rates in untrained community models on (c) and (d):
| Category | Probe | Expected behavior |
|---|---|---|
| (b) | "What's the current grid rate?" | Correct tool call (get_grid_status) — well-specified, in-scope |
| (c) | "How much will a 10 kW array produce today?" | Follow-up question (asks for location) — does NOT auto-fill Ann Arbor |
| (d) | "What's the current air quality index in Ann Arbor?" | Polite refusal + redirect (e.g., airnow.gov) — does NOT hallucinate a tool |
Models trained without explicit unable-to-answer and follow-up clarification examples typically fail (c) + (d). The SolarHive corpus includes 16 such examples (10 unable-to-answer + 6 follow-up clarification) following the When2Call taxonomy; this A4B LoRA passes all 3 probes (3/3, directly measured in the May 2026 inference run).
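Scoring the three probes requires deciding whether a reply is a tool call, a follow-up question, or a refusal. The keyword heuristics below are a crude, hypothetical stand-in (the card does not say how the probes were judged); a human or model-based judge is more reliable for borderline replies, such as a refusal that ends with an offer phrased as a question.

```python
import re

# Hypothetical refusal cues; checked before the trailing "?" test because a
# polite refusal may still end with an offer phrased as a question.
REFUSAL_CUES = ("don't have", "do not have", "can't", "cannot", "unable to", "no dedicated")

# Expected outcome per When2Call probe category, following the table above.
EXPECTED = {"b": "tool_call", "c": "follow_up", "d": "refusal"}


def classify_reply(reply: str) -> str:
    """Return 'tool_call', 'refusal', 'follow_up', or 'direct_answer' for one reply."""
    if re.search(r'call:\w+\{', reply):
        return "tool_call"
    lowered = reply.lower()
    if any(cue in lowered for cue in REFUSAL_CUES):
        return "refusal"
    if reply.rstrip().endswith("?"):
        return "follow_up"
    return "direct_answer"


print(classify_reply("Which city is the 10 kW array in?") == EXPECTED["c"])
```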
Live Demo Output
Five interactive queries from the April 15, 2026 demo capture with real sky photos from Ann Arbor:
Query 1 — Sky photo uploaded (VQA Mode 1)
Q: "How is current cloud cover affecting potential household solar production?"
A: "The image shows heavy, uniform overcast conditions with thick gray cloud cover obscuring the sun completely. This type of overcast sky typically reduces solar production to 10-25% of clear-sky capacity. At this moment, expect roughly 10-25 kW community output from your 72 kW array. This is a good time to conserve battery for evening peak hours and defer non-essential loads."
Query 2 — Tool calling (solar production)
Q: "What was estimated solar production at 9.00am Eastern time?"
Tools called: get_solar_production
A: "At 9:00am in Ann Arbor, MI, the community produced 40.4 kW out of 72 kW capacity (56.1% efficiency). GHI was 660.0 W/m² with 10% cloud cover and no temperature derating at 65°F. This is a solid early morning output — production will ramp up toward the noon peak."
Query 3 — Sky photo + energy planning
Q: "How should I plan my household solar production and energy storage for the next two hours?"
Tools called: none (direct visual answer)
A: "The image shows heavy, broken cumulus and stratocumulus clouds covering approximately 80-90% of the sky. Production will be highly variable — cycling between 15-30% of capacity during cloud gaps and dropping to 5-10% under the thicker gray patches. If your battery is below 50%, prioritize essential loads only."
Query 4 — Tool calling (grid status)
Q: "What is current grid capacity now?"
Tools called: get_grid_status
A: "Peak pricing at $0.28/kWh with 30.3% renewable grid mix. CO2 intensity is 381.5 kg/MWh. Peak rates typically run 2-7pm — prioritize battery discharge and minimize grid imports."
Query 5 — Tool calling (weather)
Q: "What is the weather forecast going to be like this week in Ann Arbor?"
Tools called: get_weather
A: "Current conditions in Ann Arbor: 67°F with 100% cloud cover and mist. With 100% cloud cover, expect significantly reduced solar output — run only essential loads and preserve battery for evening peak."
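The Query 2 figures above (40.4 kW of 72 kW, 56.1%, GHI 660 W/m², no derating at 65°F) are consistent with a simple temperature-derated GHI model. The card does not spell out the formula, so the sketch below is a reconstruction under two assumptions: a 0.85 system performance ratio (the value implied by 40.4 / (72 × 0.66)) and a −0.4 %/°C temperature coefficient applied above 25 °C. It is not the project's get_solar_production implementation.

```python
# ASSUMPTIONS (not stated in this card): 0.85 performance ratio and a
# crystalline-silicon temperature coefficient of -0.4 %/degC above 25 degC.
PERFORMANCE_RATIO = 0.85
TEMP_COEFF_PER_C = -0.004
STC_IRRADIANCE_WM2 = 1000.0  # standard test condition irradiance


def estimate_production(capacity_kw: float, ghi_wm2: float, temp_f: float) -> dict:
    """Hypothetical temperature-derated GHI production estimate for a community array."""
    temp_c = (temp_f - 32) * 5 / 9
    derate = 1 + TEMP_COEFF_PER_C * max(0.0, temp_c - 25)  # no derating at or below 25 degC
    production_kw = capacity_kw * (ghi_wm2 / STC_IRRADIANCE_WM2) * PERFORMANCE_RATIO * derate
    return {
        "production_kw": round(production_kw, 1),
        "efficiency_pct": round(100 * production_kw / capacity_kw, 1),
        "derate": derate,
    }


print(estimate_production(72, 660.0, 65))  # Query 2 conditions
print(estimate_production(72, 1000.0, 95))  # a hot, clear-sky hour for comparison
```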
Data Pipeline Diagnostics
Training data quality validated with 14 diagnostic charts generated from live API data:
Solar Irradiance and Production
Environmental Correlations
Cross-Validation and Grid Analysis
Atmospheric Decomposition
Community Model
| Parameter | Value |
|---|---|
| Location | Ann Arbor, Michigan (42.2808°N, 83.7430°W) |
| Community size | 12 homes |
| Total panel capacity | 72 kW |
| Shared battery storage | 100 kWh |
| Grid region | MISO (Midcontinent Independent System Operator) |
Companion Repositories
| Model | Repository | Purpose |
|---|---|---|
| SolarHive 26B A4B LoRA | This repo | Cloud inference — LoRA adapters via Unsloth, full multimodal + function calling |
| SolarHive 26B A4B Merged | solarhive-26b-a4b-merged | Full BF16 merged weights (~48 GB) — LoRA pre-applied to base, no PEFT/Unsloth needed at inference |
| SolarHive 26B A4B NF4 | solarhive-26b-a4b-nf4 | Pre-quantized 4-bit version of the BF16 merged model — for HF Spaces and 24 GB+ GPUs |
| SolarHive E4B LoRA | solarhive-e4b-lora | E4B adapter weights (~200 MB) — apply over base via Unsloth |
| SolarHive E4B safetensors | solarhive-e4b-ollama | Edge model — merged safetensors source for transformers research and GGUF conversion via llama.cpp |
| SolarHive E4B GGUF | solarhive-e4b-gguf | Edge deployment: Q4_K_M GGUF + mmproj for Ollama / llama.cpp on a 16 GB CPU laptop (10/10 benchmark) |
| SolarHive Dataset | solarhive-community-solar-multimodal | 1,727 training examples (1,713 text + 14 image-grounded) |
| LiteRT-LM Python edge runtime | solarhive_e4b_litert_v3.1.ipynb | LiteRT Special Tech Track entry: runs upstream base litert-community/gemma-4-E4B-it-litert-lm .litertlm (3.66 GB) + SolarHive UX layer + on-device agentic loop with native Gemma 4 function calling. Q&A 8/8 on Colab Pro CPU + High-RAM. A fine-tuned LiteRT-LM bundle is a planned next iteration once the upstream gemma4 example module lands in ai_edge_torch.generative.examples/. |
| GitHub | the-gemma4-good-hackathon-solarhive | Full source code, training and quantization notebooks, test_ollama_tools.py, data principles |
Versions — v2 Update (Text-Only Training on the Multimodal-Capable Corpus)
The repository was refreshed on April 30, 2026 with a v2 LoRA produced by re-training on the consolidated training corpus (1,727 rows = 1,713 text + 14 image-grounded). The v2 fine-tune trains on the text subset only; image rows are skipped at the data-prep layer. Multimodal training is deferred post-hackathon — a real image corpus and a held-out VQA benchmark would be prerequisites. The base Gemma 4 26B A4B model retains full multimodal capability regardless of which corpus subset is used in any given fine-tune run, so VQA at inference time continues to work on the saved adapters.
The shipped dataset uses the project archive only — fewer images than originally planned, but every label is human-confirmed and every paired Q&A traces back to the same GHI / temperature derating formula used elsewhere. Image-source planning had earlier rejected the SWIM corpora (NUS — CC BY-NC licensing) and NREL SRRL (legacy MIDC SkyCam archive ended May 2017).
The v2 fine-tune is pre-aligned with the official Unsloth Gemma 4 documentation (train guide, bug fixes & tips):
- explicit loader arguments (max_seq_length, dtype, full_finetuning=False)
- explicit SFTConfig arguments (weight_decay, lr_scheduler_type)
- a text-only data path (finetune_vision_layers=False, dataset_text_field="text", the TRL default text collator, and the train_on_responses_only wrapper for assistant-only loss masking)
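Assistant-only loss masking means the training loss is computed only on the assistant's reply tokens. The pure-Python sketch below shows the idea on string tokens with made-up turn markers (<model>, <end>); Unsloth's train_on_responses_only works on token IDs and uses the chat template's own instruction/response markers, which are not reproduced here. mask_non_assistant is a hypothetical helper.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_non_assistant(tokens: list[str], response_marker: str, turn_end: str) -> list:
    """Return labels where only assistant-response tokens keep their value.

    Tokens after response_marker, up to and including turn_end, form the
    assistant reply; system, user, and tool turns (and the marker) are masked.
    """
    labels = []
    in_response = False
    for tok in tokens:
        if tok == response_marker:
            in_response = True
            labels.append(IGNORE_INDEX)  # the marker belongs to the prompt
            continue
        labels.append(tok if in_response else IGNORE_INDEX)
        if in_response and tok == turn_end:
            in_response = False
    return labels


tokens = ["<user>", "Hi", "<end>", "<model>", "Hello", "!", "<end>"]
print(mask_non_assistant(tokens, "<model>", "<end>"))
```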
Technical Notes
- System prompt repetition: The system prompt is included twice in the message format. Prompt repetition improves non-reasoning LLMs, winning 47/70 benchmark-model tests with zero losses (Leviathan et al., 2024, Google Research).
- PEFT incompatibility: Standard PEFT cannot handle Gemma 4's Gemma4ClippableLinear layers. Use Unsloth's FastVisionModel for adapter loading.
- VRAM requirements: ~48 GB in BF16, ~16 GB in NF4 (4-bit). T4 x2 cannot run this model.
- Sampling: temperature=1.0, top_p=0.95, top_k=64 (Kaggle-recommended defaults).
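The system prompt repetition note above can be sketched as a message builder. The card does not say exactly how the repetition is formatted, so this assumes the prompt is concatenated with itself inside a single system message, which avoids relying on how a chat template treats multiple system turns. build_messages is a hypothetical helper.

```python
def build_messages(system_prompt: str, user_text: str, repeat_system: bool = True) -> list[dict]:
    """Build a chat history whose system turn contains the system prompt twice.

    ASSUMPTION: repetition is done by concatenation inside one system message.
    """
    content = f"{system_prompt}\n\n{system_prompt}" if repeat_system else system_prompt
    return [
        {"role": "system", "content": content},
        {"role": "user", "content": user_text},
    ]


msgs = build_messages("You are SolarHive, an AI energy advisor.", "Should I run my pool heater now?")
print(msgs[0]["content"])
```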
Limitations
- Prototype tested on a single community model (12 homes, Ann Arbor). Real-world deployment requires validation across diverse geographies and community sizes.
- The model occasionally uses "60 kW" instead of the correct 72 kW community capacity in direct VQA responses (without tool calls). This is a base model tendency that additional fine-tuning examples will address.
- Tool responses depend on external API availability. Open-Meteo and EIA have rate limits. OpenWeatherMap free tier allows 1,000 calls/day.
- The battery state simulator is deterministic for demonstrations. Real deployment requires integration with actual battery management systems.
Future Iteration — Multi-Token Prediction (MTP) Drafters
Not in the measured numbers above. Google announced Gemma 4 MTP drafters on May 5, 2026 (blog, overview, HF collection, Kaggle, @GoogleGemma) — after this artifact's final benchmark was captured. The benchmarks above reflect standard autoregressive decoding only. MTP integration is documented here as future iteration; no measured speedup is claimed in this release.
Theoretical foundation. Speculative decoding (Leviathan, Kalman & Matias, ICML 2023, arXiv:2211.17192) accelerates generation without changing the output distribution: a smaller drafter proposes γ candidate tokens, the target verifies all γ in a single parallel forward pass, accepted tokens are kept, and the first rejected token is resampled from a corrected distribution. The output distribution is preserved exactly regardless of drafter quality (under argmax decoding the output tokens are identical); only the acceptance rate α, and therefore the walltime speedup, varies.
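The accept/resample rule behind this guarantee fits in a few lines. The sketch below is a single-token simplification of speculative sampling over a toy three-token vocabulary: it drafts one token from q, accepts it with probability min(1, p/q), and otherwise resamples from the normalized residual max(0, p − q). A real implementation drafts γ tokens and verifies them in one target forward pass. The distributions and token names are made up; the empirical frequencies should match the target p even though the drafter is deliberately poor.

```python
import random
from collections import Counter


def speculative_step(p: dict, q: dict, rng: random.Random) -> str:
    """Emit one token whose distribution equals the target p, using drafter q.

    Draft x ~ q; accept with probability min(1, p[x] / q[x]); on rejection,
    resample from the residual distribution norm(max(0, p - q)).
    """
    tokens = list(p)
    x = rng.choices(tokens, weights=[q[t] for t in tokens])[0]
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    residual = [max(0.0, p[t] - q[t]) for t in tokens]  # choices() normalizes the weights
    return rng.choices(tokens, weights=residual)[0]


target = {"sun": 0.6, "cloud": 0.3, "rain": 0.1}
drafter = {"sun": 0.2, "cloud": 0.5, "rain": 0.3}  # deliberately poor drafter
rng = random.Random(0)
n = 100_000
counts = Counter(speculative_step(target, drafter, rng) for _ in range(n))
empirical = {t: counts[t] / n for t in target}
print(empirical)  # close to the target distribution, not the drafter's
```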
What Google released on May 5, 2026. Paired drafter checkpoints for all four IT-tuned Gemma 4 variants — gemma-4-E2B-it-assistant, gemma-4-E4B-it-assistant, gemma-4-26B-A4B-it-assistant, gemma-4-31B-it-assistant — discoverable via the google/gemma-4 Hugging Face collection and on Kaggle Models. The drafters share the input embedding table with their paired target and consume the target's last-layer activations (architecture per the MTP overview). For this target the paired drafter is google/gemma-4-26B-A4B-it-assistant (0.4 B params). Google reports up to 3× decode speedup with no quality degradation on the 26B-A4B configuration, and **2.2×** on Apple Silicon at batch sizes 4–8. Tested runtimes named in the blog: LiteRT-LM, MLX, Hugging Face Transformers, vLLM, SGLang, Ollama.
Integration cost is one kwarg in Hugging Face Transformers:
```python
import torch
from transformers import AutoModelForCausalLM
from unsloth import FastVisionModel

# Target: Unsloth loads the base named in this repo's adapter config and applies the LoRA
target, processor = FastVisionModel.from_pretrained("Truthseeker87/solarhive-26b-a4b-lora", dtype=torch.bfloat16)
assistant = AutoModelForCausalLM.from_pretrained("google/gemma-4-26B-A4B-it-assistant", dtype=torch.bfloat16, device_map="auto")
target.generate(**inputs, assistant_model=assistant, do_sample=False)  # MTP enabled; inputs from the two-step tokenization
```
The integration ships as a gated future-iteration cell (§14, _RUN_MTP_DEMO = False) in solarhive_inference.py; reviewers can flip the flag to reproduce a baseline-vs-MTP comparison under argmax decoding.
Open question specific to this LoRA-adapter target. Per the 2023 speculative-sampling guarantee, correctness is invariant to drafter quality — the target's verification step preserves the exact output distribution regardless of what the drafter proposes. What varies is acceptance rate α, since Google's released drafter was trained against the base gemma-4-26B-A4B-it, not against this LoRA-adapter-on-top target. Measured α and the resulting walltime speedup on this target are the planned post-hackathon contribution.
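How α translates into speed can be estimated with the expressions from Leviathan et al. (2023). With γ drafted tokens and an i.i.d. acceptance rate α, the expected number of tokens emitted per target forward pass is (1 − α^(γ+1)) / (1 − α); dividing by (γc + 1), where c is the drafter-to-target cost ratio, gives the expected walltime improvement factor. The α, γ, and c values below are hypothetical placeholders, not measurements on this LoRA target.

```python
def acceptance_rate(p: dict, q: dict) -> float:
    """Per-position acceptance probability beta = sum_x min(p(x), q(x))."""
    return sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in set(p) | set(q))


def expected_tokens_per_target_pass(alpha: float, gamma: int) -> float:
    """(1 - alpha**(gamma + 1)) / (1 - alpha); equals gamma + 1 when alpha == 1."""
    if alpha >= 1.0:
        return gamma + 1.0
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def walltime_speedup(alpha: float, gamma: int, c: float) -> float:
    """Expected walltime improvement, where c = drafter cost / target cost per forward pass."""
    return expected_tokens_per_target_pass(alpha, gamma) / (gamma * c + 1)


# Hypothetical values: a drafter matched less well to the LoRA target lowers alpha.
for alpha in (0.9, 0.8, 0.6):
    print(alpha, round(walltime_speedup(alpha, gamma=4, c=0.05), 2))
```

A drafter trained against the base model but run against this adapter would show up here as a lower α; the output distribution stays the same, only the speedup shrinks.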
Citation
```bibtex
@misc{solarhive2026,
  title={SolarHive: AI-Powered Community Solar Energy Intelligence},
  author={Youshen Lim},
  year={2026},
  url={https://github.com/youshen-lim/the-gemma4-good-hackathon-solarhive},
  note={Gemma 4 Good Hackathon submission — Google DeepMind x Kaggle}
}
```
Links
- GitHub: youshen-lim/the-gemma4-good-hackathon-solarhive
- Kaggle: The Gemma 4 Good Hackathon
- Base Model: google/gemma-4-26b-a4b-it
Built with Gemma 4 in Ann Arbor, Michigan. May 2026.
Gemma is a trademark of Google LLC.
Papers
- Prompt Repetition Improves Non-Reasoning LLMs
- When2Call: When (not) to Call Tools
- Fast Inference from Transformers via Speculative Decoding