---
title: Gemma 3 270M Text Generation API
emoji: 🤖
colorFrom: purple
colorTo: indigo
sdk: docker
sdk_version: 0.0.1
app_file: app.py
pinned: false
---

# Gemma 3 270M FastAPI Inference

This project provides a high-performance FastAPI inference server for Google's Gemma 3 270M language model, built on llama-cpp-python. It features thread-pool-based asynchronous request handling, rate limiting, and optimized GGUF model loading for fast response times.
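The thread-pool pattern mentioned above can be sketched roughly as follows. Here `blocking_generate` is a hypothetical stand-in for the blocking llama-cpp-python call in `app.py`; the real server would invoke the loaded GGUF model instead:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the blocking llama-cpp-python inference call.
def blocking_generate(prompt: str) -> str:
    return f"generated: {prompt}"

# A small pool keeps concurrent llama.cpp calls from oversubscribing the CPU.
executor = ThreadPoolExecutor(max_workers=2)

async def generate_async(prompt: str) -> str:
    # Offload the blocking call to the pool so the event loop stays responsive.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_generate, prompt)

result = asyncio.run(generate_async("Hello, Gemma!"))
print(result)  # generated: Hello, Gemma!
```

In a FastAPI route handler this same `run_in_executor` call keeps other requests (such as health checks) from being blocked while a generation is in flight.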

This Space hosts the Google Gemma 3 270M model behind a FastAPI backend with:

- ⚡ **High Performance**: llama-cpp-python with the GGUF model format for faster inference
- 🔒 **Rate Limiting**: IP-based request throttling
- 🎛️ **Flexible Input**: support for both chat messages and direct prompts
- 📊 **Monitoring**: built-in health checks and metrics
- 🚀 **Production Ready**: comprehensive error handling and logging
- 🔧 **Configurable**: environment-based configuration with CPU/GPU support
- 🐳 **Docker Support**: ready-to-deploy containerization
- 💾 **Memory Efficient**: GGUF quantized models for reduced memory usage

## Usage

### Health Check

```bash
curl https://<your-space>/health
```
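Generation requests go to the route defined in `app.py`. The example below assumes a `/generate` endpoint that accepts chat-style messages; the path and field names are illustrative, so adjust them to match your deployment:

```bash
curl -X POST https://<your-space>/generate \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 128}'
```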