---
language: en
license: mit
library_name: diffusers
tags:
  - stable-diffusion-xl
  - text-to-image
  - diffusers
  - huggingface-inference-endpoints
  - custom-inference
pipeline_tag: text-to-image
inference: true
---

# Hugging Face Inference API for Stable Diffusion XL

This repository contains a text-to-image generation API designed to be deployed on Hugging Face Inference Endpoints, using Stable Diffusion XL models for image generation.

## Features

- Compatible with Hugging Face Inference Endpoints
- Stable Diffusion XL (SDXL) model for high-quality image generation
- Content filtering for safe image generation
- Configurable image dimensions (default: 1024x768)
- Base64-encoded image output
- Performance optimizations (torch.compile, attention processors)
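The torch.compile path follows the standard diffusers pattern of compiling the pipeline's UNet; a minimal sketch of what that looks like (the model id below is a placeholder, and the `USE_TORCH_COMPILE` gate mirrors the environment variable documented later):

```python
import os

import torch
from diffusers import StableDiffusionXLPipeline

# Placeholder model id; the actual id comes from app.conf ("model_id").
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Optional speed-up: compile the UNet. The first generation pays a
# one-time compilation cost; subsequent calls are faster.
if os.environ.get("USE_TORCH_COMPILE", "0") == "1":
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```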

## Project Structure

The codebase has been simplified to a single file:

- `handler.py`: Contains the `EndpointHandler` class that implements the Hugging Face Inference Endpoints interface. This file also includes a built-in FastAPI server for local development.
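For reference, the Inference Endpoints custom-handler contract is a class exposing `__init__(self, path)` and `__call__(self, data)`. The sketch below shows the general shape only; the real `handler.py` additionally loads `app.conf`, applies the prompt template, runs content filtering, and embeds the FastAPI server:

```python
import base64
import io
from typing import Any, Dict, List

import torch
from diffusers import StableDiffusionXLPipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the repository directory on the endpoint instance.
        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            path, torch_dtype=torch.float16
        ).to("cuda")

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        prompt = data["inputs"]
        params = data.get("parameters", {})
        seed = params.get("seed", 0)  # the real handler may randomize when unset
        image = self.pipe(
            prompt,
            negative_prompt=params.get("negative_prompt"),
            num_inference_steps=params.get("inference_steps", 30),
            guidance_scale=params.get("guidance_scale", 7),
            width=params.get("width", 1024),
            height=params.get("height", 768),
            generator=torch.Generator("cuda").manual_seed(seed),
        ).images[0]
        buf = io.BytesIO()
        image.save(buf, format="JPEG")
        return [
            {
                "generated_image": base64.b64encode(buf.getvalue()).decode("utf-8"),
                "seed": seed,
            }
        ]
```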

## Configuration

The service is configured via the `app.conf` JSON file with the following parameters:

```json
{
  "model_id": "your-huggingface-model-id",
  "prompt": "template with {prompt} placeholder",
  "negative_prompt": "default negative prompt",
  "inference_steps": 30,
  "guidance_scale": 7,
  "use_safetensors": true,
  "width": 1024,
  "height": 768
}
```
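The `prompt` field is a template: at request time the handler substitutes the caller's prompt into the `{prompt}` placeholder. A plausible sketch of that substitution (the exact loading code lives in `handler.py`, and the template string below is made up):

```python
import json

with open("app.conf") as f:
    cfg = json.load(f)

# e.g. cfg["prompt"] == "masterpiece, {prompt}, highly detailed"
user_prompt = "a castle on a hill"
full_prompt = cfg["prompt"].format(prompt=user_prompt)
```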

## API Usage

### Hugging Face Inference Endpoints Format

When deployed to Hugging Face Inference Endpoints, the API accepts requests in the following format:

```json
{
  "inputs": "your prompt here",
  "parameters": {
    "negative_prompt": "optional negative prompt",
    "seed": 12345,
    "inference_steps": 30,
    "guidance_scale": 7,
    "width": 1024,
    "height": 768
  }
}
```

Response format:
```json
[
  {
    "generated_image": "base64-encoded-image",
    "seed": 12345
  }
]
```

### Local Development Format

When running locally, you can use the same format as above, or a simplified format:

```json
{
  "prompt": "your prompt here",
  "negative_prompt": "optional negative prompt",
  "seed": 12345,
  "inference_steps": 30,
  "guidance_scale": 7,
  "width": 1024,
  "height": 768
}
```

Response format from the local server:
```json
[
  {
    "generated_image": "base64-encoded-image",
    "seed": 12345
  }
]
```
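For example, you can exercise the local server with the simplified format like this (assuming the server started by `python handler.py` listens on port 8000 and accepts POST requests at the root path; adjust the route if your `handler.py` differs):

```python
import base64

import requests

payload = {
    "prompt": "a beautiful landscape with mountains and a lake",
    "seed": 42,
    "inference_steps": 30,
}

response = requests.post("http://localhost:8000", json=payload)
response.raise_for_status()
result = response.json()

# Decode the base64 payload and write it out as a JPEG.
with open("local_test.jpg", "wb") as f:
    f.write(base64.b64decode(result[0]["generated_image"]))
```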

## Deployment on Hugging Face Inference Endpoints

### Step 1: Push this repository to Hugging Face Hub

1. Create a new repository on Hugging Face Hub:
   ```bash
   huggingface-cli repo create your-repo-name
   ```

2. Add the Hugging Face repository as a remote:
   ```bash
   git remote add huggingface https://huggingface.co/username/your-repo-name
   ```

3. Push your code to the Hugging Face repository:
   ```bash
   git push huggingface your-branch:main
   ```

### Step 2: Create an Inference Endpoint

1. Go to your repository on Hugging Face Hub: https://huggingface.co/username/your-repo-name
2. Click on "Deploy" in the top menu, then select "Inference Endpoints"
3. Click "Create a new endpoint"
4. Configure your endpoint with the following settings:
   - Name: Give your endpoint a name
   - Region: Choose a region close to your users (e.g., us-east-1)
   - Instance Type: Choose a GPU instance (recommended: at least 16GB VRAM for SDXL)
   - Replicas: Start with 1 replica
   - Autoscaling: Configure as needed

5. **Important:** if you see the warning below, click "Continue anyway"; it is expected because this repository uses a custom handler implementation:
   > "Warning: deploying this model will probably fail because the model's Diffusers pipeline is not set"
6. Under Advanced configuration:
   - Make sure "Framework" is set to "Custom"
   - Configure "Task" as "Text-to-Image"
7. Click "Create endpoint"

The Hugging Face Inference Endpoints service will automatically detect and use your `EndpointHandler` class in the `handler.py` file.

### Step 3: Test your Inference Endpoint

Once deployed, you can test your endpoint using:

```python
import base64
import io

import requests
from PIL import Image

# Your Hugging Face API token and endpoint URL
API_TOKEN = "your-hugging-face-api-token"
API_URL = "https://api-inference.huggingface.co/models/username/your-repo-name"

# Headers for the request
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json"
}

# Request payload
payload = {
    "inputs": "a beautiful landscape with mountains and a lake",
    "parameters": {
        "negative_prompt": "blurry, low quality",
        "seed": 42,
        "inference_steps": 30,
        "guidance_scale": 7
    }
}

# Send the request and fail fast on HTTP errors
response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()
result = response.json()

# Convert the base64-encoded image to a PIL Image
image_bytes = base64.b64decode(result[0]["generated_image"])
image = Image.open(io.BytesIO(image_bytes))
image.save("generated_image.jpg")
print(f"Image saved with seed: {result[0]['seed']}")
```

### Required Files

For deployment on Hugging Face Inference Endpoints, you need:
- `handler.py` - Contains the `EndpointHandler` class implementation
- `requirements.txt` - Lists the Python dependencies
- `app.conf` - Contains configuration parameters

Note: A `Procfile` is not needed for Hugging Face Inference Endpoints deployment, as the service automatically detects and uses the `EndpointHandler` class.
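As an illustration only (not a copy of this repository's file), a `requirements.txt` for this kind of setup would typically include at least:

```
diffusers
transformers
accelerate
safetensors
torch
fastapi
uvicorn
```

Pin versions to whatever combination you have actually tested on the endpoint's instance type.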

## Local Development

1. Install dependencies: `pip install -r requirements.txt`
2. Run the API locally: `python handler.py [--port PORT] [--host HOST]`
3. The API will be available at http://localhost:8000

The local server uses the FastAPI implementation included in `handler.py` that provides the same functionality as the Hugging Face Inference Endpoints interface.
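The general shape of that built-in server is sketched below; treat it as illustrative (route name, argument parsing, and format handling in the real `handler.py` may differ):

```python
# Inside handler.py, below the EndpointHandler class defined above.
import argparse
import os

import uvicorn
from fastapi import FastAPI, Request

app = FastAPI()
handler = EndpointHandler(".")  # the class defined earlier in this file

@app.post("/")
async def generate(request: Request):
    data = await request.json()
    # Accept both the Inference Endpoints format and the simplified one.
    if "prompt" in data and "inputs" not in data:
        data = {"inputs": data.pop("prompt"), "parameters": data}
    return handler(data)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # --port falls back to the PORT environment variable documented below.
    parser.add_argument("--port", type=int, default=int(os.environ.get("PORT", "8000")))
    parser.add_argument("--host", default="0.0.0.0")
    args = parser.parse_args()
    uvicorn.run(app, host=args.host, port=args.port)
```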

## Environment Variables

- `PORT`: Port to run the server on (default: 8000)
- `USE_TORCH_COMPILE`: Set to "1" to enable torch.compile for performance (default: "0")

## License

This project is licensed under the terms of the MIT license.

## Testing Your Inference Endpoint

We've included a test script `test_endpoint.py` to help you test your deployed endpoint.

### Prerequisites
- Python 3.7+
- Your Hugging Face API token
- An active Hugging Face Inference Endpoint

### Installation
```bash
pip install requests pillow
```

### Usage
```bash
python test_endpoint.py --token "YOUR_HF_API_TOKEN" --url "YOUR_ENDPOINT_URL" --prompt "your test prompt here"
```

#### Additional Options
```
--negative_prompt TEXT     Negative prompt to guide generation
--seed INTEGER             Random seed for reproducibility
--steps INTEGER            Number of inference steps (default: 30)
--guidance FLOAT           Guidance scale (default: 7.0)
--width INTEGER            Image width (default: 1024)
--height INTEGER           Image height (default: 768)
--output_dir TEXT          Directory to save generated images (default: "generated_images")
```

#### Example
```bash
python test_endpoint.py \
  --token "hf_..." \
  --url "https://api-inference.huggingface.co/models/username/your-repo-name" \
  --prompt "beautiful sunset over mountains" \
  --negative_prompt "blurry, low quality" \
  --seed 42 \
  --steps 30 \
  --guidance 7.5
```

This will:
1. Send a request to your endpoint
2. Download the generated image
3. Save it to the specified output directory
4. Display the seed used for generation 

## Troubleshooting

### Error: "You are trying to load the model files of the `variant=fp16`, but no such modeling files are available"

If you encounter this error when deploying your endpoint, it means the model you're trying to use doesn't have an fp16 variant explicitly available. To fix this:

1. Open `handler.py`
2. Find the `StableDiffusionXLPipeline.from_pretrained` call
3. Remove the `variant="fp16"` parameter

The corrected code should look like:
```python
pipe = StableDiffusionXLPipeline.from_pretrained(
    ckpt_dir,
    vae=vae,
    torch_dtype=torch.float16,
    use_safetensors=self.cfg.get("use_safetensors", True)
)
```

This change allows the model to be loaded with fp16 precision without requiring a specific fp16 variant of the model weights.
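If you would rather have the handler cope with both cases automatically, one option is to try the fp16 variant first and fall back to the default weights. A sketch, assuming the same `ckpt_dir`, `vae`, and `self.cfg` names used above:

```python
try:
    # Prefer the smaller fp16 weight files when the repository provides them.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        ckpt_dir,
        vae=vae,
        torch_dtype=torch.float16,
        use_safetensors=self.cfg.get("use_safetensors", True),
        variant="fp16",
    )
except (OSError, ValueError):
    # No fp16 variant published; load the default weights in fp16 precision.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        ckpt_dir,
        vae=vae,
        torch_dtype=torch.float16,
        use_safetensors=self.cfg.get("use_safetensors", True),
    )
```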