---
title: InferenceProxy
emoji: 💾
colorFrom: blue
colorTo: pink
sdk: docker
pinned: false
app_port: 4040
---
# inference-proxy
Lightweight proxy to store LLM traces in a Hugging Face Dataset.
### How it works
This API acts as a proxy for OpenAI-compatible endpoints. Traces are buffered and pushed to the dataset in batches, controlled by two optional environment variables (see the sketch below):
- `BATCH_SIZE_LIMIT` - the maximum number of buffered traces before a push to the dataset
- `BATCH_TIME_LIMIT` - the maximum time to wait before pushing buffered traces to the dataset
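The two limits might interact along these lines. This is a minimal illustrative sketch, not the Space's actual implementation; `flushToDataset` is a hypothetical helper, and the time unit is assumed to be milliseconds:
```js
// Sketch of size/time-limited batching (illustrative only).
const BATCH_SIZE_LIMIT = Number(process.env.BATCH_SIZE_LIMIT ?? 10);
const BATCH_TIME_LIMIT = Number(process.env.BATCH_TIME_LIMIT ?? 60_000); // assumed ms

let buffer = [];
let timer = null;

async function flush() {
  if (buffer.length === 0) return;
  const batch = buffer;
  buffer = [];
  clearTimeout(timer);
  timer = null;
  await flushToDataset(batch); // hypothetical helper: push the batch to the HF Dataset
}

function recordTrace(trace) {
  buffer.push(trace);
  if (buffer.length >= BATCH_SIZE_LIMIT) {
    // Size limit reached: push immediately.
    flush();
  } else if (!timer) {
    // Otherwise start the clock; the batch is pushed when time runs out.
    timer = setTimeout(flush, BATCH_TIME_LIMIT);
  }
}
```
Whichever limit is hit first triggers the push, so small bursts of traffic still land in the dataset within `BATCH_TIME_LIMIT`.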
### Required Environment Variables
- `HF_ACCESS_TOKEN` - a Hugging Face access token with write access to the target dataset
- `USER_NAME` - Hugging Face username, used to ensure the proxy only processes requests from this user
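Since both variables are required, a startup check along these lines keeps misconfiguration failures early and obvious (a sketch; the Space may validate differently):
```js
// Fail fast at startup if required configuration is missing (sketch).
const { HF_ACCESS_TOKEN, USER_NAME } = process.env;
if (!HF_ACCESS_TOKEN || !USER_NAME) {
  console.error("HF_ACCESS_TOKEN and USER_NAME must be set");
  process.exit(1);
}
```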
### Example
```js
import { OpenAI } from "openai";

// Point the OpenAI client at the proxy instead of the upstream provider.
const client = new OpenAI({
  baseURL: "http://localhost:4040/fireworks-ai/inference/v1",
  apiKey: process.env.HF_API_KEY,
});

let out = "";
const stream = await client.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v3",
  messages: [
    {
      role: "user",
      content: "What is the capital of France?",
    },
  ],
  stream: true,
  max_tokens: 500,
});

// Accumulate the streamed completion chunk by chunk.
for await (const chunk of stream) {
  if (chunk.choices && chunk.choices.length > 0) {
    // `delta.content` can be undefined on some chunks (e.g. the final one).
    const newContent = chunk.choices[0].delta.content ?? "";
    out += newContent;
    console.log(newContent);
  }
}
```
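Note that the snippet uses top-level `await`, so run it as an ES module (for example `node example.mjs`, or set `"type": "module"` in `package.json`).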