More logging + timeout if model is not loaded + url not needed
Hey @merve 🤗
Suggesting a couple of changes in how to use `huggingface_hub`. Feel free to ignore if you prefer the way it was:
1. Enable more logging using `logging.set_verbosity_info()` => this will help in case you need to debug something not working, typically if the endpoint is not available.
2. Add `timeout=60` to the client => by default, the `InferenceClient` will retry indefinitely until the model is ready. You might want to set a timeout here instead (see the sketch after this list).
3. When using the Inference API, there is no need to paste the full URL. Setting `model="meta-llama/Llama-2-7b-chat-hf"` is enough (pasting the full URL is not wrong, btw).
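
For reference, a minimal sketch of how (1) and (2) behave together. The `text_generation` call and the `InferenceTimeoutError` handling are illustrative, not part of this PR:

```python
import os

from huggingface_hub import InferenceClient, InferenceTimeoutError, logging

# verbose logs from huggingface_hub, e.g. endpoint availability checks
logging.set_verbosity_info()

client = InferenceClient(
    model="meta-llama/Llama-2-7b-chat-hf",  # model id is enough, no full URL needed
    timeout=60,  # stop waiting after 60s instead of retrying forever
    token=os.getenv("TOKEN"),
)

try:
    print(client.text_generation("Hello!", max_new_tokens=20))
except InferenceTimeoutError:
    # raised when the model is still not loaded/available after `timeout` seconds
    print("Model not ready within 60s, try again later.")
```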
app.py
CHANGED
```diff
@@ -1,12 +1,15 @@
 import gradio as gr
 from huggingface_hub import InferenceClient
+from huggingface_hub import logging
 import os
 
+logging.set_verbosity_info()
+
 token = os.getenv("TOKEN")
 endpoint = os.getenv("ENDPOINT")
 
 # initialize InferenceClient
-client = InferenceClient(model="
+client = InferenceClient(model="meta-llama/Llama-2-7b-chat-hf", timeout=60, token=token)
 
 # query client using streaming mode
 def inference(message, history):
```
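
The hunk cuts off at the function signature; a hedged sketch of what the streaming handler might look like (the actual body is not shown in this diff, and the generation parameters below are assumptions):

```python
# Hypothetical reconstruction -- not part of this PR's diff.
def inference(message, history):
    partial = ""
    # stream=True yields the generated text chunk by chunk
    for token in client.text_generation(message, max_new_tokens=512, stream=True):
        partial += token
        yield partial  # Gradio renders each partial response as it streams in

gr.ChatInterface(inference).launch()
```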