Wauplin (HF staff) committed on
Commit
dc44908
1 Parent(s): b451ff3

More logging + timeout if model is not loaded + URL not needed


Hey @merve 🤗

Suggesting a couple of changes in how to use `huggingface_hub`. Feel free to ignore them if you prefer the way it was:
1. Enable more logging using `logging.set_verbosity_info()` => this will help in case you need to debug something that isn't working, typically if the endpoint is not available.
2. Add `timeout=60` to the client => by default the `InferenceClient` retries indefinitely until the model is ready. You might want to set a timeout here instead.
3. When using the Inference API, there's no need to paste the full URL. Setting `model="meta-llama/Llama-2-7b-chat-hf"` is enough (pasting the full URL is not wrong, btw). All three suggestions are combined in the sketch below.
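For reference, here's a minimal sketch combining all three suggestions. The streaming call and the `InferenceTimeoutError` handling are my additions to illustrate the `timeout` behavior, not part of this commit:

```python
import os

from huggingface_hub import InferenceClient, InferenceTimeoutError, logging

# Surface huggingface_hub's INFO logs, e.g. endpoint status while waiting.
logging.set_verbosity_info()

# A plain model id is enough for the Inference API; the full URL also works.
# timeout=60 stops the client from retrying forever if the model never loads.
client = InferenceClient(
    model="meta-llama/Llama-2-7b-chat-hf",
    timeout=60,
    token=os.getenv("TOKEN"),
)

try:
    for token in client.text_generation("Hello!", max_new_tokens=20, stream=True):
        print(token, end="")
except InferenceTimeoutError:
    # Raised when the model is still unavailable once the timeout elapses.
    print("Model not loaded within 60s, try again later.")
```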

Files changed (1)
  1. app.py +4 -1
app.py CHANGED
```diff
@@ -1,12 +1,15 @@
 import gradio as gr
 from huggingface_hub import InferenceClient
+from huggingface_hub import logging
 import os
 
+logging.set_verbosity_info()
+
 token = os.getenv("TOKEN")
 endpoint = os.getenv("ENDPOINT")
 
 # initialize InferenceClient
-client = InferenceClient(model="https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b-chat-hf", token=token)
+client = InferenceClient(model="meta-llama/Llama-2-7b-chat-hf", timeout=60, token=token)
 
 # query client using streaming mode
 def inference(message, history):
```
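The hunk cuts off at the `inference` signature. For context, a streaming handler here would typically be a generator yielding the growing response; the body below is only a sketch of that pattern (assuming Gradio's `ChatInterface`-style `message`/`history` arguments), not the app's actual code:

```python
def inference(message, history):
    # Stream the reply token by token; Gradio re-renders on each yield.
    partial = ""
    for token in client.text_generation(message, max_new_tokens=256, stream=True):
        partial += token
        yield partial
```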