Hugging Face Token Setup - Working Models
✅ Current Configuration
Model Selected: facebook/blenderbot-400M-distill
Why this model:
- ✅ Publicly available (no gating required)
- ✅ Works with HF Inference API
- ✅ Text generation task
- ✅ No special permissions needed
- ✅ Fast response times
- ✅ Stable and reliable
Fallback: gpt2 (guaranteed to work on HF API)
Setting Up Your HF Token
Step 1: Get Your Token
- Go to https://huggingface.co/settings/tokens
- Click "New token"
- Name it: "Research Assistant"
- Set role: Read (this is sufficient for inference)
- Generate token
- Copy it immediately (won't show again)
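Before wiring the token into the Space, you can confirm it is valid with the huggingface_hub client. A minimal sketch, assuming huggingface_hub is installed (pip install huggingface_hub):

```python
# One-off token sanity check; paste your token in place of "hf_...".
from huggingface_hub import HfApi

user = HfApi().whoami(token="hf_...")  # raises an error if the token is invalid
print(f"Token belongs to: {user['name']}")
```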
Step 2: Add to Hugging Face Space
In your HF Space settings:
- Go to your Space: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
- Click "Settings" (gear icon)
- Under "Repository secrets" or "Space secrets"
- Add new secret:
  - Name: HF_TOKEN
  - Value: (paste your token)
- Save
Step 3: Verify Token Works
The code will automatically:
- ✅ Load token from environment: os.getenv('HF_TOKEN')
- ✅ Use it in API calls
- ✅ Log success/failure
Check logs for:
llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XXX)
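The router code itself is not shown here, but the token lookup it performs amounts to something like the following sketch (module and function names are illustrative, not the project's confirmed code):

```python
# Illustrative sketch of the env-var lookup and logging the router performs.
import logging
import os

logger = logging.getLogger("llm_router")

def get_hf_token():
    """Return the HF token from the environment, logging the outcome."""
    token = os.getenv("HF_TOKEN")
    if token:
        logger.info("HF_TOKEN loaded from environment")
    else:
        logger.warning("HF_TOKEN not set; HF API calls will fail with 401")
    return token
```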
Alternative Models (Tested & Working)
If you want to try different models:
Option 1: GPT-2 (Very Reliable)
"model_id": "gpt2"
- ⚡ Fast
- ✅ Always available
- ⚠️ Simple responses
Option 2: Flan-T5 Large (Better Quality)
"model_id": "google/flan-t5-large"
- 📈 Better quality
- ⚡ Fast
- ✅ Public access
Option 3: Blenderbot (Conversational)
"model_id": "facebook/blenderbot-400M-distill"
- 💬 Good for conversation
- ✅ Current selection
- ⚡ Fast
Option 4: DistilGPT-2 (Faster)
"model_id": "distilgpt2"
- ⚡ Very fast
- ✅ Guaranteed available
- ⚠️ Smaller, less capable
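To compare these options yourself, a rough latency check against the public Inference API endpoint looks like this (the model list and prompt are examples; timings will vary with cold starts):

```python
# Rough latency comparison across the candidate models.
import os
import time

import requests

MODELS = [
    "gpt2",
    "google/flan-t5-large",
    "facebook/blenderbot-400M-distill",
    "distilgpt2",
]
headers = {"Authorization": f"Bearer {os.getenv('HF_TOKEN')}"}

for model_id in MODELS:
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    start = time.monotonic()
    resp = requests.post(url, headers=headers,
                         json={"inputs": "What is 2+2?"}, timeout=60)
    print(f"{model_id}: HTTP {resp.status_code} in {time.monotonic() - start:.1f}s")
```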
How the System Works Now
API Call Flow:
- User question → Synthesis Agent
- Synthesis Agent → Tries LLM call
- LLM Router → Calls HF Inference API with token (sketched below)
- HF API → Returns generated text
- System → Uses real LLM response ✅
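Boiled down, step 3 of that flow is a single authenticated POST to the Inference API. A minimal sketch, not the project's actual router code:

```python
# Minimal authenticated call to the HF Inference API.
import os

import requests

API_URL = "https://api-inference.huggingface.co/models/facebook/blenderbot-400M-distill"

def call_hf_api(prompt):
    """Send one generation request and return the generated text."""
    headers = {"Authorization": f"Bearer {os.getenv('HF_TOKEN')}"}
    resp = requests.post(API_URL, headers=headers,
                         json={"inputs": prompt}, timeout=60)
    resp.raise_for_status()
    data = resp.json()
    # Response shape varies by task: text-generation models usually return
    # [{"generated_text": ...}]; conversational models return a dict.
    if isinstance(data, list):
        return data[0].get("generated_text", "")
    return data.get("generated_text", str(data))
```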
No More Fallbacks
- ❌ No knowledge base fallback
- ❌ No template responses
- ✅ Always uses real LLM when available
- ✅ GPT-2 fallback if model loading (503 error)
Verification
Test Your Setup:
Ask: "What is 2+2?"
Expected: a real LLM-generated response (not a template)
Check logs for:
llm_router - INFO - Calling HF API for model: facebook/blenderbot-400M-distill
llm_router - INFO - HF API returned response (length: XX)
src.agents.synthesis_agent - INFO - RESP_SYNTH_001 received LLM response
If You See 401 Error:
HF API error: 401 - Unauthorized
Fix: Token not set correctly in HF Space settings
If You See 404 Error:
HF API error: 404 - Not Found
Fix: Model ID not valid (very unlikely with current models)
If You See 503 Error:
Model loading (503), trying fallback
Fix: none needed. This happens on a first-time model load, and the system automatically retries with GPT-2.
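The retry logic behind that message can be as simple as the following sketch (the function name and structure are illustrative, not the actual llm_router implementation):

```python
# Retry against gpt2 when the primary model returns 503 (cold-loading).
import os

import requests

BASE_URL = "https://api-inference.huggingface.co/models/"

def generate(prompt, model_id="facebook/blenderbot-400M-distill"):
    headers = {"Authorization": f"Bearer {os.getenv('HF_TOKEN')}"}
    resp = requests.post(BASE_URL + model_id, headers=headers,
                         json={"inputs": prompt}, timeout=60)
    if resp.status_code == 503 and model_id != "gpt2":
        # Primary model is still loading; retry once with the fallback.
        return generate(prompt, model_id="gpt2")
    resp.raise_for_status()
    data = resp.json()
    return data[0]["generated_text"] if isinstance(data, list) else str(data)
```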
Current Models in Config
File: models_config.py
"reasoning_primary": {
"model_id": "facebook/blenderbot-400M-distill",
"max_tokens": 500,
"temperature": 0.7
}
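If you add the gpt2 fallback as its own entry, it could sit alongside the primary like this (the schema beyond "reasoning_primary" is an assumption, not the file's confirmed layout):

```python
# Hypothetical extension of models_config.py with an explicit fallback entry.
MODELS = {
    "reasoning_primary": {
        "model_id": "facebook/blenderbot-400M-distill",
        "max_tokens": 500,
        "temperature": 0.7,
    },
    "reasoning_fallback": {
        "model_id": "gpt2",
        "max_tokens": 500,
        "temperature": 0.7,
    },
}
```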
Performance Notes
Latency:
- Blenderbot: ~2-4 seconds
- GPT-2: ~1-2 seconds
- Flan-T5: ~3-5 seconds
Quality:
- Blenderbot: Good for conversational responses
- GPT-2: Basic but coherent
- Flan-T5: More factual, less conversational
Troubleshooting
Token Not Working?
- Verify in HF Dashboard β Settings β Access Tokens
- Check it has "Read" permissions
- Regenerate if needed
- Update in Space settings
Model Not Loading?
- First request may take 10-30 seconds (cold start)
- Subsequent requests are faster
- 503 errors auto-retry with fallback
Still Seeing Placeholders?
- Restart your Space
- Check logs for HF API calls
- Verify token is in environment
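To rule out the environment, a one-line check confirms whether the secret actually reached the process (run it in a console or temporarily at app startup):

```python
# Prints True if the Space secret is visible to the running process.
import os

print("HF_TOKEN set:", bool(os.getenv("HF_TOKEN")))
```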
Next Steps
- ✅ Add token to HF Space settings
- ✅ Restart Space
- ✅ Test with a question
- ✅ Check logs for "HF API returned response"
- ✅ Enjoy real LLM responses!
Summary
Model: facebook/blenderbot-400M-distill
Fallback: gpt2
Status: ✅ Configured and ready
Requirement: Valid HF token in Space settings
No template fallbacks: the system always tries a real LLM first