mangubee and Claude committed
Commit 2bb2d3d · 1 Parent(s): c86df49

Phase 0-2: HF Vision Integration Complete - Google Gemma 3 Selected


Phase 0 (Extended Testing):
- Tested 4 additional user-requested models
- Validated: google/gemma-3-27b-it:scaleway (6s, RECOMMENDED)
- Validated: zai-org/GLM-4.6V-Flash:zai-org (16s)
- Validated: Qwen/Qwen3-VL-30B-A3B-Instruct:novita (14s)
- Failed: GLM-4.7, gpt-oss-120b (text-only models)

Phase 1 (Implementation):
- Added analyze_image_hf() in src/tools/vision.py
- Fixed analyze_image() routing to respect LLM_PROVIDER
- Added HF_TOKEN, HF_VISION_MODEL to settings
- Each provider fails independently (no fallback chains)

Phase 2 (Smoke Tests):
- Created test/test_smoke_hf_vision.py
- Smoke test PASSED: red square image correctly identified
- Fixed Settings.hf_token integration
- Removed unsupported timeout parameter

Modified Files:
- src/tools/vision.py: +120 lines (HF vision function + routing fix)
- src/config/settings.py: +5 lines (HF config)
- .env.example: +7 lines (HF_TOKEN, HF_VISION_MODEL docs)
- CHANGELOG.md: +93 lines (Phase 0-2 documentation)
- PLAN.md: +20 lines (Phase 0 results, Phase 1 updates)
- test/test_smoke_hf_vision.py: NEW (smoke test script)

Co-Authored-By: Claude <noreply@anthropic.com>
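
For orientation, a minimal sketch of the call path this commit produces, assuming HF_TOKEN is set in .env (the image path and question are illustrative):

```python
# Minimal end-to-end sketch of the new flow; assumes HF_TOKEN is set in .env.
import os
from src.tools.vision import analyze_image

os.environ["LLM_PROVIDER"] = "huggingface"  # replaces the old Gemini -> Claude fallback chain
result = analyze_image("test/fixtures/test_image_red_square.jpg", "What color is this?")
print(result["model"])        # google/gemma-3-27b-it:scaleway (the default)
print(result["answer"][:80])
```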

.env.example CHANGED
@@ -15,6 +15,13 @@ ANTHROPIC_API_KEY=your_anthropic_api_key_here
 # Free baseline alternative: Gemini 2.0 Flash
 GOOGLE_API_KEY=your_google_api_key_here
 
+# HuggingFace Inference API (for vision and text models)
+HF_TOKEN=your_huggingface_token_here
+
+# HuggingFace Vision Model (validated from Phase 0)
+# Options: google/gemma-3-27b-it:scaleway (recommended), CohereLabs/aya-vision-32b
+HF_VISION_MODEL=google/gemma-3-27b-it:scaleway
+
 # ============================================================================
 # Tool API Keys (Level 5 - Component Selection)
 # ============================================================================
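
As a quick check that the new keys are picked up, a minimal sketch using python-dotenv (which the repo already uses via load_dotenv):

```python
# Minimal sanity check that the new .env entries load.
import os
from dotenv import load_dotenv

load_dotenv()
assert os.getenv("HF_TOKEN"), "HF_TOKEN missing - copy .env.example to .env and fill it in"
print(os.getenv("HF_VISION_MODEL", "google/gemma-3-27b-it:scaleway"))
```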
CHANGELOG.md CHANGED
@@ -1,5 +1,98 @@
 # Session Changelog
 
+## [2026-01-11] [Phase 2: Smoke Tests] [COMPLETED] HF Vision Validated - Ready for GAIA
+
+**Problem:** Needed to validate HF vision works before the more complex GAIA evaluation.
+
+**Solution:** Single smoke test with a simple red-square image.
+
+**Result:** ✅ PASSED
+- Model: `google/gemma-3-27b-it:scaleway`
+- Answer: "The image is a solid, uniform field of red color..."
+- Provider routing: Working correctly
+- Settings integration: Fixed
+
+**Modified Files:**
+- **src/config/settings.py** (~5 lines added)
+  - Added `HF_TOKEN` and `HF_VISION_MODEL` config
+  - Added `hf_token` and `hf_vision_model` to Settings class
+  - Updated `validate_api_keys()` to include huggingface
+- **test/test_smoke_hf_vision.py** (NEW - ~50 lines)
+  - Simple smoke test script
+  - Tests basic image description
+
+**Bug Fixes:**
+- Removed unsupported `timeout` parameter from `chat_completion()`
+
+**Next Steps:** Phase 3 - GAIA evaluation with HF vision
+
+---
+
+## [2026-01-11] [Phase 1: Implementation] [COMPLETED] HF Vision Integration - Routing Fixed
+
+**Problem:** Vision tool was hardcoded to Gemini → Claude, ignoring the UI LLM selection.
+
+**Solution:**
+- Added `analyze_image_hf()` function using `google/gemma-3-27b-it:scaleway` (fastest, ~6s)
+- Fixed `analyze_image()` routing to respect the `LLM_PROVIDER` environment variable
+- Each provider fails independently (NO fallback chains during testing)
+
+**Modified Files:**
+- **src/tools/vision.py** (~120 lines added/modified)
+  - Added `analyze_image_hf()` function with retry logic
+  - Updated `analyze_image()` routing with provider selection
+  - Added HF_VISION_MODEL and HF_TIMEOUT config
+- **.env.example** (~4 lines added)
+  - Documented HF_TOKEN and HF_VISION_MODEL settings
+
+**Validated Models (Phase 0 Extended Testing):**
+
+| Rank | Model | Provider | Speed | Notes |
+|------|-------|----------|-------|-------|
+| 1 | `google/gemma-3-27b-it` | Scaleway | ~6s | **RECOMMENDED** - Google brand |
+| 2 | `CohereLabs/aya-vision-32b` | Cohere | ~7s | Fast, lesser-known brand |
+| 3 | `Qwen/Qwen3-VL-30B-A3B-Instruct` | Novita | ~14s | Qwen brand, reputable |
+| 4 | `zai-org/GLM-4.6V-Flash` | zai-org | ~16s | Zhipu AI brand |
+
+**Failed Models (not vision-capable):**
+- `zai-org/GLM-4.7:cerebras` - Text-only (422 error: "Content type 'image_url' not supported")
+- `openai/gpt-oss-120b:novita` - Text-only (400 Bad request)
+- `openai/gpt-oss-120b:groq` - Text-only (400: "content must be a string")
+- `moonshotai/Kimi-K2-Instruct-0905:novita` - 400 Bad request
+
+**Next Steps:** Smoke tests (Phase 2) to validate integration
+
+---
+
+## [2026-01-11] [Phase 0 Extended] [COMPLETED] Additional Vision Models Tested - Google Gemma 3 Selected
+
+**Problem:** Needed to find more reputable vision models (the aya-vision-32b brand was unknown to the user).
+
+**Solution:** Tested user-requested models with provider routing.
+
+**Test Results:**
+
+**Working Models:**
+- `google/gemma-3-27b-it:scaleway` ✅ - ~6s, Google brand, **RECOMMENDED**
+- `zai-org/GLM-4.6V-Flash:zai-org` ✅ - ~16s, Zhipu AI brand
+- `Qwen/Qwen3-VL-30B-A3B-Instruct:novita` ✅ - ~14s, Qwen brand
+
+**Failed Models:**
+- `zai-org/GLM-4.7:cerebras` ❌ - Text-only model (422: "image_url not supported")
+- `openai/gpt-oss-120b:novita` ❌ - Generic 400 Bad request
+- `openai/gpt-oss-120b:groq` ❌ - Text-only (400: "content must be a string")
+- `moonshotai/Kimi-K2-Instruct-0905:novita` ❌ - Generic 400 Bad request
+
+**Output Files:**
+- `output/phase0_vision_validation_20260111_162124.json` - 4 new models test
+- `output/phase0_vision_validation_20260111_163647.json` - Groq provider test
+- `output/phase0_vision_validation_20260111_164531.json` - GLM-4.6V test
+- `output/phase0_vision_validation_20260111_164945.json` - Gemma-3-27B test
+
+**Decision:** Use `google/gemma-3-27b-it:scaleway` for production (fastest, most reputable brand)
+
+---
+
 ## [2026-01-07] [Phase 0: API Validation] [COMPLETED] HF Inference Vision Support - GO Decision
 
 **Problem:** Needed to validate HF Inference API supports vision models before implementation.
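
The Phase 0 probe script itself is not part of this commit; below is a hypothetical reconstruction of the check the changelog describes: send one image_url message per candidate and classify 400/422 rejections as text-only.

```python
# Hypothetical reconstruction of the Phase 0 vision probe; not the committed script.
from huggingface_hub import InferenceClient
from huggingface_hub.utils import HfHubHTTPError

TINY_PNG_B64 = "..."  # substitute a real base64-encoded test image

def probe_vision(model: str, token: str) -> str:
    """Return 'vision-capable' if the model accepts image_url content, else the error code."""
    client = InferenceClient(token=token)
    messages = [{"role": "user", "content": [
        {"type": "text", "text": "What color is this image?"},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{TINY_PNG_B64}"}},
    ]}]
    try:
        client.chat_completion(model=model, messages=messages, max_tokens=32)
        return "vision-capable"
    except HfHubHTTPError as e:
        # e.g. 422 "Content type 'image_url' not supported" -> text-only model
        return f"failed ({e.response.status_code})"
```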
PLAN.md CHANGED
@@ -172,7 +172,19 @@ Fix LLM selection routing so UI provider selection propagates to ALL tools (plan
 - **Option D:** Local transformers library (no API)
 - **Option E:** Hybrid (HF text + Gemini/Claude vision only)
 
-**Phase 0 Status:** ✅ COMPLETED - See CHANGELOG.md for results
+**Phase 0 Status:** ✅ COMPLETED - Multiple working models found
+
+**Validated Models (Ranked by Speed):**
+
+| Rank | Model | Provider | Speed | Notes |
+|------|-------|----------|-------|-------|
+| 1 | `google/gemma-3-27b-it` | Scaleway | ~6s | **RECOMMENDED** - Google brand, fastest |
+| 2 | `CohereLabs/aya-vision-32b` | Cohere | ~7s | Fast, lesser-known brand |
+| 3 | `Qwen/Qwen3-VL-30B-A3B-Instruct` | Novita | ~14s | Qwen brand, reputable |
+| 4 | `zai-org/GLM-4.6V-Flash` | zai-org | ~16s | Zhipu AI brand |
+
+**Format:** Base64 encoding only (file:// URLs don't work)
+**Test image:** 2.1MB workspace photo (realistic large image)
 
 ---
 
@@ -182,14 +194,14 @@ Fix LLM selection routing so UI provider selection propagates to ALL tools (plan
 
 **Validated from Phase 0:**
 
-- Model: `CohereLabs/aya-vision-32b` (Cohere provider)
+- Model: `google/gemma-3-27b-it:scaleway` (RECOMMENDED - fastest, Google brand)
 - Format: Base64 encoding in messages array
-- Timeout: 120+ seconds for large images
+- Latency: ~6 seconds for a 2.1MB image
 
 #### Step 1.1: Implement `analyze_image_hf()` in vision.py
 
 - [ ] Add function signature matching existing pattern
-- [ ] Use **CohereLabs/aya-vision-32b** (validated from Phase 0)
+- [ ] Use **google/gemma-3-27b-it:scaleway** (validated, fastest)
 - [ ] Format: Base64 encode images in messages array
 - [ ] Add retry logic with exponential backoff (3 attempts)
 - [ ] Handle API errors with clear error messages
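
Since the plan calls for base64-only input (file:// URLs don't work), here is a minimal sketch of the required data-URL format; `to_data_url` is a hypothetical helper, distinct from the repo's `load_and_encode_image()`:

```python
# Minimal sketch of the base64 data-URL format the HF chat API expects.
# to_data_url is a hypothetical helper, not part of this commit.
import base64
import mimetypes

def to_data_url(image_path: str) -> str:
    mime, _ = mimetypes.guess_type(image_path)  # e.g. image/jpeg, guessed from the extension
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime or 'image/jpeg'};base64,{encoded}"
```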
src/config/settings.py CHANGED
@@ -21,6 +21,8 @@ load_dotenv()
 # LLM Configuration (Level 5 - Component Selection)
 ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY", "")
 GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY", "")
+HF_TOKEN = os.getenv("HF_TOKEN", "")
+HF_VISION_MODEL = os.getenv("HF_VISION_MODEL", "google/gemma-3-27b-it:scaleway")
 DEFAULT_LLM_MODEL: Literal["gemini", "claude"] = os.getenv("DEFAULT_LLM_MODEL", "gemini")  # type: ignore
 
 # Tool API Keys (Level 5 - Component Selection)
@@ -53,6 +55,8 @@ class Settings:
     def __init__(self):
         self.anthropic_api_key = ANTHROPIC_API_KEY
         self.google_api_key = GOOGLE_API_KEY
+        self.hf_token = HF_TOKEN
+        self.hf_vision_model = HF_VISION_MODEL
         self.default_llm_model = DEFAULT_LLM_MODEL
 
         self.exa_api_key = EXA_API_KEY
@@ -77,6 +81,7 @@ class Settings:
         return {
             "anthropic": bool(self.anthropic_api_key),
             "google": bool(self.google_api_key),
+            "huggingface": bool(self.hf_token),
             "exa": bool(self.exa_api_key),
             "tavily": bool(self.tavily_api_key),
         }
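
A short usage sketch of the extended check (the dict values shown are illustrative):

```python
# Usage sketch: the new "huggingface" entry appears alongside the existing providers.
from src.config.settings import Settings

keys = Settings().validate_api_keys()
# e.g. {"anthropic": True, "google": True, "huggingface": True, "exa": False, "tavily": False}
if not keys["huggingface"]:
    raise SystemExit("Set HF_TOKEN before selecting the huggingface provider")
```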
src/tools/vision.py CHANGED
@@ -4,8 +4,9 @@ Author: @mangobee
 Date: 2026-01-02
 
 Provides image analysis functionality using:
-- Gemini 2.0 Flash (default, free tier)
-- Claude Sonnet 4.5 (fallback, if configured)
+- HuggingFace Inference API (Gemma-3-27B, recommended)
+- Gemini 2.0 Flash (fallback)
+- Claude Sonnet 4.5 (fallback)
 
 Supports:
 - Image file loading and encoding
@@ -15,6 +16,7 @@ Supports:
 - Visual reasoning
 """
 
+import os
 import base64
 import logging
 from pathlib import Path
@@ -36,6 +38,8 @@ RETRY_MIN_WAIT = 1  # seconds
 RETRY_MAX_WAIT = 10  # seconds
 MAX_IMAGE_SIZE_MB = 10  # Maximum image size in MB
 SUPPORTED_IMAGE_FORMATS = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp'}
+HF_VISION_MODEL = os.getenv("HF_VISION_MODEL", "google/gemma-3-27b-it:scaleway")
+HF_TIMEOUT = 120  # seconds for large images
 
 # ============================================================================
 # Logging Setup
@@ -296,44 +300,165 @@ def analyze_image_claude(image_path: str, question: Optional[str] = None) -> Dict:
         raise Exception(f"Claude vision failed: {str(e)}")
 
 
+# ============================================================================
+# HuggingFace Vision
+# ============================================================================
+
+@retry(
+    stop=stop_after_attempt(MAX_RETRIES),
+    wait=wait_exponential(multiplier=1, min=RETRY_MIN_WAIT, max=RETRY_MAX_WAIT),
+    retry=retry_if_exception_type((ConnectionError, TimeoutError)),
+    reraise=True,
+)
+def analyze_image_hf(image_path: str, question: Optional[str] = None) -> Dict:
+    """
+    Analyze image using HuggingFace Inference API.
+
+    Validated models (Phase 0 testing):
+    - google/gemma-3-27b-it:scaleway (recommended, ~6s)
+    - CohereLabs/aya-vision-32b (~7s)
+    - Qwen/Qwen3-VL-30B-A3B-Instruct:novita (~14s)
+
+    Args:
+        image_path: Path to image file
+        question: Optional question about the image (default: "Describe this image")
+
+    Returns:
+        Dict with structure: {
+            "answer": str,
+            "model": str,
+            "image_path": str,
+            "question": str
+        }
+
+    Raises:
+        ValueError: If HF_TOKEN not configured or image invalid
+        ConnectionError: If API connection fails (triggers retry)
+    """
+    try:
+        from huggingface_hub import InferenceClient
+
+        settings = Settings()
+        hf_token = settings.hf_token
+
+        if not hf_token:
+            raise ValueError("HF_TOKEN not configured in settings")
+
+        # Load and encode image
+        image_data = load_and_encode_image(image_path)
+
+        # Default question
+        if not question:
+            question = "Describe this image in detail."
+
+        logger.info(f"HF vision analysis: {Path(image_path).name} - '{question}'")
+        logger.info(f"Using model: {HF_VISION_MODEL}")
+
+        # Configure HF client
+        client = InferenceClient(token=hf_token)
+
+        # Create messages with base64 image
+        messages = [
+            {
+                "role": "user",
+                "content": [
+                    {"type": "text", "text": question},
+                    {
+                        "type": "image_url",
+                        "image_url": {
+                            "url": f"data:{image_data['mime_type']};base64,{image_data['data']}"
+                        }
+                    }
+                ]
+            }
+        ]
+
+        # Call chat completion
+        response = client.chat_completion(
+            model=HF_VISION_MODEL,
+            messages=messages,
+            max_tokens=1024,
+        )
+
+        answer = response.choices[0].message.content.strip()
+
+        logger.info(f"HF vision successful: {len(answer)} chars")
+
+        return {
+            "answer": answer,
+            "model": HF_VISION_MODEL,
+            "image_path": image_path,
+            "question": question,
+        }
+
+    except ValueError as e:
+        logger.error(f"HF configuration/input error: {e}")
+        raise
+    except (ConnectionError, TimeoutError) as e:
+        logger.warning(f"HF connection error (will retry): {e}")
+        raise
+    except Exception as e:
+        logger.error(f"HF vision error: {e}")
+        raise Exception(f"HF vision failed: {str(e)}")
+
+
 # ============================================================================
 # Unified Vision Analysis
 # ============================================================================
 
 def analyze_image(image_path: str, question: Optional[str] = None) -> Dict:
     """
-    Analyze image using available multimodal LLM.
+    Analyze image using the provider specified by the LLM_PROVIDER environment variable.
 
-    Tries Gemini first (free tier), falls back to Claude if configured.
+    Respects LLM_PROVIDER setting:
+    - "huggingface" -> Uses HF Inference API
+    - "gemini" -> Uses Gemini 2.0 Flash
+    - "claude" -> Uses Claude Sonnet 4.5
+    - "groq" -> Not yet implemented
 
     Args:
         image_path: Path to image file
        question: Optional question about the image
 
     Returns:
-        Dict with analysis results from either Gemini or Claude
+        Dict with analysis results from the selected provider
 
     Raises:
-        Exception: If both Gemini and Claude fail or are not configured
+        Exception: If the selected provider fails or is not configured
     """
+    provider = os.getenv("LLM_PROVIDER", "gemini").lower()
     settings = Settings()
 
-    # Try Gemini first (default, free tier)
-    if settings.google_api_key:
-        try:
-            return analyze_image_gemini(image_path, question)
-        except Exception as e:
-            logger.warning(f"Gemini failed, trying Claude: {e}")
-
-    # Fallback to Claude
-    if settings.anthropic_api_key:
-        try:
-            return analyze_image_claude(image_path, question)
-        except Exception as e:
-            logger.error(f"Claude also failed: {e}")
-            raise Exception(f"Vision analysis failed - Gemini and Claude both failed")
-
-    # No API keys configured
-    raise ValueError(
-        "No vision API configured. Please set GOOGLE_API_KEY or ANTHROPIC_API_KEY"
-    )
+    logger.info(f"Vision analysis with provider: {provider}")
+
+    # Route to the selected provider (each fails independently - NO fallback chains)
+    if provider == "huggingface":
+        try:
+            return analyze_image_hf(image_path, question)
+        except Exception as e:
+            logger.error(f"HF vision failed: {e}")
+            raise Exception(f"HF vision failed: {str(e)}")
+
+    elif provider == "gemini":
+        if not settings.google_api_key:
+            raise ValueError("GOOGLE_API_KEY not configured for Gemini provider")
+        try:
+            return analyze_image_gemini(image_path, question)
+        except Exception as e:
+            logger.error(f"Gemini vision failed: {e}")
+            raise Exception(f"Gemini vision failed: {str(e)}")
+
+    elif provider == "claude":
+        if not settings.anthropic_api_key:
+            raise ValueError("ANTHROPIC_API_KEY not configured for Claude provider")
+        try:
+            return analyze_image_claude(image_path, question)
+        except Exception as e:
+            logger.error(f"Claude vision failed: {e}")
+            raise Exception(f"Claude vision failed: {str(e)}")
+
+    elif provider == "groq":
+        raise NotImplementedError("Groq vision not yet implemented (Phase 5)")
+
+    else:
+        raise ValueError(f"Unknown LLM_PROVIDER: {provider}. Valid: huggingface, gemini, claude, groq")
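A small sketch of the routing contract this hunk enforces: an unrecognized LLM_PROVIDER raises immediately, and no provider silently falls back to another ("mistral" below is a deliberately unsupported value):

```python
# Sketch of the no-fallback contract in analyze_image().
import os
from src.tools.vision import analyze_image

os.environ["LLM_PROVIDER"] = "mistral"  # hypothetical unsupported provider
try:
    analyze_image("test/fixtures/test_image_red_square.jpg")
except ValueError as e:
    print(e)  # Unknown LLM_PROVIDER: mistral. Valid: huggingface, gemini, claude, groq
```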
test/test_smoke_hf_vision.py ADDED
@@ -0,0 +1,63 @@
+#!/usr/bin/env python3
+"""
+Phase 2: Smoke Tests for HF Vision Integration
+Author: @mangobee
+Date: 2026-01-11
+
+Quick validation that HF vision works before GAIA evaluation.
+"""
+
+import os
+import sys
+import logging
+from pathlib import Path
+
+# Add project root to path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from dotenv import load_dotenv
+load_dotenv()
+
+# Set HF provider for testing
+os.environ["LLM_PROVIDER"] = "huggingface"
+
+from src.tools.vision import analyze_image
+
+# ============================================================================
+# CONFIG
+# ============================================================================
+TEST_IMAGE = "test/fixtures/test_image_red_square.jpg"
+logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
+logger = logging.getLogger(__name__)
+
+# ============================================================================
+# Smoke Test
+# ============================================================================
+
+def run_smoke_test():
+    """Run single smoke test: simple image description."""
+    logger.info("=" * 60)
+    logger.info("PHASE 2: SMOKE TEST - HF Vision Integration")
+    logger.info("=" * 60)
+    logger.info(f"Test image: {TEST_IMAGE}")
+    logger.info(f"Provider: {os.getenv('LLM_PROVIDER')}")
+    logger.info(f"Model: {os.getenv('HF_VISION_MODEL', 'google/gemma-3-27b-it:scaleway')}")
+    logger.info("=" * 60)
+
+    try:
+        result = analyze_image(TEST_IMAGE, "What is in this image?")
+
+        logger.info("\n✅ SMOKE TEST PASSED")
+        logger.info("-" * 60)
+        logger.info(f"Model used: {result['model']}")
+        logger.info(f"Answer: {result['answer'][:200]}...")
+        logger.info("-" * 60)
+        return True
+
+    except Exception as e:
+        logger.error(f"\n❌ SMOKE TEST FAILED: {e}")
+        return False
+
+if __name__ == "__main__":
+    success = run_smoke_test()
+    sys.exit(0 if success else 1)
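
The script runs directly with `python test/test_smoke_hf_vision.py` and exits 0 on pass, 1 on failure, so it can gate CI or a pre-GAIA checklist.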