kingfroglao committed on
Commit 9b3e244 · 1 Parent(s): 844a59d

add readme

Files changed (2)
  1. README.md +65 -0
  2. app.py +2 -2
README.md CHANGED
@@ -11,3 +11,68 @@ license: mit
11
  ---
12
 
13
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
14
+
15
+ # 🐾 Animal Species Similarity Comparison
16
+
17
+ This Hugging Face Space lets users upload two animal images and estimates whether they show the same species. It combines object detection, image classification, and visual embedding comparison using pretrained DETR, ViT, and ResNet-50 models.
18
+
19
+ ---
20
+
21
+ ## How It Works
22
+ This app performs the following tasks:
23
+
24
+ 1. **Object Detection** using [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) to locate animals and draw bounding boxes.
25
+ 2. **Image Classification** using [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) to predict a label for each animal. This checkpoint outputs ImageNet-1k classes, so the predicted label is only as specific as those classes.
26
+ 3. **Visual Similarity Calculation** using:
27
+ - **ViT embeddings** for global semantic similarity
28
+ - **ResNet-50 embeddings** for local feature comparison
29
+ - **Label match indicator** based on top-1 classification
30
+
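The README does not say how the two embedding similarities are computed. A common choice is cosine similarity between the embedding vectors, and the minimal sketch below assumes that choice. The names `cosine_similarity` and `label_match` are hypothetical and are not taken from app.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat an all-zero embedding as "no similarity"
    # instead of dividing by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_match(label_a, label_b):
    """Return 1.0 when the top-1 labels agree, otherwise 0.0."""
    return 1.0 if label_a == label_b else 0.0


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real ViT or ResNet-50 embeddings.
    print(cosine_similarity([0.2, 0.5, 0.1], [0.3, 0.4, 0.0]))
    print(label_match("zebra", "zebra"))
```

In a real pipeline the inputs would be the feature vectors produced by the ViT and ResNet-50 models, which have hundreds or thousands of dimensions rather than three.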
31
+ The three scores are combined into a single weighted final score, which is then mapped to a final decision such as "Possibly same species".
32
+
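To illustrate the weighted fusion described above, here is a minimal sketch. The weights, the thresholds, and the names `fuse_scores` and `interpret` are assumptions for illustration only, since this commit does not show the values app.py actually uses. Only the "Possibly same species" wording comes from the README's example output. The other two verdict strings are placeholders.

```python
def fuse_scores(vit_sim, resnet_sim, label_match, weights=(0.4, 0.4, 0.2)):
    """Weighted average of the three scores (weights are hypothetical)."""
    w_vit, w_resnet, w_label = weights
    total = w_vit + w_resnet + w_label
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total keeps the result in [0, 1] when the inputs are.
    return (w_vit * vit_sim + w_resnet * resnet_sim + w_label * label_match) / total


def interpret(score, same_threshold=0.8, maybe_threshold=0.6):
    """Map a fused score to a verdict (thresholds are hypothetical)."""
    if score >= same_threshold:
        return "Likely same species"
    if score >= maybe_threshold:
        return "Possibly same species"
    return "Likely different species"


if __name__ == "__main__":
    score = fuse_scores(vit_sim=0.70, resnet_sim=0.65, label_match=1.0)
    print(f"Final Score: {score:.3f} -> {interpret(score)}")
```

Because the weights here are guesses, this sketch is not expected to reproduce the final score shown in the README's example output.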
33
+ ---
34
+
35
+ ## Example Use
36
+ Upload two animal images (for example cats, dogs, zebras, or other wild animals) and get a prediction like:
37
+
38
+ ```
39
+ ViT Similarity: 0.742
40
+ ResNet Similarity: 0.815
41
+ Label Match: 1.0
42
+ Final Score: 0.762 → 🟡 Possibly same species
43
+ ```
44
+
45
+ ---
46
+
47
+ ## References
48
+ - Carion, N., et al. (2020). ["End-to-End Object Detection with Transformers (DETR)"](https://arxiv.org/abs/2005.12872).
49
+ - Dosovitskiy, A., et al. (2021). ["An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)"](https://arxiv.org/abs/2010.11929).
50
+ - Hugging Face Transformers: https://huggingface.co/docs/transformers
51
+ - timm: PyTorch Image Models by Ross Wightman – https://github.com/huggingface/pytorch-image-models
52
+
53
+ ---
54
+
55
+ ## 🤖 Acknowledgment
56
+ This project was built with the assistance of **ChatGPT** (OpenAI) to support code generation, explanation, formatting, and technical writing.
57
+
58
+ ---
59
+
60
+ ## Built With
61
+ - Hugging Face Spaces + Transformers
62
+ - PyTorch + TorchVision
63
+ - Gradio UI
64
+ - ViT + DETR + ResNet50
65
+
66
+ ---
67
+
68
+ ## Requirements
69
+ ```
70
+ transformers>=4.36.0
71
+ torch
72
+ torchvision
73
+ timm
74
+ gradio
75
+ Pillow
76
+ ```
77
+
78
+ ---
app.py CHANGED
@@ -12,8 +12,8 @@ def process(img1, img2):
  boxed_img2 = draw_detr_boxes(img2.copy())
 
  final_text = f"""
- 🌍 ViT Similarity: {result['vit_score']:.3f}
- 🔬 ResNet Similarity: {result['resnet_score']:.3f}
+ 📊 ViT Similarity: {result['vit_score']:.3f}
+ 📊 ResNet Similarity: {result['resnet_score']:.3f}
  📊 Label Match: {result['label_match']:.1f}
 
  ⭐ Final Score: {result['final_score']:.3f}