Commit 9b3e244 · Parent(s): 844a59d
Committed by kingfroglao

add readme
    	
README.md (changed)

```diff
@@ -11,3 +11,68 @@ license: mit
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+# 🐾 Animal Species Similarity Comparison
+
+This Hugging Face Space allows users to upload two animal images and determine whether they are the same species. It combines object detection, classification, and visual embedding techniques using state-of-the-art models.
+
+---
+
+## How It Works
+This app performs the following tasks:
+
+1. **Object Detection** using [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) to locate animals and draw bounding boxes.
+2. **Image Classification** using [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) to predict the species label of each animal.
+3. **Visual Similarity Calculation** using:
+   - **ViT embeddings** for global semantic similarity
+   - **ResNet-50 embeddings** for local feature comparison
+   - **Label match indicator** based on top-1 classification
+
+A weighted fusion of all similarity scores is computed and interpreted to output a final decision.
+
+---
+
+## Example Use
+Upload two images of animals—cats, zebras, dogs, or wild animals—and get a prediction like:
+
+```
+ViT Similarity: 0.742
+ResNet Similarity: 0.815
+Label Match: 1.0
+Final Score: 0.762 → 🟡 Possibly same species
+```
+
+---
+
+## References
+- Carion, N., et al. (2020). ["End-to-End Object Detection with Transformers (DETR)"](https://arxiv.org/abs/2005.12872).
+- Dosovitskiy, A., et al. (2021). ["An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)"](https://arxiv.org/abs/2010.11929).
+- Hugging Face Transformers: https://huggingface.co/docs/transformers
+- timm: PyTorch Image Models by Ross Wightman – https://github.com/huggingface/pytorch-image-models
+
+---
+
+## 🤖 Acknowledgment
+This project was built with the assistance of **ChatGPT** (OpenAI) to support code generation, explanation, formatting, and technical writing.
+
+---
+
+## Built With
+- Hugging Face Spaces + Transformers
+- PyTorch + TorchVision
+- Gradio UI
+- ViT + DETR + ResNet50
+
+---
+
+## Requirements
+```
+transformers>=4.36.0
+torch
+torchvision
+timm
+gradio
+Pillow
+```
+
+---
```
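The README says the similarity step uses ViT and ResNet-50 embeddings but does not name the metric. Cosine similarity is a common choice for comparing embedding vectors, so the sketch below assumes it; the toy vectors are hypothetical stand-ins for real model outputs, and `cosine_similarity` here is a plain-Python helper rather than the Space's own code. In PyTorch, `torch.nn.functional.cosine_similarity` computes the same quantity on tensors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors.

    Returns a value in [-1, 1]; 1.0 means the vectors point in the same direction.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings standing in for ViT or ResNet-50 feature vectors.
emb_img1 = [0.9, 0.1, 0.3]
emb_img2 = [0.8, 0.2, 0.35]
print(round(cosine_similarity(emb_img1, emb_img2), 3))
```

Because cosine similarity ignores vector magnitude, it compares the direction of the two feature vectors, which is usually what matters when asking whether two images show similar content.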
    	
app.py (changed)

```diff
@@ -12,8 +12,8 @@ def process(img1, img2):
     boxed_img2 = draw_detr_boxes(img2.copy())
 
     final_text = f"""
-
-
+    📊 ViT Similarity: {result['vit_score']:.3f}
+    📊 ResNet Similarity: {result['resnet_score']:.3f}
     📊 Label Match: {result['label_match']:.1f}
 
     ⭐ Final Score: {result['final_score']:.3f}
```
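In `process`, the output text is built from a `result` dict with the keys `vit_score`, `resnet_score`, `label_match` and `final_score`; the code that fills that dict is not part of this diff. Below is a minimal sketch of one way to produce it, assuming a simple weighted sum. The weights (0.4, 0.4, 0.2), the `fuse_scores` and `interpret` helpers, the 0.85 and 0.6 thresholds, and the 🟢/🔴 verdict strings are all hypothetical (only "🟡 Possibly same species" appears in the README), so the sketch does not reproduce the README's example Final Score of 0.762.

```python
from typing import Dict

# Hypothetical fusion weights; the Space's real weights are not shown in this commit.
WEIGHTS = {"vit_score": 0.4, "resnet_score": 0.4, "label_match": 0.2}


def fuse_scores(vit_score: float, resnet_score: float,
                label_a: str, label_b: str) -> Dict[str, float]:
    """Combine per-model similarities into a dict shaped like the one process() formats."""
    # Label match indicator: 1.0 if the two top-1 labels agree, else 0.0.
    label_match = 1.0 if label_a == label_b else 0.0
    result = {
        "vit_score": vit_score,
        "resnet_score": resnet_score,
        "label_match": label_match,
    }
    # Weighted sum; the weights add up to 1, so inputs in [0, 1] give a score in [0, 1].
    result["final_score"] = sum(WEIGHTS[key] * result[key] for key in WEIGHTS)
    return result


def interpret(final_score: float) -> str:
    """Map the fused score to a verdict using hypothetical thresholds."""
    if final_score >= 0.85:
        return "🟢 Likely same species"
    if final_score >= 0.6:
        return "🟡 Possibly same species"
    return "🔴 Likely different species"


# Hypothetical inputs: two ViT/ResNet similarities and two identical top-1 labels.
result = fuse_scores(0.742, 0.815, "tabby cat", "tabby cat")
final_text = f"""
📊 ViT Similarity: {result['vit_score']:.3f}
📊 ResNet Similarity: {result['resnet_score']:.3f}
📊 Label Match: {result['label_match']:.1f}

⭐ Final Score: {result['final_score']:.3f} → {interpret(result['final_score'])}
"""
print(final_text)
```

With these assumed weights the example inputs fuse to 0.823, which falls in the "possibly same species" band; changing the weights or thresholds shifts where each verdict begins.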
