Add pipeline tag and improve model card
Hi! I'm Niels from the community science team at Hugging Face. I've opened this PR to improve the model card for your data curation models.
This update adds structured metadata, including the `image-classification` pipeline tag and domain-specific tags (`medical`, `surgical`, `endoscopy`). This will help researchers find these artifacts more easily on the Hugging Face Hub. I have also cleaned up the Markdown structure to make the documentation clearer while preserving all existing images, links, and usage examples.
README.md (CHANGED)
---
license: apache-2.0
pipeline_tag: image-classification
tags:
- medical
- surgical
- endoscopy
---

<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/cE7UgFfJJ2gUHJr0SSEhc.png" />
</div>
|
[📚 Paper](https://arxiv.org/abs/2503.19740) - [🤖 GitHub](https://github.com/visurg-ai/LEMON)

This repository provides the models used in the data curation pipeline for the paper [LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings](https://arxiv.org/abs/2503.19740). These models assist in constructing the LEMON dataset by filtering and processing surgical video content.

For more details about the LEMON dataset and our LemonFM foundation model, please visit our [GitHub repository](https://github.com/visurg-ai/LEMON).

## Citation

If you use our dataset, model, or code in your research, please cite our paper:

```bibtex
@misc{che2025lemonlargeendoscopicmonocular,
      title={LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings},
      author={Chengan Che and Chao Wang and Tom Vercauteren and Sophia Tsoka and Luis C. Garcia-Peraza-Herrera},
      year={2025},
      eprint={2503.19740},
      archivePrefix={arXiv},
      ...
}
```
|
## Model Overview

This Hugging Face repository includes video storyboard classification models, frame classification models, and non-surgical object detection models. The model loader file can be found at [model_loader.py](https://huggingface.co/visurg/Surg3M_curation_models/blob/main/model_loader.py).

<div align="center">
<table style="margin-left: auto; margin-right: auto;">
...
</table>
</div>

The data curation pipeline leading to the clean videos in the LEMON dataset is as follows:
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/jzw36jlPT-V_I-Vm01OzO.png" />
</div>
|
## Usage

### Video classification models
**Video classification models** are employed in step **2** of the data curation pipeline to classify a video storyboard as either surgical or non-surgical:

```python
import torch
import torchvision
...
outputs = net(img_tensor)
```

### Frame classification models
**Frame classification models** are used in step **3** of the data curation pipeline to classify a frame as either surgical or non-surgical:

```python
import torch
...
outputs = net(img_tensor)
```

### Non-surgical object detection models
**Non-surgical object detection models** are used to obliterate the non-surgical regions in surgical frames (e.g., user interface information):

```python
import torch
...

# Extract features from the image
outputs = net(img_tensor)
```