Add pipeline tag and improve model card
Hi! I'm Niels from the community science team at Hugging Face. I've opened this PR to improve the model card for your data curation models.
This update adds structured metadata, including the `image-classification` pipeline tag and domain-specific tags (`medical`, `surgical`, `endoscopy`). This will help researchers find these artifacts more easily on the Hugging Face Hub. I have also cleaned up the Markdown structure to make the documentation clearer while preserving all existing images, links, and usage examples.
README.md (CHANGED)
---
license: apache-2.0
pipeline_tag: image-classification
tags:
- medical
- surgical
- endoscopy
---

<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/cE7UgFfJJ2gUHJr0SSEhc.png" />
</div>
|
[📚 Paper](https://arxiv.org/abs/2503.19740) - [🤖 GitHub](https://github.com/visurg-ai/LEMON)

This repository provides the models used in the data curation pipeline for the paper [LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings](https://arxiv.org/abs/2503.19740). These models assist in constructing the LEMON dataset by filtering and processing surgical video content.

For more details about the LEMON dataset and our LemonFM foundation model, please visit our [GitHub repository](https://github.com/visurg-ai/LEMON).

## Citation

If you use our dataset, model, or code in your research, please cite our paper:

```bibtex
@misc{che2025lemonlargeendoscopicmonocular,
      title={LEMON: A Large Endoscopic MONocular Dataset and Foundation Model for Perception in Surgical Settings},
      author={Chengan Che and Chao Wang and Tom Vercauteren and Sophia Tsoka and Luis C. Garcia-Peraza-Herrera},
      year={2025},
      eprint={2503.19740},
      archivePrefix={arXiv},
      ...
}
```
|
## Model Overview

This Hugging Face repository includes video storyboard classification models, frame classification models, and non-surgical object detection models. The model loader file can be found at [model_loader.py](https://huggingface.co/visurg/Surg3M_curation_models/blob/main/model_loader.py).

<div align="center">
<table style="margin-left: auto; margin-right: auto;">
...
</table>
</div>

The data curation pipeline leading to the clean videos in the LEMON dataset is as follows:
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67d9504a41d31cc626fcecc8/jzw36jlPT-V_I-Vm01OzO.png" />
</div>
|
## Usage

### Video classification models
**Video classification models** are employed in step **2** of the data curation pipeline to classify a video storyboard as either surgical or non-surgical:

```python
import torch
import torchvision
...
outputs = net(img_tensor)
```

### Frame classification models
**Frame classification models** are used in step **3** of the data curation pipeline to classify a frame as either surgical or non-surgical:

```python
import torch
...
outputs = net(img_tensor)
```

### Non-surgical object detection models
**Non-surgical object detection models** are used to obliterate the non-surgical regions in surgical frames (e.g., user interface information):

```python
import torch
...

# Extract features from the image
outputs = net(img_tensor)
```