Upload README.md with huggingface_hub
README.md CHANGED
@@ -20,6 +20,7 @@ Disclaimer: The team releasing Mask2Former did not write a model card for this m
Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA,
[MaskFormer](https://arxiv.org/abs/2107.06278), both in terms of performance and efficiency, by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance
without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
+
In the paper [Mask2Former for Video Instance Segmentation](https://arxiv.org/abs/2112.10764), the authors have shown that Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline.
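To make the mask-classification paradigm above concrete, here is a minimal, illustrative sketch of instance segmentation with an image Mask2Former checkpoint in 🤗 Transformers. It is not part of the diff; the checkpoint id and test image URL are assumptions chosen for the example.

```python
# Minimal sketch: Mask2Former predicts a fixed set of (mask, class) pairs,
# and the image processor merges them into a per-pixel instance map.
# The checkpoint id and image URL are assumptions for illustration only.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

checkpoint = "facebook/mask2former-swin-small-coco-instance"  # assumed image checkpoint
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # common COCO demo image
image = Image.open(requests.get(url, stream=True).raw)
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# class_queries_logits: (batch, num_queries, num_labels + 1)
# masks_queries_logits: (batch, num_queries, height / 4, width / 4)
result = image_processor.post_process_instance_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(result["segmentation"].shape)  # per-pixel instance ids
print(result["segments_info"][:3])   # predicted label and score per instance
```

The same outputs can instead be fed to `post_process_semantic_segmentation` or `post_process_panoptic_segmentation`, which is what "all 3 tasks are treated as if they were instance segmentation" means in practice.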
@@ -34,9 +35,9 @@ You can use this particular checkpoint for instance segmentation. See the [model
Here is how to use this model:

```python
-import requests
import torch
-
+import torchvision
+from huggingface_hub import hf_hub_download
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

@@ -46,7 +47,7 @@ model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/video-mask

file_path = hf_hub_download(repo_id="shivi/video-demo", filename="cars.mp4", repo_type="dataset")
video = torchvision.io.read_video(file_path)[0]
-video_frames = [image_processor(images=frame, return_tensors="pt"
+video_frames = [image_processor(images=frame, return_tensors="pt").pixel_values for frame in video]
video_input = torch.cat(video_frames)

with torch.no_grad():
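For reference, here is an end-to-end sketch of the video snippet that the second and third hunks patch. The checkpoint id is truncated in the hunk header above ("facebook/video-mask…"), so a placeholder is used; the loading of the image processor and the forward call are assumptions filling the lines the hunks do not show.

```python
# Illustrative end-to-end version of the snippet touched by the hunks above.
# CHECKPOINT is a placeholder: the real id is truncated in the hunk header,
# so substitute the checkpoint this model card actually describes.
import torch
import torchvision
from huggingface_hub import hf_hub_download
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

CHECKPOINT = "<video-mask2former-checkpoint>"  # placeholder, see hunk header above
image_processor = AutoImageProcessor.from_pretrained(CHECKPOINT)
model = Mask2FormerForUniversalSegmentation.from_pretrained(CHECKPOINT)

# Download the demo clip and read it as a (num_frames, height, width, channels) tensor.
file_path = hf_hub_download(repo_id="shivi/video-demo", filename="cars.mp4", repo_type="dataset")
video = torchvision.io.read_video(file_path)[0]

# Preprocess every frame and stack the results into one batch of pixel values.
video_frames = [image_processor(images=frame, return_tensors="pt").pixel_values for frame in video]
video_input = torch.cat(video_frames)

with torch.no_grad():
    outputs = model(pixel_values=video_input)

# One set of mask/class predictions per frame in the batch.
print(outputs.class_queries_logits.shape)
print(outputs.masks_queries_logits.shape)
```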