shivi committed
Commit
85d8bfa
1 Parent(s): fccc163

Upload README.md with huggingface_hub

Files changed (1)
README.md +4 -3
README.md CHANGED
@@ -20,6 +20,7 @@ Disclaimer: The team releasing Mask2Former did not write a model card for this m
 Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA,
 [MaskFormer](https://arxiv.org/abs/2107.06278), both in terms of performance and efficiency, by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance
 without introducing additional computation, and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
+
 In the paper [Mask2Former for Video Instance Segmentation
 ](https://arxiv.org/abs/2112.10764), the authors have shown that Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline.
 
@@ -34,9 +35,9 @@ You can use this particular checkpoint for instance segmentation. See the [model
 Here is how to use this model:
 
 ```python
-import requests
 import torch
-from PIL import Image
+import torchvision
+from huggingface_hub import hf_hub_download
 from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation
 
 
@@ -46,7 +47,7 @@ model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/video-mask
 
 file_path = hf_hub_download(repo_id="shivi/video-demo", filename="cars.mp4", repo_type="dataset")
 video = torchvision.io.read_video(file_path)[0]
-video_frames = [image_processor(images=frame, return_tensors="pt", do_resize=True, size=(480, 640)).pixel_values for frame in video]
+video_frames = [image_processor(images=frame, return_tensors="pt").pixel_values for frame in video]
 video_input = torch.cat(video_frames)
 
 with torch.no_grad():
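The snippet in the hunks above is truncated at the `with torch.no_grad():` line. As a rough continuation sketch (not part of this commit), assuming the forward call mirrors the standard Mask2Former API in `transformers`, the stacked frames are passed as `pixel_values` and the model returns one class distribution and one mask per query, in line with the "set of masks and corresponding labels" paradigm described above:

```python
# Continuation sketch, not shown in this diff: run the model on the stacked frames
# prepared as `video_input` in the snippet above.
with torch.no_grad():
    outputs = model(pixel_values=video_input)

# Mask2Former predicts a fixed set of queries, each with a class distribution
# and a binary mask.
print(outputs.class_queries_logits.shape)  # (batch, num_queries, num_labels + 1)
print(outputs.masks_queries_logits.shape)  # (batch, num_queries, mask_height, mask_width)
```

For the image checkpoints, `image_processor.post_process_instance_segmentation` turns these raw outputs into per-instance masks and labels; whether this model card uses that helper or a video-specific variant is not visible in the hunks above.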