Spaces: Runtime error
Update app.py
app.py CHANGED

@@ -155,7 +155,7 @@ title = "End-to-End Referring Video Object Segmentation with Multimodal Transformers"
 
 description = "This notebook provides a (limited) hands-on demonstration of MTTR. Given a text query and a short clip based on a YouTube video, we demonstrate how MTTR can be used to segment the referred object instance throughout the video. To use it, upload an .mp4 video file and enter a text query which describes one of the object instances in that video."
 
-article = "**Disclaimer:** <br> This is a **limited** demonstration of MTTR's performance. The model used here was trained **exclusively** on Refer-YouTube-VOS with window size `w=12` (as described in our paper). No additional training data was used whatsoever. Hence, the model's performance may be limited, especially on instances from unseen categories. <br> Additionally, slow processing times may be encountered, depending on the input clip length and/or resolution, and due to HuggingFace's limited computational resources (no GPU acceleration unfortunately). <br> Finally, we emphasize that this demonstration is intended to be used for academic purposes only. We do not take any responsibility for how the created content is used or distributed. <br> <p style='text-align: center'><a href='https://github.com/mttr2021/MTTR'>Github Repo</a></p>"
+article = "Check out [MTTR's GitHub page](https://github.com/mttr2021/MTTR) for more info about this project. <br> Also, check out our [Colab notebook](https://gradio.app/docs/) for much faster processing (GPU accelerated) and more options! <br> **Disclaimer:** <br> This is a **limited** demonstration of MTTR's performance. The model used here was trained **exclusively** on Refer-YouTube-VOS with window size `w=12` (as described in our paper). No additional training data was used whatsoever. Hence, the model's performance may be limited, especially on instances from unseen categories. <br> Additionally, slow processing times may be encountered, depending on the input clip length and/or resolution, and due to HuggingFace's limited computational resources (no GPU acceleration unfortunately). <br> Finally, we emphasize that this demonstration is intended to be used for academic purposes only. We do not take any responsibility for how the created content is used or distributed. <br> <p style='text-align: center'><a href='https://github.com/mttr2021/MTTR'>Github Repo</a></p>"
 
 examples = [['guy in white shirt performing tricks on a bike', 'bike_tricks_2.mp4'],
             ['a man riding a surfboard', 'surfing.mp4'],
@@ -166,8 +166,7 @@ examples = [['guy in white shirt performing tricks on a bike', 'bike_tricks_2.mp4'],
             ['person in blue riding a bike', 'blue_biker_riding.mp4'],
             ['a dog to the right', 'dog_and_cat.mp4'],
             ['a person hugging a dog', 'girl_hugging_dog.mp4'],
-            ['a black bike used to perform tricks', 'bike_tricks_1.mp4']
-            ['a black horse playing with a person', 'horse_plays_ball.mp4']]
+            ['a black bike used to perform tricks', 'bike_tricks_1.mp4']]
 
 iface = gr.Interface(fn=process,
                      inputs=[gr.inputs.Textbox(label="text query"), gr.inputs.Video(label="input video - first 10 seconds are used")],
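A note on the second hunk: the pre-commit `examples` list was missing a comma between its last two entries. Python parses two adjacent bracketed expressions as a subscript, so the second pair was treated as an index into the first list, and the module failed with a `TypeError` the moment it ran. The sketch below reproduces that failure and shows the shape of the fixed list (the `broken`/`fixed` names are illustrative, not from the commit):

```python
# Pre-commit shape: no comma between the two entries, so the second
# bracketed pair subscripts the first list instead of extending the
# outer list. Subscripting a list with a tuple raises TypeError.
try:
    broken = [['a black bike used to perform tricks', 'bike_tricks_1.mp4']
              ['a black horse playing with a person', 'horse_plays_ball.mp4']]
except TypeError as e:
    print("TypeError:", e)  # list indices must be integers or slices, not tuple

# The commit's fix: drop the horse example and close the list on the
# bike entry, restoring a valid (single-entry tail) list literal.
fixed = [['a black bike used to perform tricks', 'bike_tricks_1.mp4']]
```

This is why simply deleting the horse line (rather than adding the missing comma) also repairs the syntax: the remaining entry closes the outer `examples` list cleanly.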