kylielee505
committed on
Commit • b9de419
Parent(s): 1e01ce9
Upload folder using huggingface_hub
Browse files
- .gitattributes +0 -1
- ast_indexer +0 -0
- hub/damo/text-to-video-synthesis/.mdl +0 -0
- hub/damo/text-to-video-synthesis/.msc +0 -0
- hub/damo/text-to-video-synthesis/README.md +105 -0
- hub/damo/text-to-video-synthesis/VQGAN_autoencoder.pth +3 -0
- hub/damo/text-to-video-synthesis/configuration.json +34 -0
- hub/damo/text-to-video-synthesis/open_clip_pytorch_model.bin +3 -0
- hub/damo/text-to-video-synthesis/text2video_pytorch_model.pth +3 -0
.gitattributes
CHANGED
```diff
@@ -25,7 +25,6 @@
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
```
ast_indexer
ADDED
The diff for this file is too large to render.
See raw diff
hub/damo/text-to-video-synthesis/.mdl
ADDED
Binary file (51 Bytes). View file
hub/damo/text-to-video-synthesis/.msc
ADDED
Binary file (403 Bytes). View file
hub/damo/text-to-video-synthesis/README.md
ADDED
@@ -0,0 +1,105 @@
---
tasks:
- text-to-video-synthesis
widgets:
- task: text-to-video-synthesis
  inputs:
  - type: text
    name: text
    title: Enter an English prompt
    validator:
      max_words: 75
  examples:
  - name: 1
    title: Example 1
    inputs:
    - name: text
      data: A panda eating bamboo on a rock.
  inferencespec:
    cpu: 4
    memory: 16000
    gpu: 1
    gpu_memory: 32000
domain:
- multi-modal
frameworks:
- pytorch
backbone:
- diffusion
metrics:
- realism
- text-video similarity
license: Apache License 2.0
tags:
- text2video generation
- diffusion model
- text-to-video
- text-generated video
- text-to-video generation
- generation
---

# Text-to-Video Synthesis Large Model - English - General Domain

This model is a multi-stage text-to-video generation diffusion model: given an input text description, it returns a video that matches the description. Only English input is supported.

## Model description

The text-to-video generation diffusion model consists of three sub-networks: text feature extraction, a diffusion model mapping text features to the video latent space, and a decoder mapping the video latent space to the video visual space; the overall model has about 1.7 billion parameters. English input is supported. The diffusion model adopts a UNet3D structure and generates video by iteratively denoising a pure Gaussian-noise video.
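The iterative denoising described above can be sketched in a few lines. This is a minimal illustration of the principle only, not the ModelScope implementation: the toy `toy_unet` stands in for the real UNet3D noise predictor, and the simplified update omits the noise-schedule rescaling of a real sampler.

```python
# Minimal sketch of diffusion-style iterative denoising for video generation.
# NOT the actual ModelScope pipeline; toy_unet stands in for the real UNet3D.
import torch

num_timesteps = 50                       # the real model uses 1000
frames, channels, h, w = 16, 4, 32, 32   # latent-space video shape (assumed)

def toy_unet(x, t):
    # Stand-in for the UNet3D noise predictor: returns predicted noise ("eps").
    return 0.1 * x

# Start from a pure Gaussian-noise video in latent space.
x = torch.randn(1, channels, frames, h, w)

# Iteratively remove the predicted noise, stepping t from T-1 down to 0.
for t in reversed(range(num_timesteps)):
    eps = toy_unet(x, t)
    x = x - eps  # simplified update; a real step rescales by the noise
                 # schedule and re-injects fresh noise while t > 0

latent_video = x  # would then be decoded by the VQGAN autoencoder
```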

### Intended use and scope

The model has broad applicability: it can run inference on an arbitrary English text description and generate a corresponding video.

### How to use

Under the ModelScope framework, the model can be used by calling a simple pipeline. The input must be a dictionary whose only valid key is 'text', mapped to a short piece of text. The model currently supports inference on GPU only. A concrete code example follows.

#### Install the additional dependency

```shell
pip install open_clip_torch
```

#### Code example

```python
from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

p = pipeline('text-to-video-synthesis', 'damo/text-to-video-synthesis')
test_text = {
    'text': 'A panda eating bamboo on a rock.',
}
output_video_path = p(test_text)[OutputKeys.OUTPUT_VIDEO]
print('output_video_path:', output_video_path)
```

### Model limitations and possible biases

* The model was trained on public datasets such as WebVid, so generated results may reflect biases in the training-data distribution.
* The model cannot produce flawless, film-quality video.
* The model cannot render clear, legible text.
* The model was trained mainly on English corpora and does not currently support other languages.
* Performance on complex compositional generation tasks still needs improvement.

### Misuse, malicious use, and out-of-scope use

* The model was not trained to realistically represent people or events, so generating such content is beyond its capabilities.
* It must not be used to generate content that demeans or harms people or their environment, culture, religion, etc.
* It must not be used to generate pornographic, violent, or gory content.
* It must not be used to generate false or misleading information.

## Training data

The training data includes public datasets such as LAION-5B, ImageNet, and WebVid. Images and videos were filtered before pre-training using aesthetic scores, watermark scores, and deduplication.

## Related papers and citation

```BibTeX
@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models},
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
hub/damo/text-to-video-synthesis/VQGAN_autoencoder.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:88ecb782561455673c4b78d05093494b9c539fc6bfc08f3a9a4a0dd7b0b10f36
size 5214865159
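A Git LFS pointer file like this one records only the SHA-256 hash and byte size of the real file. A downloaded checkpoint can be checked against its pointer with a short script; this is a hedged sketch in which the inlined pointer text mirrors the file above, and the verification paths are assumptions.

```python
# Verify a downloaded file against its Git LFS pointer (illustrative sketch).
import hashlib

pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:88ecb782561455673c4b78d05093494b9c539fc6bfc08f3a9a4a0dd7b0b10f36
size 5214865159
"""

def parse_pointer(text):
    # Pointer files are "key value" lines; pull out the oid hash and size.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, expected_hash = fields["oid"].split(":")
    return algo, expected_hash, int(fields["size"])

def verify(path, pointer):
    # Stream the file so multi-gigabyte checkpoints don't need to fit in RAM.
    algo, expected_hash, expected_size = parse_pointer(pointer)
    h = hashlib.new(algo)
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
            size += len(chunk)
    return size == expected_size and h.hexdigest() == expected_hash
```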
hub/damo/text-to-video-synthesis/configuration.json
ADDED
@@ -0,0 +1,34 @@
```json
{
  "framework": "pytorch",
  "task": "text-to-video-synthesis",
  "model": {
    "type": "latent-text-to-video-synthesis",
    "model_args": {
      "ckpt_clip": "open_clip_pytorch_model.bin",
      "ckpt_unet": "text2video_pytorch_model.pth",
      "ckpt_autoencoder": "VQGAN_autoencoder.pth",
      "max_frames": 16,
      "tiny_gpu": 1
    },
    "model_cfg": {
      "unet_in_dim": 4,
      "unet_dim": 320,
      "unet_y_dim": 768,
      "unet_context_dim": 1024,
      "unet_out_dim": 4,
      "unet_dim_mult": [1, 2, 4, 4],
      "unet_num_heads": 8,
      "unet_head_dim": 64,
      "unet_res_blocks": 2,
      "unet_attn_scales": [1, 0.5, 0.25],
      "unet_dropout": 0.1,
      "temporal_attention": "True",
      "num_timesteps": 1000,
      "mean_type": "eps",
      "var_type": "fixed_small",
      "loss_type": "mse"
    }
  },
  "pipeline": {
    "type": "latent-text-to-video-synthesis"
  }
}
```
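The configuration can be read with standard `json` tooling; for instance, `unet_dim_mult` scales the base `unet_dim` into the channel width at each UNet resolution level. A minimal sketch (the JSON is inlined with only a subset of fields, since reading a local `configuration.json` path is an assumption):

```python
# Sketch: reading the model configuration the way a pipeline might.
# A subset of configuration.json is inlined here for illustration.
import json

config_text = """{
  "framework": "pytorch",
  "task": "text-to-video-synthesis",
  "model": {
    "type": "latent-text-to-video-synthesis",
    "model_args": {"max_frames": 16, "tiny_gpu": 1},
    "model_cfg": {"unet_dim": 320, "unet_dim_mult": [1, 2, 4, 4],
                  "num_timesteps": 1000, "mean_type": "eps"}
  }
}"""

cfg = json.loads(config_text)
model_cfg = cfg["model"]["model_cfg"]

# Channel width at each UNet resolution level: base dim times each multiplier.
channels = [model_cfg["unet_dim"] * m for m in model_cfg["unet_dim_mult"]]
print(channels)                        # [320, 640, 1280, 1280]
print(model_cfg["num_timesteps"])      # 1000
```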
hub/damo/text-to-video-synthesis/open_clip_pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9a78ef8e8c73fd0df621682e7a8e8eb36c6916cb3c16b291a082ecd52ab79cc4
size 3944692325
hub/damo/text-to-video-synthesis/text2video_pytorch_model.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d9609d02717b799137a97244844ab6df0d1a071568a1d24dcb62d9050f3a24a3
size 5645549049