Anonymous Authors committed on
Commit · 9b9fe26
Parent(s): 446f008

Rename displayed model name to ViTeX-Edit-14B in the model card

README heading, file-tree comments, and the Composite-variant
section heading all switched. Same change applied to the docstrings
of inference_example.py and make_corp_baseline.py and to the
'Loading ... trained weights' log line. Repository URL, the bundled
weights filename (vitex_14b.safetensors), and the local clone target
directory are intentionally unchanged.

- README.md +5 -5
- inference_example.py +2 -2
- make_corp_baseline.py +3 -3
README.md CHANGED

@@ -8,7 +8,7 @@ tags:
 - diffusion
 ---
 
-# ViTeX-14B (Model & Inference code)
+# ViTeX-Edit-14B (Model & Inference code)
 
 🌐 [Project page](https://vitex-bench.github.io/) ·
 📊 [Dataset](https://huggingface.co/datasets/ViTeX-Bench/ViTeX-Dataset) ·
@@ -34,8 +34,8 @@ Open reference model for **video scene text editing**. Augments Wan2.1-VACE-14B
 
 ```
 .
-├── inference_example.py   run ViTeX-14B on one (video, mask, glyph) tuple
-├── make_corp_baseline.py  build the ViTeX-14B (Composite) variant
+├── inference_example.py   run ViTeX-Edit-14B on one (video, mask, glyph) tuple
+├── make_corp_baseline.py  build the ViTeX-Edit-14B (Composite) variant
 ├── vitex_14b.safetensors  (8 GB, trained adapter weights)
 ├── diffsynth/             bundled inference library
 └── base_model/            (70 GB, frozen DiT + T5-XXL + Wan VAE)
@@ -68,9 +68,9 @@ python inference_example.py \
     --output out.mp4
 ```
 
-## Locality-preserving variant: ViTeX-14B (Composite)
+## Locality-preserving variant: ViTeX-Edit-14B (Composite)
 
-`make_corp_baseline.py` is a deterministic, training-free post-processing wrapper. Two per-frame operations: (1) Reinhard mean–variance LAB color matching against the source's local lighting; (2) signed-distance feathered alpha compositing onto the source. Inside the mask the result is the predicted glyphs (color-matched); outside the feather it is byte-identical to the source. Locality metrics rise to near-Identity while SeqAcc / CharAcc move within ~0.01 of raw ViTeX-14B.
+`make_corp_baseline.py` is a deterministic, training-free post-processing wrapper. Two per-frame operations: (1) Reinhard mean–variance LAB color matching against the source's local lighting; (2) signed-distance feathered alpha compositing onto the source. Inside the mask the result is the predicted glyphs (color-matched); outside the feather it is byte-identical to the source. Locality metrics rise to near-Identity while SeqAcc / CharAcc move within ~0.01 of raw ViTeX-Edit-14B.
 
 ```bash
 python make_corp_baseline.py \
inference_example.py CHANGED

@@ -1,5 +1,5 @@
 """
-ViTeX-14B inference example (self-contained).
+ViTeX-Edit-14B inference example (self-contained).
 
 Assumes you cloned this HuggingFace repo and are running this script from the
 repo root. The bundled `diffsynth/` library, `vitex_14b.safetensors` weights,
@@ -119,7 +119,7 @@ def build_pipeline(device="cuda:0"):
         redirect_common_files=False,
     )
 
-    print(f"Loading ViTeX-14B trained weights from {ADAPTER_CKPT}")
+    print(f"Loading ViTeX-Edit-14B trained weights from {ADAPTER_CKPT}")
     state = load_state_dict(ADAPTER_CKPT)
     res = pipe.vace.load_state_dict(state, strict=False)
     print(f" loaded {len(state)} keys (missing {len(res.missing_keys)}, unexpected {len(res.unexpected_keys)})")
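The `missing`/`unexpected` counts that the log line in this diff reports come from a non-strict state-dict load: the adapter checkpoint only covers the trained modules, so the frozen base weights are expected to show up as missing. Conceptually (a plain-Python sketch of the semantics, not PyTorch's implementation) the two lists are just set differences between the model's parameter names and the checkpoint's keys:

```python
def nonstrict_load_report(model_keys, ckpt_keys):
    """Mimic what a strict=False state-dict load reports: keys the model
    expects but the checkpoint lacks (missing), and keys the checkpoint
    carries but the model ignores (unexpected)."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected

# Hypothetical key names for illustration: the adapter covers the VACE
# blocks, so the untouched base parameter is reported as missing.
missing, unexpected = nonstrict_load_report(
    ["vace.block0.w", "vace.block1.w", "dit.attn.w"],
    ["vace.block0.w", "vace.block1.w", "extra.stat"],
)
```

A large `missing` count is therefore harmless here, while a nonzero `unexpected` count would suggest a checkpoint/model mismatch.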
make_corp_baseline.py CHANGED

@@ -1,7 +1,7 @@
-"""Build the ViTeX-14B (Composite) baseline.
+"""Build the ViTeX-Edit-14B (Composite) baseline.
 
 For each test clip:
-  1. Read source video, ViTeX-14B prediction, and the dilated text mask.
+  1. Read source video, ViTeX-Edit-14B prediction, and the dilated text mask.
   2. Color-correct the prediction inside the mask to match the source by
      Reinhard-style mean+std matching in LAB space, using a 20-px band just
      outside the mask as the reference (so the local lighting is captured).
@@ -148,7 +148,7 @@ def main():
     ap.add_argument("--records", required=True)
     ap.add_argument("--data_root", required=True)
     ap.add_argument("--pred_dir", required=True,
-                    help="Directory of ViTeX-14B raw predictions (e.g., ViTeX-14B_orig)")
+                    help="Directory of ViTeX-Edit-14B raw predictions (e.g., ViTeX-Edit-14B_orig)")
     ap.add_argument("--out_dir", required=True,
                     help="Where the corp baseline mp4s are written")
     ap.add_argument("--target_frames", type=int, default=120)
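The docstring above uses a 20-px band just outside the mask as the color reference. A minimal NumPy sketch of extracting such a band (an illustration only — the real script likely uses proper morphological dilation, and this version assumes the mask does not touch the frame border, since `np.roll` wraps around edges):

```python
import numpy as np

def ring_outside_mask(mask, width=20):
    """Return the band of pixels within `width` px outside a binary mask,
    built by iterated 4-neighbour dilation minus the original mask."""
    m = mask.astype(bool)
    dilated = m.copy()
    for _ in range(width):
        d = dilated
        # grow the region by one pixel in each of the four directions
        dilated = d | np.roll(d, 1, 0) | np.roll(d, -1, 0) \
                    | np.roll(d, 1, 1) | np.roll(d, -1, 1)
    return dilated & ~m
```

Statistics (mean, std per LAB channel) computed over `ring_outside_mask(mask)` then drive the Reinhard matching, so the correction tracks the lighting immediately around the edited text rather than the whole frame.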