---
license: cc-by-nc-sa-4.0
pipeline_tag: image-to-video
tags:
- turing
- autonomous driving
- video generation
- world model
---
# Terra

Terra is a world model designed for autonomous driving and serves as a baseline model in the ACT-Bench framework. Terra generates video continuations conditioned on a short video clip of approximately three frames and a trajectory instruction. A key feature of Terra is its high adherence to trajectory instructions, enabling accurate and reliable action-conditioned video generation.
## Related Links

For more technical details and discussions, please refer to the ACT-Bench paper: [ACT-Bench: Towards Action Controllable World Models for Autonomous Driving](https://arxiv.org/abs/2412.05337).
## How to use

We have verified execution on a machine equipped with a single NVIDIA H100 80GB GPU. However, we believe the model can run on any machine equipped with an NVIDIA GPU with 16GB or more of VRAM.
Terra consists of an Image Tokenizer, an Autoregressive Transformer, and a Video Refiner. Because setting up the Video Refiner is complex, we have not included its implementation in this Hugging Face repository; the implementation and setup instructions for the Video Refiner are provided in the ACT-Bench repository. Here, we provide an example of generating video continuations using the Image Tokenizer and the Autoregressive Transformer, conditioned on image frames and a template trajectory. The resulting video quality may appear suboptimal because each frame is decoded individually; to improve visual quality, you can use the Video Refiner.
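
The sketch below illustrates how these two stages fit together conceptually: encode the conditioning frames into tokens, roll out future tokens conditioned on a trajectory, then decode each frame. The module layout, class names, method signatures, and the trajectory JSON schema shown here are illustrative assumptions, not the actual API; see `inference.py` for the real implementation.

```python
# Conceptual sketch of Terra's generation loop. All names below
# (ImageTokenizer, AutoregressiveTransformer, encode/generate/decode and the
# template_trajectory.json schema) are assumptions for illustration only --
# refer to inference.py in this repository for the actual entry point.
import json
from pathlib import Path

import torch
from PIL import Image

from terra import ImageTokenizer, AutoregressiveTransformer  # assumed module layout

device = "cuda"
tokenizer = ImageTokenizer.from_pretrained("turing-motors/Terra").to(device).eval()
model = AutoregressiveTransformer.from_pretrained("turing-motors/Terra").to(device).eval()

# 1) Encode the ~3 conditioning frames into discrete image tokens.
frames = [Image.open(p) for p in sorted(Path("assets/conditioning_frames").glob("*.png"))]
with torch.no_grad():
    context_tokens = tokenizer.encode(frames)  # assumed shape: (num_frames, tokens_per_frame)

# 2) Load a template trajectory to condition the rollout on.
templates = json.loads(Path("assets/template_trajectory.json").read_text())
trajectory = templates["curving_to_left"]["curving_to_left_moderate"]  # assumed schema

# 3) Autoregressively generate tokens for future frames, conditioned on the trajectory.
with torch.no_grad():
    future_tokens = model.generate(context_tokens, trajectory=trajectory, num_future_frames=40)

# 4) Decode each generated frame's tokens back to pixels. Decoding frame by frame
#    is what the Video Refiner is meant to smooth out.
with torch.no_grad():
    generated_frames = [tokenizer.decode(t) for t in future_tokens]
```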
### Install Packages

We use uv to manage Python packages. If you don't have uv installed in your environment, please refer to the uv documentation.
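
For reference, the standalone installer below is the method documented by the uv project at the time of writing; check the uv documentation for alternatives (e.g. pip or Homebrew).

```bash
$ curl -LsSf https://astral.sh/uv/install.sh | sh
```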
```bash
$ git clone https://huggingface.co/turing-motors/Terra
$ uv sync
```
### Action-Conditioned Video Generation without Video Refiner

```bash
$ python inference.py
```
This command generates a video using three image frames located in `assets/conditioning_frames` and the `curving_to_left/curving_to_left_moderate` trajectory defined in the trajectory template file `assets/template_trajectory.json`. You can find more details by referring to the `inference.py` script.
## Citation

```bibtex
@misc{arai2024actbench,
      title={ACT-Bench: Towards Action Controllable World Models for Autonomous Driving},
      author={Hidehisa Arai and Keishi Ishihara and Tsubasa Takahashi and Yu Yamaguchi},
      year={2024},
      eprint={2412.05337},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05337},
}
```