vladbogo 
posted an update Feb 27
Genie is a new method from Google DeepMind that generates interactive, action-controllable virtual worlds from unlabelled internet videos.

Key points:
* Genie leverages a spatiotemporal video tokenizer, an autoregressive dynamics model, and a latent action model to generate controllable video environments.
* The model is trained on video data alone, without requiring action labels, using unsupervised learning to infer latent actions between frames.
* The latent action vocabulary is restricted to 8 discrete actions, which keeps the set of possible latent actions small.
* The training dataset is built by filtering publicly available internet videos with criteria specific to 2D platformer games, yielding a total of 6.8M videos.
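
To make the "vocabulary of 8 latent actions" point concrete, here is a minimal sketch of how a continuous embedding of a frame-to-frame transition could be snapped to one of 8 discrete action codes. It assumes a nearest-neighbour codebook lookup in the style of vector quantization; the post itself does not spell out the mechanism. The names `make_codebook` and `quantize`, the embedding size, and the random codebook are all hypothetical and chosen only for illustration. In a real model the codebook and the encoder would be learned.

```python
import random

NUM_ACTIONS = 8  # vocabulary size stated in the post
EMBED_DIM = 4    # hypothetical embedding size, for illustration only


def make_codebook(num_actions=NUM_ACTIONS, dim=EMBED_DIM, seed=0):
    """Build a random stand-in codebook (a trained model would learn this)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_actions)]


def quantize(embedding, codebook):
    """Return the index of the codebook entry closest to `embedding`."""
    def sq_dist(code):
        # Squared Euclidean distance between the embedding and one code.
        return sum((e - c) ** 2 for e, c in zip(embedding, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


codebook = make_codebook()

# Stand-in for an encoder output describing the change between two frames.
transition_embedding = [0.1, -0.3, 0.7, 0.2]
action_id = quantize(transition_embedding, codebook)
print(f"latent action id: {action_id} (one of {NUM_ACTIONS})")
```

Whatever the embedding looks like, the result is always an integer from 0 to 7, so a player (or an agent) only ever has 8 choices per step.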

Paper: Genie: Generative Interactive Environments (2402.15391)
Project page: https://sites.google.com/view/genie-2024/
More detailed overview in my blog: https://huggingface.co/blog/vladbogo/genie-generative-interactive-environments

Congrats to the authors for their work!

Thanks a lot for the blog post, it's very informative 🤗

text to game is super exciting!

totally agree. Can't wait to see what comes next