MarcusLoren committed ddd8656 (1 parent: edf8d63)

Update README.md

Files changed (1): README.md (+65, -3)
---
license: apache-2.0
---


### MeshGPT-alpha-preview

MeshGPT is a text-to-3D model built from an autoencoder (the tokenizer) and a transformer that generates the tokens.
The autoencoder's purpose is to translate 3D models into tokens, which its decoder part can then convert back into a 3D mesh.<br/>
For all intents and purposes, the autoencoder is the **world's first** published **3D model tokenizer**! (correct me if I'm wrong!)

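To make the tokenizer idea concrete, here is a minimal round-trip sketch. The `tokenize` and `decode_from_codes_to_faces` calls, their return values, and the dummy tensor shapes are my assumptions based on the meshgpt-pytorch library, not something stated in this card:

```
import torch
from meshgpt_pytorch import MeshAutoencoder

autoencoder = MeshAutoencoder(num_discrete_coors = 128)

# dummy mesh: coordinates in [-1, 1], 64 triangles over 192 vertices
vertices = torch.rand(1, 192, 3) * 2 - 1     # (batch, num vertices, xyz)
faces = torch.arange(192).reshape(1, 64, 3)  # (batch, num faces, vertex indices)

codes = autoencoder.tokenize(vertices = vertices, faces = faces)        # mesh -> tokens
face_coords, face_mask = autoencoder.decode_from_codes_to_faces(codes)  # tokens -> mesh
```
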
## Model Details
The autoencoder (tokenizer) is a relatively small model with 50M parameters; the transformer uses 184M parameters and its core is based on GPT2-small.
Due to hardware constraints it was trained with a codebook/vocabulary size of 2048.<br/>
Developed by: Me (with credit for the MeshGPT codebase to [Phil Wang](https://github.com/lucidrains))

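For a rough sense of what that configuration could look like, here is a hypothetical instantiation; the constructor arguments (`codebook_size`, `dim`, `max_seq_len`, `condition_on_text`) are drawn from the meshgpt-pytorch library and the values are illustrative, not the checkpoint's exact settings:

```
from meshgpt_pytorch import MeshAutoencoder, MeshTransformer

autoencoder = MeshAutoencoder(
    num_discrete_coors = 128,  # resolution of coordinate quantization
    codebook_size = 2048       # the vocabulary size mentioned above
)

transformer = MeshTransformer(
    autoencoder,
    dim = 768,                # GPT2-small-like width (illustrative)
    max_seq_len = 1500,       # illustrative cap on the token sequence length
    condition_on_text = True  # enables the text -> 3D conditioning
)
```
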
### Warning:
This model was created without any sponsors or rented GPU hardware, so it is very limited in what it can generate.
It handles simple single objects such as 'chair' or 'table' fine, but more complex objects require more training (see the training dataset section).

### Usage:

Install:

```
pip install git+https://github.com/MarcusLoppe/meshgpt-pytorch.git
```
```
import torch

from meshgpt_pytorch import (
    MeshAutoencoder,
    MeshTransformer,
    mesh_render
)

device = "cuda" if torch.cuda.is_available() else "cpu"
transformer = MeshTransformer.from_pretrained("MarcusLoren/MeshGPT_tiny_alpha").to(device)

output = []
for text in ['bed', 'chair']:
    # face_coords: (batch, num faces, vertices (3), coordinates (3)); face_mask: (batch, num faces)
    face_coords, face_mask = transformer.generate(texts = [text], temperature = 0.0)
    output.append(face_coords)

# write all generated meshes into a single .obj file
mesh_render.combind_mesh('./render.obj', output)
```
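
A note on the parameters above: `temperature = 0.0` should make generation greedy and deterministic, so raising it slightly is a way to trade consistency for variety, and prompts are best kept close to the short object labels the model was trained on (see the training dataset section below).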

## Training dataset
I've only had access to the free-tier GPU on Kaggle, so this model was trained on just 4k models with at most 250 triangles each.
The dataset contains a total of 800 text labels, so what it can generate is limited.
The 3D models were sourced from [objaverse](https://huggingface.co/datasets/allenai/objaverse), [shapenet](https://huggingface.co/datasets/ShapeNet/shapenetcore-gltf) and [ModelNet40](https://www.kaggle.com/datasets/balraj98/modelnet40-princeton-3d-object-dataset/data).

## How it works:
MeshGPT uses an autoencoder which takes a 3D mesh (the codebase has support for quads, but that is not implemented in this model) and quantizes it into a codebook whose entries can be used as tokens.
The second part of MeshGPT is the transformer, which trains on the tokens generated by the autoencoder while cross-attending to a text embedding.

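In sketch form, those two training stages could look like the following; the forward signatures and the dummy data are assumptions drawn from the meshgpt-pytorch library, not a verbatim training script for this checkpoint:

```
import torch
from meshgpt_pytorch import MeshAutoencoder, MeshTransformer

# dummy mesh batch: coordinates in [-1, 1], 64 triangles over 192 vertices
vertices = torch.rand(1, 192, 3) * 2 - 1
faces = torch.arange(192).reshape(1, 64, 3)

# stage 1: train the autoencoder to reconstruct meshes (this learns the codebook)
autoencoder = MeshAutoencoder(num_discrete_coors = 128)
loss = autoencoder(vertices = vertices, faces = faces)
loss.backward()

# stage 2: train the transformer on the autoencoder's tokens,
# cross-attending to an embedding of the text label
transformer = MeshTransformer(
    autoencoder,
    dim = 512,
    max_seq_len = 768,
    condition_on_text = True
)
loss = transformer(vertices = vertices, faces = faces, texts = ['chair'])
loss.backward()
```
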
The final product is a tokenizer and a transformer that take a text embedding and then autoregressively generate a 3D model based on the text input.
The tokens generated by the transformer can then be converted into a 3D mesh using the autoencoder.

## Credits
The idea for MeshGPT came from the paper (https://arxiv.org/abs/2311.15475), but its creators didn't release any code or model.
Phil Wang (https://github.com/lucidrains) drew inspiration from the paper, made a ton of improvements over the paper's implementation, and created the repo: https://github.com/lucidrains/meshgpt-pytorch
My goal has been to figure out how to train MeshGPT and turn it into reality.<br/>
See my GitHub repo for a notebook on how to get started training your own MeshGPT! [MarcusLoppe/meshgpt-pytorch](https://github.com/MarcusLoppe/meshgpt-pytorch/)