---
license: apache-2.0
---

### MeshGPT-alpha-preview

MeshGPT is a text-to-3D model built from an autoencoder (tokenizer) and a transformer that generates the tokens.

The autoencoder's purpose is to translate a 3D model into tokens, which its decoder part can then convert back into a 3D mesh.<br/>
For all intents and purposes, the autoencoder is the **world's first** published **3D model tokenizer**! (correct me if I'm wrong!)

## Model Details

The autoencoder (tokenizer) is a relatively small model with 50M parameters, while the transformer uses 184M parameters and has a core based on GPT2-small.

Due to hardware constraints it was trained with a codebook/vocabulary size of 2048.<br/>
Developed & trained by: me, with credit for the MeshGPT codebase to [Phil Wang](https://github.com/lucidrains).

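For reference, below is a minimal sketch of how models of roughly this size might be constructed with meshgpt-pytorch. The constructor arguments shown (`num_discrete_coors`, `codebook_size`, `dim`, `max_seq_len`) are assumptions based on the meshgpt-pytorch README, not the actual configuration used to train this checkpoint:

```
from meshgpt_pytorch import MeshAutoencoder, MeshTransformer

# Hypothetical configuration; the values are illustrative, not this checkpoint's real settings.
autoencoder = MeshAutoencoder(
    num_discrete_coors = 128,  # resolution of the coordinate quantization
    codebook_size = 2048       # small codebook/vocabulary due to hardware constraints
)

transformer = MeshTransformer(
    autoencoder,
    dim = 768,                 # GPT2-small-like model width
    max_seq_len = 1500,        # token budget; must cover the longest training mesh
    condition_on_text = True   # cross-attend to text embeddings for text-to-3D
)
```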
## Performance:

CPU: 10 triangles/s<br/>
3060 GPU: 40 triangles/s<br/>
4090 GPU: 110 triangles/s<br/>

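To put these rates in perspective, here is a quick back-of-the-envelope estimate of how long a mesh at the training cap of 250 triangles (see the training dataset section) takes to generate:

```
# Rough generation-time estimate derived from the throughput figures above.
rates = {"CPU": 10, "RTX 3060": 40, "RTX 4090": 110}  # triangles per second
triangles = 250                                       # max triangle count in the training data

for name, rate in rates.items():
    print(f"{name}: ~{triangles / rate:.0f} s per mesh")
```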
### Warning:

This model was created without any sponsors or rented GPU hardware, so it is very limited in what it can generate.

It handles simple single objects such as 'chair' or 'table' fine, but more complex objects require more training (see the training dataset section).

### Usage:

Install:

```
pip install git+https://github.com/MarcusLoppe/meshgpt-pytorch.git
```

```
import torch

from meshgpt_pytorch import (
    MeshAutoencoder,
    MeshTransformer,
    mesh_render
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained text-to-mesh transformer from the Hugging Face Hub.
transformer = MeshTransformer.from_pretrained("MarcusLoren/MeshGPT_tiny_alpha").to(device)

# Each call generates one mesh per text prompt (temperature = 0.0 for conservative sampling).
output = []
output.append(transformer.generate(texts = ['sofa', 'bed', 'computer screen', 'bench', 'chair', 'table'], temperature = 0.0))
output.append(transformer.generate(texts = ['milk carton', 'door', 'shovel', 'heart', 'trash can', 'ladder'], temperature = 0.0))
output.append(transformer.generate(texts = ['hammer', 'pedestal', 'pickaxe', 'wooden cross', 'coffee bean', 'crowbar'], temperature = 0.0))
output.append(transformer.generate(texts = ['key', 'minecraft character', 'dragon head', 'open book', 'minecraft turtle', 'wooden table'], temperature = 0.0))
output.append(transformer.generate(texts = ['gun', 'ice cream cone', 'axe', 'helicopter', 'shotgun', 'plastic bottle'], temperature = 0.0))

# Render all generated meshes into a single .obj file.
mesh_render.save_rendering('./render.obj', output)
```
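The example above uses `temperature = 0.0` for every prompt, which keeps the outputs conservative and repeatable. Raising the temperature should produce more varied meshes for the same prompt, at the cost of more artifacts; a hypothetical variation (prompt and output path are just examples):

```
varied = transformer.generate(texts = ['chair'] * 4, temperature = 0.5)
mesh_render.save_rendering('./chair_variations.obj', [varied])
```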
## Expected output:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/657e233acec775bfe0d5cbc6/K04Qj_xgwmNT_MldTA1l8.png)

Random samples generated by text only:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/657e233acec775bfe0d5cbc6/UH1r5s9Lfj4sUSgClqhrf.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/657e233acec775bfe0d5cbc6/oxZnaUldcmvGfJprWLa-w.png)

## Training dataset

I've only had access to the free-tier GPU on Kaggle, so this model is trained on just 4k models with at most 250 triangles each.

The dataset contains a total of 800 text labels, so what it can generate is limited.

3D models were sourced from [objaverse](https://huggingface.co/datasets/allenai/objaverse), [shapenet](https://huggingface.co/datasets/ShapeNet/shapenetcore-gltf) and [ModelNet40](https://www.kaggle.com/datasets/balraj98/modelnet40-princeton-3d-object-dataset/data).

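If you want to assemble a similar dataset, the triangle cap is straightforward to apply while collecting meshes. A minimal sketch, assuming `trimesh` for loading and a hypothetical local `./raw_models/` folder; this is not the actual preprocessing pipeline used for this model:

```
from pathlib import Path

import trimesh

MAX_TRIANGLES = 250  # same cap as the training data described above
kept = []

# Hypothetical folder of meshes downloaded from the sources above.
for path in Path("./raw_models").glob("*.glb"):
    mesh = trimesh.load(path, force="mesh")  # force a single triangulated mesh
    if len(mesh.faces) <= MAX_TRIANGLES:
        kept.append((path.name, mesh))

print(f"kept {len(kept)} meshes with <= {MAX_TRIANGLES} triangles")
```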
## How it works:

MeshGPT uses an autoencoder that takes a 3D mesh (the codebase has support for quads, but this model doesn't use it) and quantizes it into a codebook whose entries can be used as tokens.
The second part of MeshGPT is the transformer, which trains on the tokens produced by the autoencoder while cross-attending to a text embedding.

The final product is a tokenizer and a transformer that take a text embedding as input and then autoregressively generate a 3D model from it.
The tokens generated by the transformer can then be converted back into a 3D mesh using the autoencoder.

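Below is a condensed sketch of that two-stage pipeline using the call pattern from the meshgpt-pytorch README. The mock tensor shapes and the exact keyword arguments (`condition_on_text`, `texts`) are assumptions about the library's API, not this model's training code:

```
import torch
from meshgpt_pytorch import MeshAutoencoder, MeshTransformer

autoencoder = MeshAutoencoder(num_discrete_coors = 128)

# Stage 1: train the autoencoder to reconstruct meshes (vertices + triangle faces).
vertices = torch.randn(2, 121, 3)            # (batch, num vertices, xyz)
faces = torch.randint(0, 121, (2, 64, 3))    # (batch, num faces, 3 vertex indices)
recon_loss = autoencoder(vertices = vertices, faces = faces)
recon_loss.backward()

# Stage 2: train the transformer on the autoencoder's token sequences,
# cross-attending to a text embedding of each label.
transformer = MeshTransformer(autoencoder, dim = 512, max_seq_len = 768, condition_on_text = True)
lm_loss = transformer(vertices = vertices, faces = faces, texts = ['chair', 'table'])
lm_loss.backward()

# Generation: text goes in, tokens come out and are decoded back into faces by the autoencoder.
generated = transformer.generate(texts = ['chair', 'table'])
```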
## Credits

The idea for MeshGPT comes from the paper (https://arxiv.org/abs/2311.15475), but its authors didn't release any code or model.
Phil Wang (https://github.com/lucidrains) drew inspiration from the paper, made a ton of improvements over the paper's implementation, and created the repo: https://github.com/lucidrains/meshgpt-pytorch

My goal has been to figure out how to train MeshGPT and bring it into reality.<br/>
See my GitHub repo, [MarcusLoppe/meshgpt-pytorch](https://github.com/MarcusLoppe/meshgpt-pytorch/), for a notebook on how to get started training your own MeshGPT!