---
license: apache-2.0
---

### MeshGPT-alpha-preview

MeshGPT is a text-to-3D model built from an autoencoder (tokenizer) and a transformer that generates the tokens.

The autoencoder's purpose is to translate a 3D model into tokens, which its decoder part can then convert back into a 3D mesh.<br/>
For all intents and purposes, the autoencoder is the **world's first** published **3D model tokenizer**! (correct me if I'm wrong!)

## Model Details

The autoencoder (tokenizer) is a relatively small model with 50M parameters, while the transformer uses 184M parameters and has a core based on GPT2-small.

Due to hardware constraints it was trained with a codebook/vocabulary size of 2048.<br/>
Developed & trained by: me, with credit for the MeshGPT codebase to [Phil Wang](https://github.com/lucidrains).

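For reference, below is a minimal sketch of how models of roughly this size might be constructed with meshgpt-pytorch. The constructor arguments shown (`num_discrete_coors`, `codebook_size`, `dim`, `max_seq_len`) are assumptions based on the meshgpt-pytorch README, not the actual configuration used to train this checkpoint:

```
from meshgpt_pytorch import MeshAutoencoder, MeshTransformer

# Hypothetical configuration; the values are illustrative, not this checkpoint's real settings.
autoencoder = MeshAutoencoder(
    num_discrete_coors = 128,  # resolution of the coordinate quantization
    codebook_size = 2048       # small codebook/vocabulary due to hardware constraints
)

transformer = MeshTransformer(
    autoencoder,
    dim = 768,                 # GPT2-small-like model width
    max_seq_len = 1500,        # token budget; must cover the longest training mesh
    condition_on_text = True   # cross-attend to text embeddings for text-to-3D
)
```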
## Performance:

CPU: 10 triangles/s<br/>
3060 GPU: 40 triangles/s<br/>
4090 GPU: 110 triangles/s<br/>

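To put these rates in perspective, here is a quick back-of-the-envelope estimate of how long a mesh at the training cap of 250 triangles (see the training dataset section) takes to generate:

```
# Rough generation-time estimate derived from the throughput figures above.
rates = {"CPU": 10, "RTX 3060": 40, "RTX 4090": 110}  # triangles per second
triangles = 250                                       # max triangle count in the training data

for name, rate in rates.items():
    print(f"{name}: ~{triangles / rate:.0f} s per mesh")
```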
### Warning:

This model was created without any sponsors or rented GPU hardware, so it is very limited in what it can generate.

It handles simple single objects such as 'chair' or 'table' fine, but more complex objects require more training (see the training dataset section).

### Usage:

Install:

```
pip install git+https://github.com/MarcusLoppe/meshgpt-pytorch.git
```

```
import torch

from meshgpt_pytorch import (
    MeshAutoencoder,
    MeshTransformer,
    mesh_render
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained text-to-mesh transformer from the Hugging Face Hub.
transformer = MeshTransformer.from_pretrained("MarcusLoren/MeshGPT_tiny_alpha").to(device)

# Each call generates one mesh per text prompt (temperature = 0.0 for conservative sampling).
output = []
output.append(transformer.generate(texts = ['sofa', 'bed', 'computer screen', 'bench', 'chair', 'table'], temperature = 0.0))
output.append(transformer.generate(texts = ['milk carton', 'door', 'shovel', 'heart', 'trash can', 'ladder'], temperature = 0.0))
output.append(transformer.generate(texts = ['hammer', 'pedestal', 'pickaxe', 'wooden cross', 'coffee bean', 'crowbar'], temperature = 0.0))
output.append(transformer.generate(texts = ['key', 'minecraft character', 'dragon head', 'open book', 'minecraft turtle', 'wooden table'], temperature = 0.0))
output.append(transformer.generate(texts = ['gun', 'ice cream cone', 'axe', 'helicopter', 'shotgun', 'plastic bottle'], temperature = 0.0))

# Render all generated meshes into a single .obj file.
mesh_render.save_rendering('./render.obj', output)
```
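The example above uses `temperature = 0.0` for every prompt, which keeps the outputs conservative and repeatable. Raising the temperature should produce more varied meshes for the same prompt, at the cost of more artifacts; a hypothetical variation (prompt and output path are just examples):

```
varied = transformer.generate(texts = ['chair'] * 4, temperature = 0.5)
mesh_render.save_rendering('./chair_variations.obj', [varied])
```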
## Expected output:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/657e233acec775bfe0d5cbc6/K04Qj_xgwmNT_MldTA1l8.png)

Random samples generated by text only:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/657e233acec775bfe0d5cbc6/UH1r5s9Lfj4sUSgClqhrf.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/657e233acec775bfe0d5cbc6/oxZnaUldcmvGfJprWLa-w.png)

## Training dataset

I've only had access to the free-tier GPU on Kaggle, so this model is trained on just 4k models with at most 250 triangles each.

The dataset contains a total of 800 text labels, so what it can generate is limited.

3D models were sourced from [objaverse](https://huggingface.co/datasets/allenai/objaverse), [shapenet](https://huggingface.co/datasets/ShapeNet/shapenetcore-gltf) and [ModelNet40](https://www.kaggle.com/datasets/balraj98/modelnet40-princeton-3d-object-dataset/data).

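If you want to assemble a similar dataset, the triangle cap is straightforward to apply while collecting meshes. A minimal sketch, assuming `trimesh` for loading and a hypothetical local `./raw_models/` folder; this is not the actual preprocessing pipeline used for this model:

```
from pathlib import Path

import trimesh

MAX_TRIANGLES = 250  # same cap as the training data described above
kept = []

# Hypothetical folder of meshes downloaded from the sources above.
for path in Path("./raw_models").glob("*.glb"):
    mesh = trimesh.load(path, force="mesh")  # force a single triangulated mesh
    if len(mesh.faces) <= MAX_TRIANGLES:
        kept.append((path.name, mesh))

print(f"kept {len(kept)} meshes with <= {MAX_TRIANGLES} triangles")
```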
## How it works:

MeshGPT uses an autoencoder that takes a 3D mesh (the codebase has support for quads, but this model doesn't use it) and quantizes it into a codebook whose entries can be used as tokens.
The second part of MeshGPT is the transformer, which trains on the tokens produced by the autoencoder while cross-attending to a text embedding.

The final product is a tokenizer and a transformer that take a text embedding as input and then autoregressively generate a 3D model from it.
The tokens generated by the transformer can then be converted back into a 3D mesh using the autoencoder.

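Below is a condensed sketch of that two-stage pipeline using the call pattern from the meshgpt-pytorch README. The mock tensor shapes and the exact keyword arguments (`condition_on_text`, `texts`) are assumptions about the library's API, not this model's training code:

```
import torch
from meshgpt_pytorch import MeshAutoencoder, MeshTransformer

autoencoder = MeshAutoencoder(num_discrete_coors = 128)

# Stage 1: train the autoencoder to reconstruct meshes (vertices + triangle faces).
vertices = torch.randn(2, 121, 3)            # (batch, num vertices, xyz)
faces = torch.randint(0, 121, (2, 64, 3))    # (batch, num faces, 3 vertex indices)
recon_loss = autoencoder(vertices = vertices, faces = faces)
recon_loss.backward()

# Stage 2: train the transformer on the autoencoder's token sequences,
# cross-attending to a text embedding of each label.
transformer = MeshTransformer(autoencoder, dim = 512, max_seq_len = 768, condition_on_text = True)
lm_loss = transformer(vertices = vertices, faces = faces, texts = ['chair', 'table'])
lm_loss.backward()

# Generation: text goes in, tokens come out and are decoded back into faces by the autoencoder.
generated = transformer.generate(texts = ['chair', 'table'])
```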
## Credits

The idea for MeshGPT comes from the paper (https://arxiv.org/abs/2311.15475), but its authors didn't release any code or model.
Phil Wang (https://github.com/lucidrains) drew inspiration from the paper, made a ton of improvements over the paper's implementation, and created the repo: https://github.com/lucidrains/meshgpt-pytorch

My goal has been to figure out how to train MeshGPT and bring it into reality.<br/>
See my GitHub repo, [MarcusLoppe/meshgpt-pytorch](https://github.com/MarcusLoppe/meshgpt-pytorch/), for a notebook on how to get started training your own MeshGPT!