VIT-CodeGPT CAD Code Generator

This model generates CadQuery Python code from images of 3D CAD objects. It pairs a Vision Transformer (ViT) encoder with a CodeGPT decoder in a vision-encoder-decoder architecture.

Model Details

  • Architecture: Vision Encoder-Decoder (ViT + CodeGPT)
  • Encoder: google/vit-base-patch16-224
  • Decoder: microsoft/CodeGPT-small-py
  • Task: Image-to-Code Generation (CAD)
  • Dataset: CADCODER/GenCAD-Code
  • Samples Used: 10,000 (8,500 train / 1,500 validation)
  • Training Time: ~4 hours 12 minutes
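
The card does not show how the two checkpoints were joined. The following is a minimal sketch, assuming the standard `VisionEncoderDecoderModel.from_encoder_decoder_pretrained` route in `transformers`. The helper name `build_untrained_model` and the choice of start and pad tokens are illustrative assumptions, not the author's confirmed setup.

```python
ENCODER_ID = "google/vit-base-patch16-224"
DECODER_ID = "microsoft/CodeGPT-small-py"


def build_untrained_model():
    """Pair the pretrained ViT encoder with the CodeGPT decoder (hypothetical helper).

    The cross-attention layers added to the decoder start out randomly
    initialised, so the result needs fine-tuning before it is useful.
    """
    # Imported lazily so the constants above can be inspected without transformers.
    from transformers import AutoTokenizer, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
        ENCODER_ID, DECODER_ID
    )
    tokenizer = AutoTokenizer.from_pretrained(DECODER_ID)

    # Assumption: start decoding from BOS (falling back to EOS) and pad with EOS,
    # mirroring the pad_token_id passed to generate() in the Usage section.
    start_id = tokenizer.bos_token_id
    if start_id is None:
        start_id = tokenizer.eos_token_id
    model.config.decoder_start_token_id = start_id
    model.config.pad_token_id = tokenizer.eos_token_id
    return model, tokenizer
```

Calling `build_untrained_model()` downloads both base checkpoints. The published model already contains the fine-tuned weights, so this step is only relevant when reproducing training.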

Training Configuration

  • Batch Size: 4 (effective: 16 with gradient accumulation)
  • Learning Rate: 3e-5
  • Epochs: 3
  • Max Length: 256 tokens
  • Optimizer: AdamW with warmup
  • Mixed Precision: FP16
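
As a rough illustration, the settings above can be written with the keyword names used by `Seq2SeqTrainingArguments` in `transformers`. The accumulation factor of 4 is an assumption (16 / 4 on a single device), and the warmup length is omitted because the card does not state it. The step counts are plain arithmetic; the exact number a trainer reports may differ slightly.

```python
import math

# Keyword names follow transformers' Seq2SeqTrainingArguments;
# values come from the Training Configuration list.
training_kwargs = {
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,  # assumption: 16 / 4 on a single device
    "learning_rate": 3e-5,
    "num_train_epochs": 3,
    "fp16": True,
}

TRAIN_SAMPLES = 8_500  # train portion listed under Model Details

effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]

print(effective_batch, steps_per_epoch, total_steps)  # 16 532 1596
```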

Performance

Final training metrics:

  • ROUGE-1: 0.0944
  • ROUGE-2: 0.0040
  • ROUGE-L: 0.0863
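
ROUGE measures n-gram overlap between generated and reference text. The card does not say which scorer or tokenization produced the numbers above, so the sketch below is a simplified ROUGE-1 F1 on whitespace tokens, meant only to show what the metric rewards. The function name `rouge1_f1` and the two sample strings are illustrative.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 on whitespace tokens (simplified ROUGE-1)."""
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = 'result = cq.Workplane("XY").box(10, 10, 10)'
candidate = 'result = cq.Workplane("XY").box(5, 5, 5)'
print(round(rouge1_f1(reference, candidate), 3))  # 0.4
```

Two of the five whitespace tokens match here, so the score is 0.4 even though the candidate describes a different box. Overlap metrics of this kind do not check whether the generated code runs or produces the right geometry.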

Usage

from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from PIL import Image
import torch

# Load the fine-tuned model, plus the image processor and tokenizer of its base checkpoints
model = VisionEncoderDecoderModel.from_pretrained("Thehunter99/vit-codegpt-cadcoder")
image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
tokenizer = AutoTokenizer.from_pretrained("microsoft/CodeGPT-small-py")

# Load the image; convert to RGB because ViT expects 3 channels and PNGs may carry alpha
image = Image.open("path/to/your/cad_image.png").convert("RGB")
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

# Generate CAD code with beam search
with torch.no_grad():
    generated_ids = model.generate(
        pixel_values,
        max_length=256,
        num_beams=4,
        early_stopping=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode the token IDs back into source text
generated_code = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_code)

Example Output

Input: Image of a 3D cube

Output:

import cadquery as cq

# Create a simple cube
result = cq.Workplane("XY").box(10, 10, 10)
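
Generated scripts are not guaranteed to be valid Python, so it can help to check them before running anything. The sketch below uses only the standard library `ast` module; it neither executes the code nor imports CadQuery. The helper name `defines_result` is hypothetical, and the expectation that the script assigns a variable called `result` is an assumption taken from the example above.

```python
import ast


def defines_result(code: str) -> bool:
    """Return True if `code` parses as Python and assigns a top-level `result`."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "result":
                    return True
    return False


sample = 'import cadquery as cq\n\nresult = cq.Workplane("XY").box(10, 10, 10)\n'
print(defines_result(sample))              # True
print(defines_result("result = cq.box("))  # False: unbalanced parenthesis
```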

Training Data

The model was trained on the CADCODER/GenCAD-Code dataset, which pairs images of 3D CAD objects with their corresponding CadQuery Python code.
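
The card lists 10,000 samples split 8,500 / 1,500 but does not say how they were selected. A minimal sketch of one reproducible way to produce a split of that size, assuming a seeded random shuffle; the helper name `split_indices` and the seed are illustrative.

```python
import random


def split_indices(n_total: int, n_val: int, seed: int = 0):
    """Shuffle range(n_total) reproducibly and hold out n_val indices."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    # First n_val shuffled indices become validation, the rest training.
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(10_000, 1_500)
print(len(train_idx), len(val_idx))  # 8500 1500
```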

Limitations

  • Generates CadQuery syntax only
  • Performs best on geometric shapes similar to those in the training data
  • May struggle with very complex or unusual CAD designs
  • Maximum output length: 256 tokens; longer scripts are cut off
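
Because of the 256-token cap, a long script can stop mid-statement. The heuristic below flags outputs that filled the whole budget without ending on the end-of-sequence token. The function name `hit_length_limit` is hypothetical, and the check assumes a single, unpadded output sequence.

```python
def hit_length_limit(token_ids, eos_token_id, max_length=256):
    """Heuristic: True when the sequence used the full budget and did not end with EOS."""
    ids = list(token_ids)
    return len(ids) >= max_length and ids[-1] != eos_token_id


print(hit_length_limit([5] * 256, eos_token_id=2))         # True
print(hit_length_limit([5] * 255 + [2], eos_token_id=2))   # False
print(hit_length_limit([5] * 10 + [2], eos_token_id=2))    # False
```

With the names from the Usage section, it could be called as `hit_length_limit(generated_ids[0].tolist(), tokenizer.eos_token_id)`.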

Citation

If you use this model, please cite:

@misc{vit-codegpt-cadcoder,
  title={VIT-CodeGPT CAD Code Generator},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Thehunter99/vit-codegpt-cadcoder}
}

Model size: 239M params (Safetensors, tensor type F32)