---
title: CoDA Fine-tuning
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
hf_oauth: true
hf_oauth_scopes:
  - read-repos
  - write-repos
---

# CoDA Model Fine-tuning Space

This Space lets you fine-tune the `Salesforce/CoDA-v0-Instruct` diffusion-based text generation model on the `baseten-admin/gpt-oss120b-generated-perfectblend` dataset.

## Features

- 🎯 **Full fine-tuning**: updates all model parameters (not LoRA)
- πŸ’¬ **ChatML format**: processes conversation data with question-answer pairs (see the formatting sketch after this list)
- πŸ”„ **Auto upload**: automatically uploads the trained model to your Hugging Face account
- πŸ“Š **Progress tracking**: real-time training progress updates
- πŸ” **OAuth integration**: secure authentication via Hugging Face login

## How to Use

1. **Login**: Click the "Sign in with Hugging Face" button
2. **Configure**: Adjust training parameters (epochs, batch size, learning rate)
3. **Train**: Click "Start Training" (requires a GPU; upgrade the Space to a GPU tier)
4. **Resume**: If training is interrupted, check "Resume from last checkpoint" and restart
5. **Upload**: After training completes, click "Upload to Hugging Face Hub" (see the loading sketch after this list)
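Once uploaded, the model can be pulled back down like any Hub checkpoint. A minimal sketch, assuming a placeholder repo name and that CoDA's custom modeling code needs `trust_remote_code=True` (check the model card for the exact loading recipe):

```python
from transformers import AutoModel, AutoTokenizer

# Placeholder repo id; the Space reports the actual one after upload.
repo_id = "your-username/coda-finetuned"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```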

## Persistence

This Space supports checkpoint persistence:

- Training checkpoints are saved every 500 steps
- If interrupted, you can resume from the last checkpoint (sketched below)
- For Docker deployment: mount a `/data` volume for full persistence
- On Spaces: checkpoints persist within the same session, and across rebuilds if the Space uses a persistent storage tier
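A minimal sketch of how this save/resume behavior typically looks with the `transformers` `Trainer`; the `output_dir` path and the commented-out setup are assumptions, not the Space's exact code:

```python
from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    output_dir="/data/checkpoints",  # assumed to live on the mounted /data volume
    save_steps=500,                  # checkpoint every 500 steps, as noted above
    save_total_limit=2,              # keep only the most recent checkpoints
)

# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# With resume_from_checkpoint=True, Trainer restarts from the newest
# checkpoint-* directory found in output_dir, if any exists.
# trainer.train(resume_from_checkpoint=True)
```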

## Requirements

- **Hardware**: GPU (T4, A10G, or better) strongly recommended
- **Account**: Hugging Face account with write permissions
- **Time**: training takes several hours, depending on configuration

## About the Model

CoDA is a 1.7B-parameter bidirectional diffusion language model developed by Salesforce AI Research. Unlike traditional autoregressive models, CoDA generates text through discrete denoising. The Instruct version is instruction-tuned, which makes it a good starting point for fine-tuning on conversational data.

## Model Configuration

```json
{
  "architectures": ["CoDALanguageModel"],
  "hidden_size": 2048,
  "num_hidden_layers": 28,
  "num_attention_heads": 16,
  "vocab_size": 151936,
  "max_position_embeddings": 40960
}
```

## Dataset

Training uses the `baseten-admin/gpt-oss120b-generated-perfectblend` dataset:

- **Format**: conversational data in ChatML format
- **Column**: `conversations` (a list of role-content pairs)
- **Split**: the `train` split, divided 90/10 into train/eval (see the loading sketch below)
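For reference, a minimal sketch of loading the dataset and reproducing a 90/10 split with the `datasets` library; the `seed` value is an illustrative assumption:

```python
from datasets import load_dataset

# Load the train split, then carve out 10% for evaluation.
dataset = load_dataset("baseten-admin/gpt-oss120b-generated-perfectblend", split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42)  # seed chosen for illustration

train_ds, eval_ds = splits["train"], splits["test"]
print(train_ds.column_names)  # expect a "conversations" column
```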

## Training Details

- **Optimizer**: AdamW
- **Precision**: FP16 (on GPU)
- **Gradient accumulation**: 4 steps
- **Gradient checkpointing**: enabled for memory efficiency
- **Max sequence length**: 2048 tokens (combined in the sketch below)
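A minimal sketch of how these settings map onto `TrainingArguments`; epochs, batch size, and learning rate are illustrative placeholders for the values chosen in the UI:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/data/checkpoints",
    num_train_epochs=1,             # placeholder; set via the UI
    per_device_train_batch_size=1,  # placeholder; set via the UI
    learning_rate=5e-5,             # placeholder; set via the UI
    optim="adamw_torch",            # AdamW optimizer
    fp16=True,                      # FP16 precision on GPU
    gradient_accumulation_steps=4,  # effective batch = 4 x per-device batch
    gradient_checkpointing=True,    # recompute activations to save memory
    save_steps=500,
)

# The 2048-token max sequence length is enforced during tokenization
# (e.g. tokenizer(..., truncation=True, max_length=2048)), not here.
```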

## Citation

If you use this Space or the CoDA model, please cite:

```bibtex
@article{coda2023,
  title={CoDA: Bidirectional Code Diffusion},
  author={Salesforce AI Research},
  journal={arXiv preprint},
  year={2023}
}
```

## License

Apache 2.0