---
license: mit
language:
- en
base_model:
- openai/clip-vit-large-patch14
pipeline_tag: video-classification
tags:
- dance
- vision
- breaking
---
# CLIP-Based Break Dance Move Classifier
A deep learning model for classifying break dance moves using CLIP (Contrastive Language-Image Pre-Training) embeddings. The model is fine-tuned on break dance videos to classify different power moves including windmills, halos, swipes, and baby mills.
## Features
- Video-based classification using CLIP embeddings
- Multi-frame temporal analysis
- Configurable frame sampling and data augmentation
- Real-time inference using Cog
- Misclassification analysis tools
- Hyperparameter tuning support
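Two of the features above, configurable frame sampling and multi-frame temporal analysis, are easiest to picture with a short sketch. This is a minimal illustration assuming OpenCV for decoding; `sample_frames` and its defaults are hypothetical names, not the actual API in `src/data/`.
```python
# Hypothetical sketch of uniform frame sampling (not the repo's actual API).
import cv2
from PIL import Image

def sample_frames(video_path: str, num_frames: int = 8) -> list[Image.Image]:
    """Uniformly sample `num_frames` RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = [int(i * total / num_frames) for i in range(num_frames)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            # OpenCV decodes to BGR; CLIP preprocessing expects RGB
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames
```
Sampled frames are then embedded with CLIP and pooled across time, as sketched under "Model Architecture" below.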
## Setup
```bash
# Install dependencies
pip install -r requirements.txt
# Install Cog (if not already installed)
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
```
## Cog
Download the weights:
```bash
gdown --folder https://drive.google.com/drive/folders/1Gn3UdoKffKJwz84GnGx-WMFTwZuvDsuf -O ./checkpoints/
```
Build the image:
```bash
cog build --separate-weights
```
Push a new image:
```bash
cog push
```
## Training
Download the training data:
```bash
gdown --folder https://drive.google.com/drive/folders/11M6nSuSuvoU2wpcV_-6KFqCzEMGP75q6 -O ./data/
```
```bash
# Run training with default configuration
python scripts/train.py
# Run hyperparameter tuning
python scripts/hyperparameter_tuning.py
```
## Inference
```bash
# Using Cog for inference
cog predict -i video=@path/to/your/video.mp4
# Using standard Python script
python scripts/inference.py --video path/to/your/video.mp4
```
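For context on what `cog predict` invokes: Cog routes the `-i video=...` input into a `Predictor` class declared in the project's `predict.py`. The sketch below uses Cog's real `BasePredictor`/`Input` interface, but `load_model`, `sample_frames`, and the label order are hypothetical stand-ins for this repo's code.
```python
# Hypothetical shape of the Cog predictor; the repo's actual predict.py may differ.
from cog import BasePredictor, Input, Path

CLASSES = ["windmill", "halo", "swipe", "baby mill"]  # assumed label order

class Predictor(BasePredictor):
    def setup(self):
        # Load the fine-tuned checkpoint once, before serving predictions
        self.model = load_model("./checkpoints")  # hypothetical helper

    def predict(
        self, video: Path = Input(description="Video of a break dance move")
    ) -> str:
        frames = sample_frames(str(video))  # hypothetical helper
        logits = self.model(frames)
        return CLASSES[int(logits.argmax())]
```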
## Analysis
```bash
# Generate misclassification report
python scripts/visualization/miscalculations_report.py
# Visualize model performance
python scripts/visualization/visualize.py
```
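For a rough idea of what a misclassification report summarizes, the placeholder snippet below uses scikit-learn's `confusion_matrix` and `classification_report`; the labels and predictions are dummy data, and the repo's script may compute different statistics.
```python
# Placeholder misclassification summary (dummy data; not the repo's script).
from sklearn.metrics import classification_report, confusion_matrix

classes = ["windmill", "halo", "swipe", "baby mill"]
y_true = [0, 0, 1, 2, 3, 1]  # dummy ground-truth labels
y_pred = [0, 1, 1, 2, 3, 3]  # dummy model predictions

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=classes))
```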
## Project Structure
```
clip/
├── src/                  # Source code
│   ├── data/             # Dataset and data processing
│   ├── models/           # Model architecture
│   └── utils/            # Utility functions
├── scripts/              # Training and inference scripts
│   └── visualization/    # Visualization tools
├── config/               # Configuration files
├── runs/                 # Training runs and checkpoints
├── cog.yaml              # Cog configuration
└── requirements.txt      # Python dependencies
```
## Training Data
To run training on your own, you can find the training data [here](https://drive.google.com/drive/folders/11M6nSuSuvoU2wpcV_-6KFqCzEMGP75q6?usp=drive_link) and place it in a directory called `./data` at the root of the project.
## Checkpoints
To run predictions with Cog or locally from an existing checkpoint, you can find a checkpoint and configuration files [here](https://drive.google.com/drive/folders/1Gn3UdoKffKJwz84GnGx-WMFTwZuvDsuf?usp=sharing) and place them in a directory called `./checkpoints` at the root of the project.
## Model Architecture
- Base: CLIP ViT-Large/14
- Custom temporal pooling layer
- Fine-tuned vision encoder (last 3 layers)
- Output: 4-class classifier
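A hedged sketch of how this architecture could be wired together in PyTorch follows; the class name, the mean pooling (standing in for the custom temporal pooling layer), and the freezing details are illustrative assumptions, not the actual code in `src/models/`.
```python
# Illustrative sketch of the architecture above (not the repo's actual code).
import torch
import torch.nn as nn
from transformers import CLIPVisionModel

class VideoCLIPClassifier(nn.Module):
    def __init__(self, num_classes: int = 4, trainable_layers: int = 3):
        super().__init__()
        self.encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
        # Freeze the encoder, then unfreeze the last `trainable_layers` blocks
        for p in self.encoder.parameters():
            p.requires_grad = False
        for block in self.encoder.vision_model.encoder.layers[-trainable_layers:]:
            for p in block.parameters():
                p.requires_grad = True
        hidden = self.encoder.config.hidden_size  # 1024 for ViT-Large/14
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, 224, 224) -> one CLIP embedding per frame
        b, t = frames.shape[:2]
        feats = self.encoder(pixel_values=frames.flatten(0, 1)).pooler_output
        feats = feats.view(b, t, -1).mean(dim=1)  # temporal pooling (mean here)
        return self.head(feats)  # (batch, num_classes) logits
```
Freezing all but the last three transformer blocks keeps most of CLIP's pretrained features intact while letting the encoder adapt to motion-specific cues in the dance footage.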
## License
MIT License
Copyright (c) 2024 Bryant Wolf
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{clip-breakdance-classifier,
  author       = {Bryant Wolf},
  title        = {CLIP-Based Break Dance Move Classifier},
  year         = {2024},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://github.com/bawolf/breaking_vision_clip_cog}}
}
``` |