---
license: mit
language:
- en
base_model:
- openai/clip-vit-large-patch14
pipeline_tag: video-classification
tags:
- dance
- vision
- breaking
---
# CLIP-Based Break Dance Move Classifier

A deep learning model for classifying break dance moves using CLIP (Contrastive Language-Image Pre-Training) embeddings. The model is fine-tuned on break dance videos to classify different power moves including windmills, halos, swipes, and baby mills.

## Features

- Video-based classification using CLIP embeddings
- Multi-frame temporal analysis
- Configurable frame sampling and data augmentation
- Real-time inference using Cog
- Misclassification analysis tools
- Hyperparameter tuning support
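
As a rough illustration of what configurable frame sampling can look like, the sketch below picks evenly spaced frame indices from a video. The function name `sample_frame_indices` and the segment-center strategy are assumptions made for this example; the sampling code in this repository may work differently.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Return up to `num_samples` evenly spaced frame indices.

    Hypothetical helper for illustration only; not taken from this repo.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Short video: use every frame once rather than repeating frames.
        return list(range(total_frames))
    # Split the video into `num_samples` equal segments and take the
    # frame at the center of each segment.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


if __name__ == "__main__":
    print(sample_frame_indices(100, 4))
```

Sampling from segment centers keeps the selected frames spread across the whole move instead of clustering at the start of the clip.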

## Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Install Cog (if not already installed)
curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
chmod +x /usr/local/bin/cog
```

## Cog

Download the weights:

```bash
gdown --folder "https://drive.google.com/drive/folders/1Gn3UdoKffKJwz84GnGx-WMFTwZuvDsuf" -O ./checkpoints/
```

Build the image:

```bash
cog build --separate-weights
```

Push a new image:

```bash
cog push
```

## Training

Download the training data:

```bash
gdown --folder "https://drive.google.com/drive/folders/11M6nSuSuvoU2wpcV_-6KFqCzEMGP75q6" -O ./data/
```

```bash
# Run training with default configuration
python scripts/train.py

# Run hyperparameter tuning
python scripts/hyperparameter_tuning.py
```

## Inference

```bash
# Using Cog for inference
cog predict -i video=@path/to/your/video.mp4

# Using standard Python script
python scripts/inference.py --video path/to/your/video.mp4
```

## Analysis

```bash
# Generate misclassification report
python scripts/visualization/miscalculations_report.py

# Visualize model performance
python scripts/visualization/visualize.py
```

## Project Structure

```
clip/
├── src/                   # Source code
│   ├── data/              # Dataset and data processing
│   ├── models/            # Model architecture
│   └── utils/             # Utility functions
├── scripts/               # Training and inference scripts
│   └── visualization/     # Visualization tools
├── config/                # Configuration files
├── runs/                  # Training runs and checkpoints
├── cog.yaml               # Cog configuration
└── requirements.txt       # Python dependencies
```

## Training Data

To run training yourself, download the training data [here](https://drive.google.com/drive/folders/11M6nSuSuvoU2wpcV_-6KFqCzEMGP75q6?usp=drive_link) and place it in a directory named `./data` at the root of the project.

## Checkpoints

To run predictions with Cog or locally from an existing checkpoint, download the checkpoint and configuration files [here](https://drive.google.com/drive/folders/1Gn3UdoKffKJwz84GnGx-WMFTwZuvDsuf?usp=sharing) and place them in a directory named `./checkpoints` at the root of the project.

## Model Architecture

- Base: CLIP ViT-Large/14
- Custom temporal pooling layer
- Fine-tuned vision encoder (last 3 layers)
- Output: 4-class classifier
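
The sketch below shows the general shape of this pipeline in NumPy: per-frame embeddings are pooled over time and passed to a 4-class linear head. It is a simplified stand-in, not the model code from this repository. Mean pooling, the 768-dimensional embedding size, the class order in `CLASS_NAMES`, and the function name `classify_clip` are all assumptions made for illustration; the actual temporal pooling layer is custom and may differ.

```python
import numpy as np

# Class order is an assumption; check the checkpoint's config for the real mapping.
CLASS_NAMES = ["windmill", "halo", "swipe", "baby mill"]


def classify_clip(frame_embeddings: np.ndarray,
                  weights: np.ndarray,
                  bias: np.ndarray):
    """Pool per-frame embeddings over time and apply a linear classifier.

    frame_embeddings: (num_frames, embed_dim) array of CLIP image embeddings
    weights:          (embed_dim, 4) classifier weights
    bias:             (4,) classifier bias
    """
    # Temporal pooling: a plain mean over frames (assumed for this sketch).
    pooled = frame_embeddings.mean(axis=0)
    logits = pooled @ weights + bias
    # Numerically stable softmax over the four classes.
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    return CLASS_NAMES[int(probs.argmax())], probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(16, 768))      # 16 sampled frames, 768-d each
    w = rng.normal(size=(768, 4)) * 0.01     # random stand-in weights
    b = np.zeros(4)
    label, probs = classify_clip(frames, w, b)
    print(label, probs.round(3))
```

With random weights the predicted label is meaningless; the point is only to show how frame-level features collapse into a single clip-level prediction.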

## License

MIT License

Copyright (c) 2024 Bryant Wolf

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{clip-breakdance-classifier,
  author = {Bryant Wolf},
  title = {CLIP-Based Break Dance Move Classifier},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://github.com/bawolf/breaking_vision_clip_cog}}
}
```