---
language: en
license: other
license_name: apple-sample-code-non-commercial
license_link: https://github.com/apple/ml-4m/blob/main/LICENSE
library_name: nano4M
tags:
- multimodal
- foundation-model
- nano4m
- flextok
- masked-modeling
- com-304
- research-only
---
# nano4M: cdparker/4M_with_naive_flextok
This is a nano4M multimodal foundation model trained as part of the COM-304 course at EPFL.
## Model Description
nano4M is a small-scale reimplementation of 4M (Massively Multimodal Masked Modeling), a unified multimodal foundation model trained to predict any modality from any other.
- **Architecture**: Encoder-decoder transformer, **6 encoder + 6 decoder layers**, width **512**.
- **Modalities**:
  - `tok_rgb@256`: RGB image tokens (Cosmos tokenizer)
  - `tok_depth@256`: depth tokens
  - `tok_normal@256`: surface-normal tokens
  - `tok_rgb_flextok@256`: RGB tokens from the FlexTok tokenizer
  - `scene_desc`: scene description text tokens
- **Training data**: MultiCLEVR
- **Training objective**: `naive` (standard random-token masked prediction; see Variants)
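
To get a feel for the scale implied by 6 encoder + 6 decoder layers at width 512, the sketch below gives a back-of-the-envelope count of weight-matrix parameters. It rests on assumptions that this card does not state: an MLP expansion ratio of 4, one self-attention block per encoder layer, and self-attention plus cross-attention per decoder layer. Biases, norms, and embedding tables are ignored, and `estimate_transformer_params` is a hypothetical helper, not part of nano4M.

```python
def estimate_transformer_params(width, enc_layers, dec_layers, mlp_ratio=4):
    """Rough weight-matrix count for an encoder-decoder transformer.

    Assumptions (not taken from the model card): Q, K, V and output
    projections are each width x width, and the MLP has two matrices
    of size width x (mlp_ratio * width).
    """
    attn = 4 * width * width              # Q, K, V, output projections
    mlp = 2 * mlp_ratio * width * width   # up- and down-projection
    enc = enc_layers * (attn + mlp)       # self-attention + MLP
    dec = dec_layers * (2 * attn + mlp)   # self-attn + cross-attn + MLP
    return enc + dec


total = estimate_transformer_params(width=512, enc_layers=6, dec_layers=6)
print(f"{total / 1e6:.1f}M weight-matrix parameters")
```

Under these assumptions the estimate comes to roughly 44M parameters before embeddings; the actual checkpoint may differ.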
## Variants
This model is one of three variants trained in this project:
| Variant | Objective |
|---------|--------------------------------------------------|
| naive | Standard random-token masked prediction |
| vlpa | Variable-length partial-autoregressive masking |
| span | Span-based masking |
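
The difference between random-token and span-based masking can be illustrated with a minimal sketch over token positions. This is an illustration of the general idea only, not the nano4M implementation: the helper names, the single-span simplification, and the choice of 16 tokens are assumptions, and the `vlpa` objective is not covered.

```python
import random


def random_token_split(num_tokens, num_inputs, num_targets, seed=None):
    """Pick disjoint, uniformly random input and target positions."""
    if num_inputs + num_targets > num_tokens:
        raise ValueError("num_inputs + num_targets must not exceed num_tokens")
    rng = random.Random(seed)
    positions = list(range(num_tokens))
    rng.shuffle(positions)
    inputs = sorted(positions[:num_inputs])
    targets = sorted(positions[num_inputs:num_inputs + num_targets])
    return inputs, targets


def span_mask(num_tokens, span_len, seed=None):
    """Mask one contiguous span; return (visible, masked) positions."""
    if not 0 < span_len <= num_tokens:
        raise ValueError("span_len must be in 1..num_tokens")
    rng = random.Random(seed)
    start = rng.randrange(num_tokens - span_len + 1)
    masked = list(range(start, start + span_len))
    visible = [i for i in range(num_tokens) if i < start or i >= start + span_len]
    return visible, masked


inputs, targets = random_token_split(num_tokens=16, num_inputs=4, num_targets=6, seed=0)
visible, masked = span_mask(num_tokens=16, span_len=5, seed=0)
print("random inputs:", inputs, "targets:", targets)
print("span visible:", visible, "masked:", masked)
```

In the random variant the model sees scattered positions and predicts other scattered positions; in the span variant the hidden positions are contiguous, so the model has to reconstruct a whole region from its surroundings.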
## Usage
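```python
import torch
from huggingface_hub import hf_hub_download

# Adjust this import to match where the loader lives in your nano4M checkout.
from nanofm.models import load_model_from_safetensors

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the checkpoint from the Hugging Face Hub (cached after the first call).
ckpt_path = hf_hub_download(
    repo_id="cdparker/4M_with_naive_flextok",
    filename="checkpoint-final.safetensors",
)

# Build the model and load the weights onto the selected device.
model = load_model_from_safetensors(ckpt_path, device=device)
```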
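
Before loading the model, it can be useful to check what a downloaded checkpoint contains. The sketch below reads only the JSON header of a `.safetensors` file using the standard library, relying on the published file layout (an 8-byte little-endian header length followed by a JSON header). The helper names are hypothetical, and the tensor names inside this particular checkpoint are not documented here.

```python
import json
import struct


def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} for a file."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Drop the optional free-form metadata entry; keep only tensor records.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Sum the number of elements over all tensors listed in the header."""
    total = 0
    for info in header.values():
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total
```

For example, `count_parameters(read_safetensors_header(ckpt_path))` gives the total element count of the file returned by `hf_hub_download`, without loading any tensor data into memory.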
## Reference
- Mizrahi et al., *4M: Massively Multimodal Masked Modeling*, NeurIPS 2023.
- Original 4M codebase: https://github.com/apple/ml-4m