---
language: en
license: other
license_name: apple-sample-code-non-commercial
license_link: https://github.com/apple/ml-4m/blob/main/LICENSE
library_name: nano4M
tags:
- multimodal
- foundation-model
- nano4m
- flextok
- masked-modeling
- com-304
- research-only
---

# nano4M: cdparker/4M_with_flpa_flextok

This is a nano4M multimodal foundation model trained as part of the COM-304 course at EPFL.

## Model Description

nano4M is a small-scale reimplementation of 4M (Massively Multimodal Masked Modeling), a unified multimodal foundation model trained to predict any modality from any other.

- **Architecture**: Encoder-decoder transformer, **6 encoder + 6 decoder layers**, width **512**.
- **Modalities**:
  - `tok_rgb@256`: RGB image tokens (Cosmos tokenizer)
  - `tok_depth@256`: depth tokens
  - `tok_normal@256`: surface-normal tokens
  - `tok_rgb_flextok@256`: RGB tokens from the FlexTok tokenizer
  - `scene_desc`: scene description text tokens
- **Training data**: MultiCLEVR
- **Training objective**: `flpa`
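
To give a feel for the scale implied by the architecture above, here is a rough back-of-the-envelope estimate of the transformer stack's weight count. It is only a sketch: `StackConfig` and `estimate_stack_params` are hypothetical names, the MLP expansion ratio of 4 is an assumption (the card does not state it), and biases, normalization layers, token embeddings, and output heads are all ignored, so the real checkpoint holds more parameters than this figure.

```python
from dataclasses import dataclass


@dataclass
class StackConfig:
    """Hypothetical config mirroring the numbers stated in this card."""
    enc_layers: int = 6
    dec_layers: int = 6
    width: int = 512
    mlp_ratio: int = 4  # ASSUMPTION: not stated in the card


def estimate_stack_params(cfg: StackConfig) -> int:
    """Count only the large weight matrices of the encoder-decoder stack."""
    d = cfg.width
    attn = 4 * d * d                 # Q, K, V and output projections
    mlp = 2 * cfg.mlp_ratio * d * d  # up- and down-projection
    enc = cfg.enc_layers * (attn + mlp)
    # Each decoder layer is assumed to have self-attention plus cross-attention.
    dec = cfg.dec_layers * (2 * attn + mlp)
    return enc + dec


print(f"{estimate_stack_params(StackConfig()) / 1e6:.1f}M")  # prints 44.0M
```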

## Variants

This model is one of three variants trained in this project:

| Variant | Objective                                        |
|---------|--------------------------------------------------|
| naive   | Standard random-token masked prediction          |
| vlpa    | Variable-length partial-autoregressive masking   |
| span    | Span-based masking                               |
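
As an illustration of how two of these objectives differ, the sketch below splits a token sequence into visible inputs and masked targets, once with random-token masking and once with a single contiguous span. It is not the nano4M implementation: the function names, the single-span simplification, and the `(position, token)` output format are assumptions made for the example.

```python
import random


def random_token_mask(tokens, num_visible, seed=None):
    """Keep `num_visible` randomly chosen tokens visible; the rest become targets.

    Returns (visible, targets), each a list of (position, token) pairs.
    """
    if not 0 <= num_visible <= len(tokens):
        raise ValueError("num_visible must be between 0 and len(tokens)")
    rng = random.Random(seed)
    visible_pos = set(rng.sample(range(len(tokens)), num_visible))
    visible = [(i, t) for i, t in enumerate(tokens) if i in visible_pos]
    targets = [(i, t) for i, t in enumerate(tokens) if i not in visible_pos]
    return visible, targets


def span_mask(tokens, span_len, seed=None):
    """Mask one contiguous span of `span_len` tokens (simplified to a single span)."""
    if not 0 <= span_len <= len(tokens):
        raise ValueError("span_len must be between 0 and len(tokens)")
    rng = random.Random(seed)
    start = rng.randint(0, len(tokens) - span_len)  # inclusive bounds
    span = range(start, start + span_len)
    visible = [(i, t) for i, t in enumerate(tokens) if i not in span]
    targets = [(i, t) for i, t in enumerate(tokens) if i in span]
    return visible, targets


tokens = list(range(100, 116))  # 16 dummy token ids
visible, targets = random_token_mask(tokens, num_visible=4, seed=0)
print(len(visible), len(targets))  # prints 4 12
```

In both cases the model would receive the visible pairs as input and be trained to predict the target tokens at the masked positions.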

## Usage

```python
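import torch
from huggingface_hub import hf_hub_download

# Loader from the nano4M codebase; adjust the import path to match your repo.
from nanofm.models import load_model_from_safetensors

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the final checkpoint from the Hugging Face Hub (cached locally).
ckpt_path = hf_hub_download(
    repo_id="cdparker/4M_with_flpa_flextok",
    filename="checkpoint-final.safetensors",
)
model = load_model_from_safetensors(ckpt_path, device=device)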
```

## Reference

- Mizrahi et al., *4M: Massively Multimodal Masked Modeling*, NeurIPS 2023.
- Original 4M codebase: https://github.com/apple/ml-4m