pjajal committed on
Commit
4666d52
·
verified ·
1 Parent(s): 16ce9fc

Update README.md

  1. README.md +89 -4
README.md CHANGED
@@ -2,9 +2,94 @@
2
  tags:
3
  - model_hub_mixin
4
  - pytorch_model_hub_mixin
5
  ---
6
 
7
- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
8
- - Code: https://github.com/pjjajal/adaperceiver-public
9
- - Paper: https://arxiv.org/abs/2511.18105
10
- - Docs: [More Information Needed]
2
  tags:
3
  - model_hub_mixin
4
  - pytorch_model_hub_mixin
5
+ - vision
6
+ - perceiver
7
+ - adaptive-computation
8
+ - image-classification
9
+ license: mit
10
+ datasets:
11
+ - timm/imagenet-1k-wds
12
  ---
13
 
14
+ # AdaPerceiver (ImageNet-1K Fine-Tuned)
15
+
16
+ This repository hosts the **ImageNet-1K fine-tuned AdaPerceiver model**, introduced in
17
+ **“AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens”**.
18
+
19
+ 📄 Paper: https://arxiv.org/abs/2511.18105
20
+ 📦 Code: https://github.com/pjajal/AdaPerceiver
21
+ 📚 Model Collection: https://huggingface.co/collections/pjajal/adaperceiver-v1
22
+
23
+ This model is fine-tuned from the **logit + feature distilled AdaPerceiver backbone** trained on ImageNet-12K (found [here](https://huggingface.co/pjajal/adaperceiver-v1)).
24
+
25
+ ---
26
+
27
+ ## Model Description
28
+
29
+ **AdaPerceiver** is a Perceiver-style transformer architecture designed for **runtime-adaptive computation**.
30
+ A single trained model can dynamically trade off **accuracy and compute** by adjusting:
31
+
32
+ - the **number of latent tokens**,
33
+ - the **effective depth**, and
34
+ - the **embedding dimension**.
35
+
36
+ This specific checkpoint corresponds to the **ImageNet-1K classification fine-tuned AdaPerceiver model**, described in Appendix D.2 of the paper.
37
+
38
+ ---
39
+
40
+ ## Training Details
41
+
42
+ - **Fine-Tuning Data:** ImageNet-1K
43
+ - **Initialization:** Logit + feature distilled AdaPerceiver (ImageNet-12K)
44
+ - **Objective:** Supervised classification fine-tuning
45
+ - **Architecture:** Adaptive Perceiver with block-masked attention and Matryoshka FFNs
46
+ - **Adaptivity Axes:** Tokens, Depth, Width
47
+
48
+ During fine-tuning, the AdaPerceiver backbone is frozen and only the classification head, output tokens, and output cross-attention layers are updated.
49
+
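The freezing scheme described above can be sketched in PyTorch as follows. This is a minimal illustration on a toy module, not the authors' training code: the class `ToyModel`, its attribute names (`backbone`, `output_tokens`, `output_cross_attn`, `head`), the helper `freeze_all_but`, and the learning rate are all assumptions made for the example, and the actual parameter names in the AdaPerceiver repository may differ.

```python
import torch
from torch import nn


class ToyModel(nn.Module):
    """Hypothetical stand-in; attribute names are assumptions, not the real model's."""

    def __init__(self, dim=8, num_classes=4):
        super().__init__()
        self.backbone = nn.Linear(dim, dim)                      # frozen part
        self.output_tokens = nn.Parameter(torch.zeros(1, 1, dim))
        self.output_cross_attn = nn.MultiheadAttention(dim, 2, batch_first=True)
        self.head = nn.Linear(dim, num_classes)


def freeze_all_but(model, trainable_prefixes):
    """Enable gradients only for parameters whose name starts with a given prefix."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefixes)
    return [n for n, p in model.named_parameters() if p.requires_grad]


toy = ToyModel()
trainable = freeze_all_but(toy, ("head", "output_tokens", "output_cross_attn"))

# Pass only the trainable parameters to the optimizer (learning rate is illustrative).
optimizer = torch.optim.AdamW(
    (p for p in toy.parameters() if p.requires_grad), lr=1e-3
)
print(trainable)
```

Matching on name prefixes keeps the sketch short; with the real model, inspect `model.named_parameters()` first to find the right prefixes.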
50
+ For full training details, see Appendix D of the paper.
51
+
52
+ ---
53
+
54
+ ## How to Use
55
+
56
+ Load this model with the Hub-compatible `ClassificationAdaPerceiver` class. The import below presumably requires the AdaPerceiver code repository to be on your Python path.
57
+
58
+ ```python
59
+ import torch
60
+ from hub.networks.adaperceiver_classification import ClassificationAdaPerceiver
61
+
62
+ model = ClassificationAdaPerceiver.from_pretrained(
63
+     "pjajal/adaperceiver-v1-in1k-ft"
64
+ )
65
+
66
+ # forward(
67
+ #     x: input image tensor (B, C, H, W)
68
+ #     num_tokens: number of latent tokens to process (optional)
69
+ #     mat_dim: embedding dimension (optional)
70
+ #     depth: early-exit depth (optional)
71
+ #     depth_tau: confidence threshold for early exit (optional)
72
+ #     token_grans: block-mask granularities (optional)
73
+ # )
74
+ out = model(
75
+     torch.randn(1, 3, 224, 224),
76
+     num_tokens=256,
77
+     mat_dim=192,
78
+     depth=12,
79
+ )
80
+
81
+ print(out.logits.shape)
82
+ ```
83
+
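Because the three adaptivity axes are plain keyword arguments, one way to explore the accuracy/compute trade-off is to enumerate a grid of settings and run the model once per setting. The sketch below only builds the keyword dictionaries and is independent of the model itself. The helper `build_configs` is hypothetical, and apart from `num_tokens=256`, `mat_dim=192`, and `depth=12` (taken from the example above) the grid values are illustrative assumptions; check the repository for the ranges this checkpoint actually supports.

```python
from itertools import product


def build_configs(token_counts, mat_dims, depths):
    """Return one kwargs dict per (num_tokens, mat_dim, depth) combination."""
    return [
        {"num_tokens": t, "mat_dim": d, "depth": k}
        for t, d, k in product(token_counts, mat_dims, depths)
    ]


# Illustrative grid only; the smaller values are assumptions, not documented limits.
configs = build_configs(
    token_counts=[128, 256],
    mat_dims=[96, 192],
    depths=[6, 12],
)

for cfg in configs:
    print(cfg)
```

With the model loaded as shown above, each entry could then be passed as `model(x, **cfg)` and the resulting logits compared across settings.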
84
+ ## Reference
85
+
86
+ If you use this model, please cite the AdaPerceiver paper:
87
+
88
+ ```bibtex
89
+ @article{jajal2025adaperceiver,
90
+ title={AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens},
91
+ author={Jajal, Purvish and Eliopoulos, Nick John and Chou, Benjamin Shiue-Hal and Thiruvathukal, George K and Lu, Yung-Hsiang and Davis, James C},
92
+ journal={arXiv preprint arXiv:2511.18105},
93
+ year={2025}
94
+ }
95
+ ```