---
license: mit
---

```
===================================================================================================================
Layer (type:depth-idx)                  Output Shape            Param #
===================================================================================================================
MegaForMaskedLM                         [4, 2048, 50265]        --
├─MegaModel: 1-1                        [4, 2048, 768]          --
│    └─MegaEmbeddings: 2-1              [4, 2048, 768]          --
│    │    └─Embedding: 3-1              [4, 2048, 768]          38,603,520
│    └─ModuleList: 2-2                  --                      --
│    │    └─MegaBlock: 3-2              [2048, 4, 768]          6,202,626
│    │    └─MegaBlock: 3-3              [2048, 4, 768]          6,202,626
│    │    └─MegaBlock: 3-4              [2048, 4, 768]          6,202,626
│    │    └─MegaBlock: 3-5              [2048, 4, 768]          6,202,626
│    │    └─MegaBlock: 3-6              [2048, 4, 768]          6,202,626
│    │    └─MegaBlock: 3-7              [2048, 4, 768]          6,202,626
│    │    └─MegaBlock: 3-8              [2048, 4, 768]          6,202,626
│    │    └─MegaBlock: 3-9              [2048, 4, 768]          6,202,626
│    │    └─MegaBlock: 3-10             [2048, 4, 768]          6,202,626
│    │    └─MegaBlock: 3-11             [2048, 4, 768]          6,202,626
│    │    └─MegaBlock: 3-12             [2048, 4, 768]          6,202,626
│    │    └─MegaBlock: 3-13             [2048, 4, 768]          6,202,626
├─Linear: 1-2                           [4, 2048, 50265]        38,653,785
===================================================================================================================
Total params: 151,688,817
Trainable params: 151,688,817
Non-trainable params: 0
Total mult-adds (G): 150.35
===================================================================================================================
Input size (MB): 0.07
Forward/backward pass size (MB): 10818.75
Params size (MB): 606.71
Estimated Total Size (MB): 11425.52
===================================================================================================================
```
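The parameter totals in the summary can be checked by hand. The sketch below is only arithmetic on the numbers in the table. It assumes a vocabulary of 50,265 tokens and a hidden size of 768 (read off the output shapes). It also assumes the final `Linear` head has a bias, which would account for the 50,265 extra parameters it has over the embedding. It further assumes the `[4, 2048]` input is int64 token ids, which is consistent with the reported 0.07 MB input size. The constant names are illustrative and are not taken from any config.

```python
# Hand check of the torchinfo parameter counts for MegaForMaskedLM.
# Assumptions (inferred from the table, not stated in the card):
#   vocab size 50265, hidden size 768, LM head = Linear with bias,
#   input = [4, 2048] int64 token ids.
VOCAB_SIZE = 50265
HIDDEN_SIZE = 768
NUM_BLOCKS = 12
PARAMS_PER_BLOCK = 6_202_626  # copied from the MegaBlock rows

embedding_params = VOCAB_SIZE * HIDDEN_SIZE             # token embedding matrix
lm_head_params = HIDDEN_SIZE * VOCAB_SIZE + VOCAB_SIZE  # weight + bias
block_params = NUM_BLOCKS * PARAMS_PER_BLOCK            # 12 identical MegaBlocks

total_params = embedding_params + block_params + lm_head_params

# Input tensor: 4 sequences of 2048 int64 ids, 8 bytes each, in megabytes.
input_mb = 4 * 2048 * 8 / 1e6

print(f"embedding: {embedding_params:,}")
print(f"lm head:   {lm_head_params:,}")
print(f"blocks:    {block_params:,}")
print(f"total:     {total_params:,}")
print(f"input MB:  {input_mb:.2f}")
```

Under these assumptions the printed total matches the `Total params` line (151,688,817), and the input size rounds to 0.07 MB.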