# megalodon-200m: minipile
Small pretraining experiment:
- 8192-token context, approx. 1 epoch (sequence packing sketched after this list)
- codebase: https://github.com/pszemraj/megalodon/tree/dataload-fixes
- training logs
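As a rough illustration of the data path, the sketch below packs tokenized minipile documents into fixed 8192-token training sequences. The dataset ID `JeanKaddour/minipile`, the `pack` helper, and the byte-level placeholder tokenizer are all assumptions for illustration; the run itself used the dataloader in the linked fork.

```python
# Minimal sketch of sequence packing to the 8192-token context.
# Assumptions: minipile is pulled from the Hugging Face hub as
# "JeanKaddour/minipile", and `tokenize` stands in for whatever produced
# the 20480-entry vocabulary; the real loader lives in the linked fork.
from typing import Callable, Iterable, Iterator

CTX_LEN = 8192  # context length used in this run


def pack(
    docs: Iterable[str],
    tokenize: Callable[[str], list[int]],
    ctx_len: int = CTX_LEN,
) -> Iterator[list[int]]:
    """Concatenate tokenized documents and yield fixed-length blocks."""
    buf: list[int] = []
    for doc in docs:
        buf.extend(tokenize(doc))
        while len(buf) >= ctx_len:
            yield buf[:ctx_len]   # one full training sequence
            buf = buf[ctx_len:]   # carry the remainder into the next block


if __name__ == "__main__":
    from datasets import load_dataset  # pip install datasets

    ds = load_dataset("JeanKaddour/minipile", split="train", streaming=True)
    docs = (row["text"] for row in ds)

    def byte_tokenize(s: str) -> list[int]:
        return list(s.encode("utf-8"))  # placeholder, not the real tokenizer

    first = next(pack(docs, byte_tokenize))
    print(len(first))  # -> 8192
```

Packing (rather than padding) keeps every position of the 8192-token context filled; with the chunk size of 2048 listed below, each packed sequence spans four chunks.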
## Model Configuration

The hyperparameters used (restated as a dataclass sketch after this list):
- Number of Layers: 12
- Model Dimension: 1024
- Z Dimension: 256
- Value Dimension: 2048
- Number of Heads: 1
- FFN Hidden Dimension: 2560
- CEMA NDIM: 16
- Chunk Size: 2048
- Efficient Attention: None
- Initialization Mode: He
- Vocabulary Size: 20480
- Output Size: 20480
- Normalization Groups: 32
- Normalization Affine: True
- Normalization Epsilon: 1e-05
- RoPE Base: None
- Dropout: 0.0
- Hidden Dropout: 0.0
- Attention Dropout: 0.0
- SwiGLU: False
- Rescale NFFN: False
- Scale Embedding: False
- Share Embedding: False
- Layerwise Checkpointing: False
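For reference, here is the same configuration as a self-contained Python dataclass. This is a sketch only: the field names are illustrative and need not match the fork's actual config keys, but every value comes from the list above.

```python
# Sketch: the configuration above as a dataclass. Field names are
# illustrative; they do not necessarily match the fork's config keys.
from dataclasses import dataclass


@dataclass(frozen=True)
class MegalodonConfig:
    num_layers: int = 12
    model_dim: int = 1024
    z_dim: int = 256
    value_dim: int = 2048
    num_heads: int = 1
    ffn_hidden_dim: int = 2560
    cema_ndim: int = 16
    chunk_size: int = 2048              # 8192-token contexts split into 4 chunks
    efficient_attention: str | None = None
    init_mode: str = "he"
    vocab_size: int = 20480
    output_size: int = 20480
    norm_groups: int = 32
    norm_affine: bool = True
    norm_eps: float = 1e-5
    rope_base: float | None = None
    dropout: float = 0.0
    hidden_dropout: float = 0.0
    attention_dropout: float = 0.0
    swiglu: bool = False
    rescale_nffn: bool = False
    scale_embedding: bool = False
    share_embedding: bool = False       # input/output embeddings are separate
    layerwise_checkpointing: bool = False


cfg = MegalodonConfig()
assert cfg.model_dim % cfg.norm_groups == 0  # model_dim must split evenly into norm groups
```

Since `share_embedding` is false, the input and output embedding tables each contribute 20480 × 1024 ≈ 21M parameters, so roughly a fifth of the ~200M total sits in the embeddings.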