# Decoder-only model (XXL) with 4762357760 parameters.
include 't5x/examples/decoder_only/models/base.gin' # imports vocab, optimizer and model.
# ------------------- Network specification overrides --------------------------
network.TransformerConfig:
  emb_dim = 4096
  num_heads = 64
  num_layers = 24
  head_dim = 64
  mlp_dim = 10240
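
# ------------------- Parameter count (sketch) ---------------------------------
# A rough check of the 4762357760 figure in the header comment. The breakdown
# ASSUMES values that are not set in this file and would have to come from
# base.gin: a 32128-token vocabulary, input embeddings shared with the output
# logits, a gated MLP with two input projections, no bias terms, and a
# 32-bucket relative position bias in every layer. If any of these differ,
# the arithmetic below does not apply.
#
#   embedding:        32128 * 4096                  =  131596288
#   attention/layer:  4 * 4096 * (64 * 64)          =   67108864
#   MLP/layer:        3 * 4096 * 10240              =  125829120
#   norms/layer:      2 * 4096                      =       8192
#   rel. bias/layer:  32 * 64                       =       2048
#   per layer:                                         192948224
#   24 layers:        24 * 192948224                = 4630757376
#   final norm:                                             4096
#   total:            131596288 + 4630757376 + 4096 = 4762357760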