| Model Name | Parameters | Class | Ratio | Tokens | Batch Size (Tokens) | Training Loss ↓ |
| --- | --- | --- | --- | --- | --- | --- |
| GerbilLab/GerbilBlender-A-32m | 32m | A-Class | 20 | 640M | 262K | 4.127 |

"Blender" models, inspired by UL2 pretraining, are trained in equal proportion on three tasks: fill-in-the-middle, causal language modelling, and masked language modelling. Each task is marked by its own special tokens. The special tokens for these models are:

'<fitm_start>', '<multiple_tok_mask>', '<fitm_result>', '<causal>', '<mlm_start>', '<single_tok_mask>', '<mlm_end>'

# Example fill in the middle
'<fitm_start> this is an <multiple_tok_mask> for fill-in-the-middle <fitm_result> example text <|endoftext|>'

# Example causal language modelling
'<causal> this is an example text for causal language modelling <|endoftext|>'

# Example masked language modelling
'<mlm_start> this is an <single_tok_mask> text for masked language modelling <mlm_end> example <|endoftext|>'
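
The three formats above can be assembled with plain string formatting. The following is a minimal sketch, not the authors' preprocessing code: the helper names (`fitm_example`, `causal_example`, `mlm_example`) are hypothetical, and the single-space separation between special tokens and text is an assumption copied from the examples above. Passing `None` for the answer returns only the prompt part, on the assumption that the model is expected to generate whatever follows `<fitm_result>` or `<mlm_end>`.

```python
EOS = "<|endoftext|>"  # end-of-text marker, as used in the examples above


def fitm_example(prefix, suffix, middle=None):
    """Fill-in-the-middle: the masked span sits between prefix and suffix.

    Hypothetical helper. With middle=None it returns only the prompt
    (assumption: the model generates the span after <fitm_result>).
    """
    prompt = f"<fitm_start> {prefix} <multiple_tok_mask> {suffix} <fitm_result>"
    return prompt if middle is None else f"{prompt} {middle} {EOS}"


def causal_example(text):
    """Causal language modelling: plain text behind the <causal> marker."""
    return f"<causal> {text} {EOS}"


def mlm_example(left, right, masked_token=None):
    """Masked language modelling: a single token is replaced by the mask.

    Hypothetical helper. With masked_token=None it returns only the prompt
    (assumption: the model generates the token after <mlm_end>).
    """
    prompt = f"<mlm_start> {left} <single_tok_mask> {right} <mlm_end>"
    return prompt if masked_token is None else f"{prompt} {masked_token} {EOS}"


if __name__ == "__main__":
    print(fitm_example("this is an", "for fill-in-the-middle", "example text"))
    print(causal_example("this is an example text for causal language modelling"))
    print(mlm_example("this is an", "text for masked language modelling", "example"))
```

Run as a script, this prints the three example strings shown above. Whether the tokenizer treats the spaces around special tokens as significant is not stated in this card, so it is worth checking the tokenized output before relying on this exact spacing.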