## Pretrained models for the paper *Scaling up Masked Diffusion Models on Text*
**Scaling law experiments**: We provide all pretrained models in the *ar_safetensors* and *mdm_safetensors* folders.
For instance, the checkpoint `mdm-1028M-1600e18.safetensors` represents an MDM model with 1,028 million non-embedding
parameters and 1,600e18 training FLOPs. Similarly, the checkpoint `mdm-170M-100e18-rsl-0.01.safetensors` indicates
an MDM model with 170 million non-embedding parameters, 100e18 training FLOPs, and 1% of the dataset subjected
to random sequence lengths during pretraining.
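The naming scheme above can be decoded mechanically. Below is a small sketch (the helper `parse_checkpoint_name` is hypothetical, not part of this repository) that splits a checkpoint filename into model family, non-embedding parameter count, training FLOPs, and the optional random-sequence-length fraction:

```python
import re

# Hypothetical helper: parse a checkpoint filename following the
# naming scheme described above.
def parse_checkpoint_name(name):
    """Parse e.g. 'mdm-170M-100e18-rsl-0.01.safetensors' into its parts."""
    pattern = (
        r"(?P<kind>ar|mdm)-"          # model family: autoregressive or MDM
        r"(?P<params>\d+)M-"          # non-embedding parameters, in millions
        r"(?P<flops>\d+)e18"          # training FLOPs, in units of 1e18
        r"(?:-rsl-(?P<rsl>[\d.]+))?"  # optional random-sequence-length fraction
        r"\.(?:safetensors|pth)$"
    )
    m = re.match(pattern, name)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    return {
        "kind": m["kind"],
        "params_millions": int(m["params"]),
        "flops": int(m["flops"]) * 10**18,
        "rsl_fraction": float(m["rsl"]) if m["rsl"] else None,
    }

info = parse_checkpoint_name("mdm-170M-100e18-rsl-0.01.safetensors")
# info["params_millions"] == 170, info["rsl_fraction"] == 0.01
```

For example, `mdm-1028M-1600e18.safetensors` parses to a 1,028M-parameter MDM trained for 1,600e18 FLOPs, with no random-sequence-length suffix.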
**Math reasoning**: please see the *gsm8k_safetensors* folder.
**Conditional generation**: please see the *sharegpt_safetensors* folder.
**Reverse curse**: please see the *reverse_safetensors* folder.
All models are provided in both `.pth` and `.safetensors` formats.