0-hero/OLMo-50M-Mixture-of-Depths-Bitnet


	---
	license: apache-2.0
	datasets:
	- allenai/dolma
	---
	# Training run to compare Mixture-of-Depths and BitNet
	[Wandb Report](https://api.wandb.ai/links/tulasiram/pw76q41i)

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/6382255fcae34727b9cc149e/-ovvzj0ZvzuArH0cdOz8b.png)

	#### 4 Models trained for 100k steps on Dolma
	- OLMo-50M - 50M parameter model
	- OLMo-50M-bitlinear - 50M parameter bitnet model
	- OLMo-50M-mod - 50M parameter mixture-of-depths model
	- OLMo-50M-mod-bitlinear - 50M parameter mixture-of-depths bitnet model

	The repo contains zip files with the training state and other files for each model. I am not the author of the mixture-of-depths implementation; it can be found [here](https://github.com/thepowerfuldeez/OLMo).
	This is the first run and still a work in progress, so a few things might be broken.
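
Since the models ship as zip archives, here is a minimal sketch of one way to fetch and unpack them. It assumes the `huggingface_hub` package is installed and that the archives end in `.zip`. The helper names `download_repo_zips` and `extract_zips` and the output folder are made up for illustration, and the archive names are not assumed; the code simply picks up whatever `.zip` files the repo contains.

```python
import zipfile
from pathlib import Path


def download_repo_zips(repo_id: str = "0-hero/OLMo-50M-Mixture-of-Depths-Bitnet") -> Path:
    """Download only the .zip files from the Hub repo and return the local folder."""
    # Imported lazily so extract_zips below can be used without huggingface_hub.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, allow_patterns=["*.zip"]))


def extract_zips(src_dir: Path, dest_dir: Path) -> list[Path]:
    """Extract every .zip found under src_dir into its own subfolder of dest_dir."""
    extracted = []
    for archive in sorted(src_dir.rglob("*.zip")):
        # One folder per archive, named after the zip file (hypothetical layout).
        target = dest_dir / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Calling `extract_zips(download_repo_zips(), Path("olmo-50m-runs"))` should leave one folder per archive under `olmo-50m-runs`. What each folder contains depends on the archives themselves.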