athirdpath/Llama-3.1-Base_NSFW-pretrained_e-0.5

---
license: llama3.1
---

Llama 3.1 Base, continually pretrained for 0.5 epochs (2,100 steps at a total batch size of 64) on the same 1.5 GB private dataset that underpins Iambe.
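As a rough sanity check on those numbers, the snippet below works out how many training sequences 2,100 steps at a total batch of 64 covers, and what that implies per sequence. It assumes each batch element is one distinct sequence and that 1.5 GB means 1.5 × 10^9 bytes; the function name is illustrative and not taken from any actual training script.

```python
def training_coverage(steps: int, total_batch: int, epoch_fraction: float,
                      dataset_bytes: float) -> dict:
    """Back-of-the-envelope coverage numbers for a partial-epoch run.

    Assumes every batch element is one distinct training sequence.
    """
    sequences_seen = steps * total_batch
    # If this run covered epoch_fraction of an epoch, a full epoch is 1/epoch_fraction times larger.
    sequences_per_epoch = sequences_seen / epoch_fraction
    # Average raw text per sequence, assuming the dataset is split evenly across sequences.
    bytes_per_sequence = dataset_bytes / sequences_per_epoch
    return {
        "sequences_seen": sequences_seen,
        "sequences_per_epoch": sequences_per_epoch,
        "bytes_per_sequence": bytes_per_sequence,
    }


stats = training_coverage(steps=2100, total_batch=64, epoch_fraction=0.5, dataset_bytes=1.5e9)
print(stats["sequences_seen"])             # 134400
print(stats["sequences_per_epoch"])        # 268800.0
print(round(stats["bytes_per_sequence"]))  # 5580
```

Under those assumptions the run saw about 134k sequences, implying roughly 5.6 KB of raw text per sequence. If sequences were packed or truncated to a fixed length, the per-sequence figure would differ.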

Mostly a proof of concept, but the outputs are better than expected. It would likely be quite good with some instruction tuning.

-----

Why do this? I have a niche use case where I can't afford the compute for anything larger than 8B, and L3/3.1 are the only models in that size class that meet my needs for logic. However, both L3 and L3.1 have the damn repetition/token-overconfidence problem, and this is meant to disrupt that certainty without disrupting the model's ability to function.
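One way to put a number on "token overconfidence" is the entropy of the next-token distribution: a model that loops tends to put almost all of its probability on a single token, which drives entropy toward zero. A repeated n-gram rate over a generation is a similarly simple proxy for looping. The sketch below is a generic, framework-free illustration of both metrics; the function names and example values are made up, and this is not how this model was evaluated.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; low values mean a very confident prediction."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def repeated_ngram_fraction(tokens: Sequence[int], n: int = 3) -> float:
    """Fraction of n-grams in a generation that already appeared earlier in it."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    seen = set()
    repeats = 0
    for gram in ngrams:
        if gram in seen:
            repeats += 1
        seen.add(gram)
    return repeats / len(ngrams)


# A flat distribution over 4 tokens carries 2 bits; a sharply peaked one carries almost none.
print(entropy_bits(softmax([0.0, 0.0, 0.0, 0.0])))   # 2.0
print(entropy_bits(softmax([10.0, 0.0, 0.0, 0.0])))
# A generation stuck in a loop repeats most of its trigrams (4 of 7 here).
print(repeated_ngram_fraction([1, 2, 3, 1, 2, 3, 1, 2, 3]))
```

In practice the logits would come from the model at each generation step; tracking average entropy alongside the repetition rate before and after continued pretraining is one way to check whether the certainty was actually disrupted.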

By the way, I think the lm_head is what's causing the looping, but it might instead be that the embeddings are too separated. I'm not going to pay for two more runs to test them separately, however :p
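If someone did want to pay for those two extra runs, a common approach is to freeze everything except the component under test. The sketch below builds a per-parameter trainable mask by name prefix. It assumes Hugging Face `LlamaForCausalLM` parameter naming (`model.embed_tokens.*` for the input embeddings, `lm_head.*` for the output head) and that the two are untied; the function and variant names are hypothetical.

```python
from typing import Iterable

# Parameter-name prefixes as they appear in Hugging Face's LlamaForCausalLM.
# Assumption: input embeddings and output head are untied, so each has its own weight.
EMBED_PREFIX = "model.embed_tokens."
HEAD_PREFIX = "lm_head."

# Hypothetical ablation variants: which prefixes stay trainable in each run.
VARIANTS = {
    "head_only": (HEAD_PREFIX,),
    "embed_only": (EMBED_PREFIX,),
    "head_and_embed": (HEAD_PREFIX, EMBED_PREFIX),
}


def trainable_mask(param_names: Iterable[str], variant: str) -> dict[str, bool]:
    """Map each parameter name to whether it should receive gradients in this variant."""
    prefixes = VARIANTS[variant]
    # str.startswith accepts a tuple and matches if any prefix matches.
    return {name: name.startswith(prefixes) for name in param_names}


names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.norm.weight",
    "lm_head.weight",
]
print(trainable_mask(names, "head_only"))

# With a real PyTorch model, the mask could be applied roughly like this:
#   mask = trainable_mask((n for n, _ in model.named_parameters()), "head_only")
#   for name, param in model.named_parameters():
#       param.requires_grad = mask[name]
```

Each variant would then be trained on the same data and schedule, so a change in looping could more plausibly be attributed to the component left unfrozen.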