Corianas/Microllama_Char_500k_step

	---
	license: cc-by-nc-sa-4.0
	datasets:
	- roneneldan/TinyStories
	---
	This is a character-level model (English a-z, 0-9, and so on) trained following Andrej Karpathy's llama2.c project (https://github.com/karpathy/llama2.c) on both TinyStories and a similar internal dataset I made. The last 150k is from a subset of Cosmopedia that I extracted for younger people.

	Trained for 49,152,000,000 tokens
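
As a quick sanity check on that figure, the sketch below divides the token total by the 500,000 iterations reported in the run summary. The per-iteration value is derived here, not stated in the card, and how it splits into batch size and sequence length is not given.

```python
# Derived arithmetic only: both inputs are taken from the run summary in this card.
total_tokens = 49_152_000_000
iterations = 500_000

tokens_per_iter = total_tokens // iterations
assert tokens_per_iter * iterations == total_tokens  # the division is exact

print(f"{tokens_per_iter:,} tokens per iteration")  # 98,304 tokens per iteration
```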

	To read or output uppercase letters, this model uses a Shift-key modifier character placed before the letter that should become uppercase. It has never been trained on actual uppercase letters.

	This modifier is ↨. Here are the functions I use to convert plain text to the modified format and back.
	```
	def add_caseifer(text):
	    # Using list comprehension for more efficient concatenation:
	    # each uppercase character becomes '↨' followed by its lowercase form.
	    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])

	def remove_caseifer(text):
	    new_text = ""
	    i = 0
	    while i < len(text):
	        if text[i] == "↨":
	            if i+1 < len(text):
	                # Uppercase the character after the modifier and consume it.
	                new_text += text[i+1].upper()
	                i += 1
	            else:
	                pass  # trailing modifier with nothing after it: skip this index
	        else:
	            new_text += text[i]
	        i += 1
	    return new_text
	```

	As such, for test strings to use in chat, try something like:
	```
	↨hello, my name is ↨clara and ↨i like
	```
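
A prompt like the one above can be produced from ordinary text, and model output can be turned back into ordinary text, with a round trip along these lines. This is a minimal self-contained sketch that restates the two conversion functions from this card in compact form; the `SHIFT` constant and the sample sentence are illustrative choices, not part of the original code.

```python
SHIFT = "↨"  # the Shift-key modifier described in this card

def add_caseifer(text: str) -> str:
    # Replace each uppercase character with SHIFT followed by its lowercase form.
    return "".join(SHIFT + ch.lower() if ch.isupper() else ch for ch in text)

def remove_caseifer(text: str) -> str:
    # Uppercase the character after each SHIFT; a trailing SHIFT is dropped.
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == SHIFT:
            out.append(next(chars, "").upper())
        else:
            out.append(ch)
    return "".join(out)

prompt = add_caseifer("Hello, my name is Clara and I like")
print(prompt)                   # ↨hello, my name is ↨clara and ↨i like
print(remove_caseifer(prompt))  # Hello, my name is Clara and I like
```

In practice the prompt would be passed through `add_caseifer` before generation, and the generated text through `remove_caseifer` before display.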

	Run history:
	iter ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
	loss/train █▅▄▅▃▅▄▃▄▄▄▄▃▃▄▄▂▁▂▂▃▃▂▃▃▃▂▃▂▃▂▂▂▂▃▁▂▂▁▂
	loss/val █▇▆▅▅▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁
	lr ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
	mfu ▅▅▅▄▅▅▅▅▅▅▄▅▅▅▁▅▅▅▄▅▅▅▅█▅▅▅▅▅▅▅▅▅▅▅▅█▅▅█
	step_time ██████▇████████▇████▇██▂▇██▇█████▇▇▇▁▇█▁
	tokens ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███

	Run summary:
	iter 500000
	loss/train 0.48935
	loss/val 0.45042
	lr 1e-05
	mfu 9.31042
	step_time 63441.47873
	tokens 49152000000
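
To put the final validation loss in more familiar terms, the sketch below converts it to perplexity and bits per token. It assumes the logged loss is a mean cross-entropy measured in nats per token, which is the PyTorch default but is not stated in this card. Since the model is character-level, one token is one character, with ↨ counting as its own token.

```python
import math

val_loss = 0.45042  # loss/val from the run summary; assumed to be nats per token

perplexity = math.exp(val_loss)          # effective number of equally likely next characters
bits_per_token = val_loss / math.log(2)  # convert nats to bits

print(f"perplexity: {perplexity:.3f}")
print(f"bits per token: {bits_per_token:.3f}")
```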