Corianas/Microllama_Char_500k_step


Corianas committed on Apr 9, 2024

Commit 5f37180 · verified · 1 Parent(s): 213fd74

Create README.md

Files changed (1)

README.md +56 -0

README.md ADDED

	@@ -0,0 +1,56 @@

+---
+license: cc-by-nc-sa-4.0
+datasets:
+- roneneldan/TinyStories
+---
+This is a character-level model (English a-z, 0-9, and so on) trained following Andrej Karpathy's llama2.c project (https://github.com/karpathy/llama2.c) on both TinyStories and a similar internal dataset of my own. The last 150k steps used a subset of Cosmopedia that I extracted for younger readers.
+Trained for 49,152,000,000 tokens.
+To let the model see and output uppercase letters, a Shift-key modifier is placed before a letter to mark it as uppercase; the model has never been trained on actual uppercase letters.
+This modifier is ↨, and here are the functions I use to convert plain text to the modified format and back.
+```
+def add_caseifer(text):
+    # Prefix each uppercase character with '↨' and lowercase it;
+    # a list comprehension keeps the concatenation efficient
+    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
+
+def remove_caseifer(text):
+    new_text = ""
+    i = 0
+    while i < len(text):
+        if text[i] == "↨":
+            if i+1 < len(text):
+                new_text += text[i+1].upper()
+                i += 1
+            else:
+                pass  # skip this index
+        else:
+            new_text += text[i]
+        i += 1
+    return new_text
+```
+As such, for test strings to use in chat, try something like:
+```
+↨hello, my name is ↨clara and ↨i like
+```
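The sample prompt above is what the conversion yields for an ordinary sentence. Below is a minimal round-trip sketch, assuming the same ↨ convention. `encode_case` and `decode_case` are hypothetical names used only for this illustration; the decoder is a regex-based stand-in, not the author's function.

```python
import re

SHIFT = "↨"  # modifier placed before a letter to mark it as uppercase

def encode_case(text: str) -> str:
    # Hypothetical helper: prefix each uppercase character with the
    # marker and store the character itself in lowercase.
    return "".join(SHIFT + ch.lower() if ch.isupper() else ch for ch in text)

def decode_case(text: str) -> str:
    # Hypothetical helper: uppercase the character that follows each marker.
    # "(.?)" also matches a marker at the very end of the string, which is
    # then dropped because the captured group is empty.
    return re.sub(SHIFT + "(.?)", lambda m: m.group(1).upper(), text, flags=re.DOTALL)

prompt = encode_case("Hello, my name is Clara and I like")
print(prompt)               # ↨hello, my name is ↨clara and ↨i like
print(decode_case(prompt))  # Hello, my name is Clara and I like
```

Text would be encoded this way before being passed to the model as a prompt, and the generated text decoded afterwards to restore capitals.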
+Run history:
+      iter ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
+loss/train █▅▄▅▃▅▄▃▄▄▄▄▃▃▄▄▂▁▂▂▃▃▂▃▃▃▂▃▂▃▂▂▂▂▃▁▂▂▁▂
+  loss/val █▇▆▅▅▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁
+        lr ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
+       mfu ▅▅▅▄▅▅▅▅▅▅▄▅▅▅▁▅▅▅▄▅▅▅▅█▅▅▅▅▅▅▅▅▅▅▅▅█▅▅█
+ step_time ██████▇████████▇████▇██▂▇██▇█████▇▇▇▁▇█▁
+    tokens ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
+Run summary:
+      iter 500000
+loss/train 0.48935
+  loss/val 0.45042
+        lr 1e-05
+       mfu 9.31042
+ step_time 63441.47873
+    tokens 49152000000
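The `iter` and `tokens` values in the run summary imply a fixed token budget per iteration. A quick check of that arithmetic, using only those two reported numbers (the variable names are illustrative):

```python
total_tokens = 49_152_000_000  # "tokens" from the run summary
iterations = 500_000           # "iter" from the run summary

# Divide the total evenly across iterations and keep the remainder
# to confirm that the split is exact.
tokens_per_iter, remainder = divmod(total_tokens, iterations)
print(tokens_per_iter, remainder)  # 98304 0
```

Since the model is trained at the character level, each of these tokens should correspond to a single character of the modified text, ↨ markers included.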