Edit model card

This is a character (english a-z 0-9 and so on) trained model following Andrej karpathy's llama.c project https://github.com/karpathy/llama2.c on both TinyStories and my own internal similar dataset I made. the last 150k is from a subset of cosmopedia I extracted for younger people.

Trained for 49,152,000,000 tokens

for it to see/output Uppercase letters this model uses a Shift-Key modifier before the letter to become uppercase, and has never been trained on actual uppercase letters.

This modifier is ↨ and here are the functions I use to convert from straight text to the modified format and back.

def add_caseifer(text):
    # Using list comprehension for more efficient concatenation
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text
    
def remove_caseifer(text):
    new_text = ""
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i+1 < len(text):
                new_text += text[i+1].upper()
                i += 1
            else:
                pass  # skip this index
        else:
            new_text += text[i]
        i += 1
    return new_text

As such for test strings to use in chat try using somthing like:

↨hello, my name is ↨clara and ↨i like

Run history: iter β–β–β–β–‚β–‚β–‚β–‚β–‚β–‚β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–„β–„β–„β–„β–„β–…β–…β–…β–…β–…β–…β–†β–†β–†β–†β–†β–‡β–‡β–‡β–‡β–‡β–‡β–ˆβ–ˆβ–ˆ loss/train β–ˆβ–…β–„β–…β–ƒβ–…β–„β–ƒβ–„β–„β–„β–„β–ƒβ–ƒβ–„β–„β–‚β–β–‚β–‚β–ƒβ–ƒβ–‚β–ƒβ–ƒβ–ƒβ–‚β–ƒβ–‚β–ƒβ–‚β–‚β–‚β–‚β–ƒβ–β–‚β–‚β–β–‚ loss/val β–ˆβ–‡β–†β–…β–…β–„β–„β–„β–„β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β–β–β–β–β–β–β– lr ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ mfu β–…β–…β–…β–„β–…β–…β–…β–…β–…β–…β–„β–…β–…β–…β–β–…β–…β–…β–„β–…β–…β–…β–…β–ˆβ–…β–…β–…β–…β–…β–…β–…β–…β–…β–…β–…β–…β–ˆβ–…β–…β–ˆ step_time β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–‚β–‡β–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–β–‡β–ˆβ– tokens β–β–β–β–‚β–‚β–‚β–‚β–‚β–‚β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–„β–„β–„β–„β–„β–…β–…β–…β–…β–…β–…β–†β–†β–†β–†β–†β–‡β–‡β–‡β–‡β–‡β–‡β–ˆβ–ˆβ–ˆ

Run summary: iter 500000 loss/train 0.48935 loss/val 0.45042 lr 1e-05 mfu 9.31042 step_time 63441.47873 tokens 49152000000

Downloads last month
60
Inference API
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train Corianas/Microllama_Char_500k_step