phi3-mini
with a tokenizer extended with 52k-unicode-hindi

trained only the embedding layers; final loss ~1.18
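
A minimal sketch of what "training only the embedding layers" could look like, not the actual training script. The helper name `freeze_all_but_embeddings`, the keyword list (`embed_tokens`, `lm_head`), and the `ToyModel` stand-in are assumptions for illustration; whether the output head was trained alongside the input embeddings is not stated above. The helper only relies on `named_parameters()` and `requires_grad`, so it should also apply to a `torch.nn.Module`.

```python
from types import SimpleNamespace


def freeze_all_but_embeddings(model, trainable_keywords=("embed_tokens", "lm_head")):
    """Leave requires_grad=True only on parameters whose name contains a keyword.

    `trainable_keywords` is an assumed naming convention; check the real
    parameter names of the model before relying on it.
    Returns the names of the parameters left trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = any(key in name for key in trainable_keywords)
        if param.requires_grad:
            trainable.append(name)
    return trainable


class ToyModel:
    """Hypothetical stand-in so the sketch runs without torch installed."""

    def __init__(self):
        self._params = {
            "model.embed_tokens.weight": SimpleNamespace(requires_grad=True),
            "model.layers.0.self_attn.qkv_proj.weight": SimpleNamespace(requires_grad=True),
            "model.layers.0.mlp.down_proj.weight": SimpleNamespace(requires_grad=True),
            "lm_head.weight": SimpleNamespace(requires_grad=True),
        }

    def named_parameters(self):
        # Mirrors the (name, parameter) pairs a torch.nn.Module yields.
        return self._params.items()


toy = ToyModel()
trainable_names = freeze_all_but_embeddings(toy)
print(trainable_names)
```

With a real model, the embedding matrix would first need to be resized to the extended vocabulary (in Hugging Face Transformers, `model.resize_token_embeddings(len(tokenizer))`) before freezing the remaining weights.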