
We're not a company, just a small group of students. Training these models until something high-quality comes out takes a lot of compute; you can help fund our studies here.
We introduce HamzahLM V1, the first version of the Hamzah Language Model, a series of upcoming models designed to have a bit of personality, be smart for their size, and be promptable beyond what they were trained on (i.e., strong instruction following).
Quick model metadata:
- Model Series: HamzahLM
- Model Version: V1
- Model Parameters: 1.2B
- Context Length: 128k tokens
- Recommended Max Generation Length: 2k - 8k tokens
- Other Notes: Large context length; the model stays coherent while processing long contexts, which you can exploit with RAG or similar retrieval setups (see the usage sketch below).
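If you want to try the long-context setup locally, here is a minimal inference sketch using the Hugging Face transformers library. The repository id "XeTute/HamzahLM-V1-1.2B" and the presence of a chat template are assumptions on our part; check the model card header for the real id and adjust accordingly.

```python
# Minimal sketch: load the model and run a long-context / RAG-style prompt.
# The repo id below is a placeholder; replace it with the actual HamzahLM V1 id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XeTute/HamzahLM-V1-1.2B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# For RAG, paste the retrieved documents into the user turn; the 128k-token
# context window leaves plenty of room for long passages.
messages = [
    {"role": "user",
     "content": "Answer using only the context below.\n\n<retrieved documents here>\n\nQuestion: ..."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stay within the recommended 2k - 8k token generation window.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If the tokenizer does not ship a chat template, fall back to a plain `tokenizer(prompt, return_tensors="pt")` call instead of `apply_chat_template`.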
A 3B version is on its way! You can already access the V0 3B for free through our endpoint, which serves the full 128k context with acceptable processing and perfect generation performance.
Changes compared to V0:
- Also trained on XeTute/iloveuser-1k and a thought-tag-modified version of open-thoughts/OpenThoughts-114k
- This time, trained on XeTute/Eastern-Alpaca-14k for one full epoch, even when the loss was already low halfway through; used a higher dropout rate to prevent low-quality results
- Used a higher context length (numbers in tokens):
  - Last time: 2048 + 512 (512 padding; used through unified VRAM)
  - This time: 2048 + 6144 (6144 padding; used through unified VRAM)
- Resulted in roughly 3h more training time; gave the GPU a 5-minute break every half hour to cool down
Using these settings (sketched below), we feel the jump from V0 to V1 is roughly the same as the jump from Meta LLaMA2 to Meta LLaMA3 ;)
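For reference, here is a rough, non-authoritative sketch of what such a fine-tuning run could look like with transformers and datasets. The V0 checkpoint id, the dropout value, the batch-size settings, and the dataset's "text" column are all assumptions; the exact configuration was not published.

```python
# Rough sketch of the V1 fine-tuning setup described above. All concrete
# numbers (dropout, batch size) and the V0 repo id are placeholders.
from datasets import load_dataset
from transformers import (AutoConfig, AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_id = "XeTute/HamzahLM-V0-1.2B"  # assumed V0 checkpoint id

# Higher dropout than V0 to avoid low-quality / overfit outputs (value assumed).
config = AutoConfig.from_pretrained(base_id)
if hasattr(config, "attention_dropout"):
    config.attention_dropout = 0.1

tokenizer = AutoTokenizer.from_pretrained(base_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id, config=config)

# 2048 prompt tokens + 6144 padding = 8192-token training sequences.
max_len = 2048 + 6144

def tokenize(batch):
    # Assumes a flat "text" column; adapt to the dataset's real schema.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=max_len)

dataset = load_dataset("XeTute/Eastern-Alpaca-14k", split="train")
dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="hamzahlm-v1-sft",
    num_train_epochs=1,             # one full epoch, even if the loss is low halfway through
    per_device_train_batch_size=1,  # assumed
    gradient_accumulation_steps=8,  # assumed
)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```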
Our Apps & Socials
Chat with our Assistant | Support us Financially | Visit our GitHub
Long live the Islamic Republic of Pakistan; Glory to the Islamic Republic of Pakistan 🇵🇰