metadata

license: mit
license_link: https://huggingface.co/microsoft/phi-2/resolve/main/LICENSE
language:
  - en
pipeline_tag: text-generation
tags:
  - nlp
  - code
datasets:
  - LLM360/AmberDatasets

MobiLlama-05B

MobiLlama-05B is a Small Language Model (SLM) with 0.5 billion parameters. It was trained on the Amber data sources (Amber-Dataset).

Model Summary

"Bigger the better" has been the predominant trend in recent Large Language Model (LLM) development. However, LLMs are poorly suited to scenarios that require on-device processing, energy efficiency, a low memory footprint, and fast responses. These requirements are crucial for privacy, security, and sustainable deployment. This paper explores the ‘less is more’ paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices. Our primary contribution is MobiLlama, an accurate and fully transparent open-source SLM with 0.5 billion (0.5B) parameters, built for the specific needs of resource-constrained computing: strong performance with reduced resource demands. MobiLlama's design starts from a larger model and applies a careful parameter-sharing scheme to reduce both pre-training and deployment costs. Our work aims not only to bridge the gap in open-source SLMs but also to ensure full transparency: the complete training data pipeline, training code, model weights, over 300 checkpoints, and evaluation code are available on our GitHub.

Arxiv Paper Link

Model Description

Model type: Small Language Model (SLM) built using the architecture design of LLaMA-7B
Language(s) (NLP): English
License: Apache 2.0
Resources for more information:

How to Use

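import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers run the model code hosted in the model repository
model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True)

# Use a GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

text = "I was walking towards the river when "
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
# max_length counts the prompt tokens as well as the generated ones
outputs = model.generate(input_ids, max_length=1000, repetition_penalty=1.2, pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens; skip_special_tokens drops EOS without cutting off real text
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:], skip_special_tokens=True)[0].strip())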

Training Data Mix

Subset	Tokens (Billion)
Arxiv	30.00
Book	28.86
C4	197.67
Refined-Web	665.01
StarCoder	291.92
StackExchange	21.75
Wikipedia	23.90
Total	1259.13
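To make the balance of the mix easier to see, the short script below computes each subset's share of the reported total. The token counts are copied from the training data table above; nothing else is assumed. It also sums the listed subsets, which come to 1259.11B, 0.02B below the stated total, presumably due to rounding of the individual entries.

```python
# Token counts (in billions) copied from the training data table above
data_mix = {
    "Arxiv": 30.00,
    "Book": 28.86,
    "C4": 197.67,
    "Refined-Web": 665.01,
    "StarCoder": 291.92,
    "StackExchange": 21.75,
    "Wikipedia": 23.90,
}
reported_total = 1259.13

# Sum of the individual rows, to compare against the reported total
subset_sum = sum(data_mix.values())

# Each subset's share, relative to the reported total
shares = {name: tokens / reported_total for name, tokens in data_mix.items()}

# Print subsets from largest to smallest share
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14}{data_mix[name]:>9.2f}B  {share:6.1%}")
print(f"Sum of subsets: {subset_sum:.2f}B (reported total: {reported_total:.2f}B)")
```

Refined-Web alone accounts for a little over half of the training tokens, and web text plus code (StarCoder) together make up roughly three quarters.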

Hyperparameters

Hyperparameter	Value
Total Parameters	0.52B
Hidden Size	2048
Intermediate Size (MLPs)	5632
Number of Attention Heads	32
Number of Hidden Layers	22
RMSNorm ɛ	1e-5
Max Seq Length	2048
Vocab Size	32000
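The Model Summary says MobiLlama reaches 0.5B parameters by starting from a larger model and sharing parameters. The back-of-the-envelope count below is a sketch of how that could work with the hyperparameters above. It assumes, as one plausible reading of that "parameter sharing scheme", that a single feed-forward (MLP) block is reused by all 22 layers. It further assumes (none of this is stated in the card) LLaMA-style layers with full multi-head attention, a gated MLP with three matrices, no bias terms, and untied input/output embeddings. Under these assumptions, the estimate comes to about 0.53B with a shared MLP versus about 1.26B with one MLP per layer. That is close to, but not exactly, the 0.52B reported, so the real model presumably differs in some detail not given here.

```python
# Hyperparameters copied from the table above
hidden = 2048
intermediate = 5632
n_layers = 22
n_heads = 32
vocab = 32000

head_dim = hidden // n_heads  # per-head dimension

# ASSUMPTIONS (not stated in this card): full multi-head attention with
# q, k, v, o projections of size hidden x hidden, a gated MLP with three
# hidden x intermediate matrices, no biases, and untied embeddings.
attn_per_layer = 4 * hidden * hidden
mlp_per_block = 3 * hidden * intermediate
norms = (2 * n_layers + 1) * hidden  # two RMSNorm weights per layer plus a final one
embeddings = 2 * vocab * hidden      # input embedding + separate LM head


def total_params(shared_mlp: bool) -> int:
    """Estimate the parameter count; shared_mlp=True reuses one MLP in every layer."""
    mlp = mlp_per_block if shared_mlp else n_layers * mlp_per_block
    return n_layers * attn_per_layer + mlp + norms + embeddings


shared = total_params(shared_mlp=True)
unshared = total_params(shared_mlp=False)
print(f"head_dim = {head_dim}")
print(f"one MLP per layer: {unshared / 1e9:.2f}B parameters")
print(f"one shared MLP:    {shared / 1e9:.2f}B parameters")
```

Under this sketch, the MLP is by far the largest per-layer component (about 34.6M parameters versus about 16.8M for attention), which is why sharing it, rather than attention, would remove most of the parameters.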

Evaluation

Evaluation Benchmark	MobiLlama-0.5B	MobiLlama-0.8B	MobiLlama-1.2B
HellaSwag	52.52	54.09	62.99
MMLU	26.45	26.92	24.23
Arc Challenge	29.52	30.20	34.55
TruthfulQA	38.05	38.48	35.57
CrowsPairs	64.03	64.82	68.12
PIQA	72.03	73.17	75.29
Race	33.68	33.37	35.31
SIQA	40.22	41.60	41.96
Winogrande	57.53	57.45	61.08
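Scores in the table do not rise uniformly with model size. The snippet below, using only the numbers copied from the table, computes the difference between the 1.2B and 0.5B variants on each benchmark. It reports raw differences only and makes no assumption about whether a higher value is better for every metric.

```python
# Scores copied from the evaluation table above: (0.5B, 0.8B, 1.2B)
scores = {
    "HellaSwag": (52.52, 54.09, 62.99),
    "MMLU": (26.45, 26.92, 24.23),
    "Arc Challenge": (29.52, 30.20, 34.55),
    "TruthfulQA": (38.05, 38.48, 35.57),
    "CrowsPairs": (64.03, 64.82, 68.12),
    "PIQA": (72.03, 73.17, 75.29),
    "Race": (33.68, 33.37, 35.31),
    "SIQA": (40.22, 41.60, 41.96),
    "Winogrande": (57.53, 57.45, 61.08),
}

# Difference between the largest (1.2B) and smallest (0.5B) variant
deltas = {name: round(s[2] - s[0], 2) for name, s in scores.items()}
for name, delta in deltas.items():
    print(f"{name:<14}{delta:+6.2f}")

# Benchmarks where the 1.2B variant scores below the 0.5B variant
lower_at_1_2b = sorted(name for name, d in deltas.items() if d < 0)
print("Lower for 1.2B than 0.5B:", lower_at_1_2b)
```

HellaSwag shows the largest gain from 0.5B to 1.2B, while MMLU and TruthfulQA are the two benchmarks where the 1.2B variant scores lower than the 0.5B one.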

Citation

BibTeX:

@misc{thawakar2024mobillama,
      title={MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT}, 
      author={Omkar Thawakar and Ashmal Vayani and Salman Khan and Hisham Cholakkal and Rao Muhammad Anwer and Michael Felsberg and Timothy Baldwin and Eric P. Xing and Fahad Shahbaz Khan},
      year={2024},
      eprint={2402.16840},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}