language: el

el-llama-smol

Model:

el-llama-smol aims to be the first in a series of LLMs trained mostly on Greek corpora. The model is a small (~1B-parameter) version of LLaMA with the following configuration:

{
  "architectures": ["LLaMAForCausalLM"],
  "bos_token_id": 0,
  "eos_token_id": 1,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "intermediate_size": 5461,
  "initializer_range": 0.02,
  "max_sequence_length": 1024,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 24,
  "pad_token_id": -1,
  "rms_norm_eps": 1e-06,
  "transformers_version": "4.28.1",
  "use_cache": true,
  "vocab_size": 22000
}
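The parameter count quoted above is a rounded figure. As a rough check, the count can be estimated directly from the configuration. The sketch below is a back-of-the-envelope calculation, not an official figure. It assumes untied input and output embeddings (the transformers LlamaConfig default, since this config does not set tie_word_embeddings), no bias terms, and standard multi-head attention. llama_param_count is a hypothetical helper written only for illustration.

```python
# Back-of-the-envelope parameter count for a LLaMA-style decoder.
# Assumptions (not stated in the model config): no bias terms,
# standard multi-head attention, and untied embeddings by default.


def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_hidden_layers, tie_embeddings=False):
    attn = 4 * hidden_size * hidden_size       # q, k, v, o projections
    mlp = 3 * hidden_size * intermediate_size  # gate, up, down projections
    norms = 2 * hidden_size                    # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = vocab_size * hidden_size           # token embedding matrix
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size                   # RMSNorm after the last layer
    return num_hidden_layers * per_layer + embed + lm_head + final_norm


total = llama_param_count(vocab_size=22000, hidden_size=2048,
                          intermediate_size=5461, num_hidden_layers=24)
print(f"{total:,}")
```

Under these assumptions the total comes to about 1.30B parameters (about 1.25B if the embeddings were tied). Roughly 90M of these sit in the embedding and output layers.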

Training details:

The current snapshot has been trained for 40 hours on an RTX A6000 GPU (48 GB), using the galore_adamw8bit_per_layer optimizer of Zhao et al. [1] and a context size of 1024 tokens.
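GaLore reduces optimizer memory by keeping the Adam moments in a low-rank projected space instead of at full weight size. The sketch below only counts stored optimizer-state elements for this model's attention and MLP weight matrices. It follows the accounting in [1]: a weight of shape m x n (m <= n) projected to rank r stores an m x r projector plus two r x n moments, versus 2mn for AdamW. The rank of 256 is a hypothetical choice, not the setting used to train this model. The sketch also ignores the 8-bit quantization, gradient memory, and the embedding and norm parameters.

```python
# Rough comparison of optimizer-state size (stored elements) for AdamW
# versus GaLore on the attention and MLP matrices of this configuration.
# RANK is a hypothetical value, not the one used for el-llama-smol.


def adam_state(m, n):
    # AdamW keeps two full-size moment estimates per weight matrix.
    return 2 * m * n


def galore_state(m, n, rank):
    # The projector spans the smaller dimension; the two Adam moments
    # live in the projected rank x (larger dimension) space.
    small, large = min(m, n), max(m, n)
    return small * rank + 2 * large * rank


HIDDEN, INTERMEDIATE, LAYERS = 2048, 5461, 24
RANK = 256  # hypothetical projection rank

# Per layer: four square attention projections and three MLP projections.
shapes = [(HIDDEN, HIDDEN)] * 4 + [(HIDDEN, INTERMEDIATE)] * 3

adam_total = LAYERS * sum(adam_state(m, n) for m, n in shapes)
galore_total = LAYERS * sum(galore_state(m, n, RANK) for m, n in shapes)
print(f"AdamW:  {adam_total:,} state elements")
print(f"GaLore: {galore_total:,} state elements")
print(f"ratio:  {adam_total / galore_total:.1f}x")
```

With these hypothetical settings, the stored state for these matrices shrinks by roughly a factor of six. The 8-bit variant named above additionally keeps its states in 8-bit precision.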

Dataset:

The model is trained on the Greek subset of the allenai/c4 dataset. Text is tokenized with a (heavily unoptimized) tokenizer trained with SentencePiece, with a vocabulary of 22000 tokens.

Examples

Use a 🤗 pipeline


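from transformers import pipeline, set_seed

pipe = pipeline("text-generation", model="Konstantinos/el_llama_smol")

set_seed(1)  # make sampling reproducible
# English: "Japan has a history that begins thousands of years ago. Scientists believe
# that the Japanese as a whole descend from many groups, which migrated to the islands
# from other parts of Asia, which include "
prompt = """Η Ιαπωνία έχει μια ιστορία που ξεκινά πριν από χιλιάδες χρόνια. 
Οι επιστήμονες πιστεύουν πως οι Ιάπωνες ως ενιαίο σύνολο προέρχονται από πολλές ομάδες,
οι οποίες μετανάστευσαν στα νησιά από άλλα σημεία της Ασίας, στα οποία περιλαμβάνονται """

# Top-k sampling of up to 110 new tokens
ret = pipe(prompt, do_sample=True, top_k=20, temperature=0.85, max_new_tokens=110)
print(ret[0]["generated_text"])  # prompt followed by the generated continuation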

Load model directly


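from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Konstantinos/el_llama_smol")
model = AutoModelForCausalLM.from_pretrained("Konstantinos/el_llama_smol")

input_ids = tokenizer("Η Ιαπωνία έχει μια ιστορία", return_tensors="pt").input_ids
output = model.generate(input_ids, do_sample=True, top_k=20, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))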

References

[1] Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, & Yuandong Tian. (2024). GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.

Citation

TBD

license: odc-by
