	---
	license: odc-by
	language:
	- el

	widget:
	- text: "Η Ιαπωνία έχει μια ιστορία που ξεκινά πριν από χιλιάδες χρόνια. Οι επιστήμονες πιστεύουν πως οι Ιάπωνες ως ενιαίο σύνολο προέρχονται από πολλές ομάδες, οι οποίες μετανάστευσαν στα νησιά από άλλα σημεία της Ασίας, στα οποία περιλαμβάνονται "

	tags:
	- text-generation-inference
	---



	# el-llama-smol


	## Model:
	`el-llama-smol` aims to be the first in a series of LLMs trained mostly on Greek corpora. The model is a small (about 1B parameters) version of LLaMA with the following configuration:

	```json
	{
	"architectures": ["LLaMAForCausalLM"],
	"bos_token_id": 0,
	"eos_token_id": 1,
	"hidden_act": "silu",
	"hidden_size": 2048,
	"intermediate_size": 5461,
	"initializer_range": 0.02,
	"max_sequence_length": 1024,
	"model_type": "llama",
	"num_attention_heads": 32,
	"num_hidden_layers": 24,
	"pad_token_id": -1,
	"rms_norm_eps": 1e-06,
	"transformers_version": "4.28.1",
	"use_cache": true,
	"vocab_size": 22000
	}
	```
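
As a rough sanity check on the "1bn parameters" figure, the parameter count can be estimated from the configuration above. The sketch below assumes the standard LLaMA layer layout (four bias-free attention projections, a three-matrix gated MLP, two RMSNorm weight vectors per block) and untied input/output embeddings. Neither assumption is stated in this card, so treat the result as an estimate.

```python
# Values copied from the configuration above.
config = {
    "hidden_size": 2048,
    "intermediate_size": 5461,
    "num_attention_heads": 32,
    "num_hidden_layers": 24,
    "vocab_size": 22000,
}


def estimate_llama_params(cfg, tied_embeddings=False):
    """Estimate the parameter count of a LLaMA-style decoder (assumed layout)."""
    h = cfg["hidden_size"]
    ffn = cfg["intermediate_size"]

    attention = 4 * h * h   # q, k, v and output projections, no biases
    mlp = 3 * h * ffn       # gate, up and down projections
    norms = 2 * h           # two RMSNorm weight vectors per block
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h
    if not tied_embeddings:
        embeddings *= 2     # separate input embedding and output head

    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + final_norm


total = estimate_llama_params(config)
head_dim = config["hidden_size"] // config["num_attention_heads"]
print(f"estimated parameters: {total / 1e9:.2f}B, head dimension: {head_dim}")
```

Under these assumptions the estimate comes out at roughly 1.3B parameters, which is in line with the "1bn" description.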



	## Training details:

	The current snapshot was trained for 40 hours on an RTX A6000 GPU (48 GB), using the `galore_adamw8bit_per_layer` optimizer by Zhao et al. [1] and a context size of 1024 tokens.
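
To give a feel for why GaLore helps a model of this size train on a 48 GB GPU: GaLore [1] keeps Adam's moment estimates for a rank-r projection of each weight matrix's gradient instead of at full size. The sketch below only counts optimizer-state elements for one matrix. The rank of 512 is a hypothetical value chosen for illustration (the rank used for this snapshot is not stated here), and the 8-bit state quantization and the per-layer update schedule are not modelled.

```python
def adam_state_elements(m, n):
    """Full-size Adam keeps two moment tensors shaped like the m x n weight."""
    return 2 * m * n


def galore_state_elements(m, n, rank):
    """Rank-`rank` projection of an m x n gradient.

    Stores one m x rank projection matrix plus two rank x n moment tensors.
    """
    return m * rank + 2 * rank * n


hidden_size = 2048   # from the model configuration
rank = 512           # hypothetical, for illustration only

full = adam_state_elements(hidden_size, hidden_size)
low_rank = galore_state_elements(hidden_size, hidden_size, rank)
print(f"Adam: {full:,} elements, GaLore (r={rank}): {low_rank:,} elements")
print(f"ratio: {low_rank / full:.3f}")
```

Smaller ranks shrink the optimizer state further, at the cost of a coarser approximation of the gradient.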


	## Dataset:
	The model is trained on the Greek subset of the [allenai/c4](https://huggingface.co/datasets/allenai/c4) dataset. Text is tokenized with a (heavily unoptimized) tokenizer with a vocabulary of 22,000 tokens, trained with [SentencePiece](https://github.com/google/sentencepiece).
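
A tokenizer along these lines could be trained roughly as follows. This is a sketch, not the script used for this model: the `el` configuration name for the Greek subset, the sample size, the file names, and every SentencePiece setting other than the vocabulary size are assumptions. It needs the `datasets` and `sentencepiece` packages.

```python
def dump_greek_c4_sample(out_path, num_docs=100_000):
    """Stream documents from the Greek subset of C4 into a plain-text file."""
    from datasets import load_dataset  # imported lazily; requires `datasets`

    # Assumption: the Greek subset is exposed under the "el" configuration.
    stream = load_dataset("allenai/c4", "el", split="train", streaming=True)
    with open(out_path, "w", encoding="utf-8") as f:
        for i, record in enumerate(stream):
            if i >= num_docs:
                break
            # SentencePiece reads one sentence per line, so keep the
            # document's own line breaks and skip empty lines.
            for line in record["text"].splitlines():
                if line.strip():
                    f.write(line.strip() + "\n")


def tokenizer_training_args(corpus_path, model_prefix="el_tokenizer"):
    """Build keyword arguments for SentencePieceTrainer.train."""
    return {
        "input": corpus_path,
        "model_prefix": model_prefix,  # hypothetical output name
        "vocab_size": 22000,           # matches the model's vocab_size
        "model_type": "bpe",           # assumption; not stated in this card
        "character_coverage": 1.0,     # assumption: keep all characters seen
    }


def train_tokenizer(corpus_path):
    """Train the tokenizer; writes <model_prefix>.model and .vocab files."""
    import sentencepiece as spm  # imported lazily; requires `sentencepiece`

    spm.SentencePieceTrainer.train(**tokenizer_training_args(corpus_path))


# Example usage (downloads data, so it is left commented out):
# dump_greek_c4_sample("c4_el_sample.txt")
# train_tokenizer("c4_el_sample.txt")
```

Streaming avoids downloading the whole subset up front, which is convenient when only a sample is needed for tokenizer training.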



	## Examples

	#### Use a 🤗 pipeline
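	```python
	from transformers import pipeline, set_seed

	# Downloads the model from the Hugging Face Hub on first use.
	pipe = pipeline("text-generation", model="Konstantinos/el_llama_smol")

	set_seed(1)  # fix the RNG seed so that sampling is reproducible
	prompt = """Η Ιαπωνία έχει μια ιστορία που ξεκινά πριν από χιλιάδες χρόνια.
	Οι επιστήμονες πιστεύουν πως οι Ιάπωνες ως ενιαίο σύνολο προέρχονται από πολλές ομάδες,
	οι οποίες μετανάστευσαν στα νησιά από άλλα σημεία της Ασίας, στα οποία περιλαμβάνονται """

	# Sample up to 110 new tokens, choosing among the 20 most likely tokens at each step.
	ret = pipe(prompt, do_sample=True, top_k=20, temperature=0.85, max_new_tokens=110)
	print(ret[0]["generated_text"])
	```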

	#### Load model directly
	```python

	from transformers import AutoTokenizer, AutoModelForCausalLM

	tokenizer = AutoTokenizer.from_pretrained("Konstantinos/el_llama_smol")
	model = AutoModelForCausalLM.from_pretrained("Konstantinos/el_llama_smol")
	```
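
Once loaded this way, text can be generated with `generate`. The following is a minimal sketch that reuses the sampling settings from the pipeline example and, as an extra precaution that is not part of the original example, truncates the prompt so that prompt plus new tokens stay within the 1024-token context. The helper name `generate_text` is hypothetical.

```python
MAX_CONTEXT = 1024  # context size stated in the training details

# Same sampling settings as in the pipeline example.
GENERATION_KWARGS = {
    "do_sample": True,
    "top_k": 20,
    "temperature": 0.85,
    "max_new_tokens": 110,
}


def generate_text(prompt, model_id="Konstantinos/el_llama_smol"):
    """Load the model and sample a continuation of `prompt`."""
    # Imported lazily; requires `transformers` and `torch`.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Leave room for the generated tokens inside the context window.
    # Note: by default this drops the end of an over-long prompt.
    max_prompt_tokens = MAX_CONTEXT - GENERATION_KWARGS["max_new_tokens"]
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=max_prompt_tokens,
    )

    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        **GENERATION_KWARGS,
    )
    # The output contains the prompt followed by the sampled continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (downloads the model on first call):
# print(generate_text("Η Ιαπωνία έχει μια ιστορία που ξεκινά πριν από χιλιάδες χρόνια."))
```

For repeated calls it would be better to load the tokenizer and model once and pass them in, rather than reloading them inside the function.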

	## References

	[1] Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, & Yuandong Tian. (2024). GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. arXiv:2403.03507.



	## Citation

	TBD