Konstantinos committed on
Commit fee7c68
1 Parent(s): 2ee5449

Update README.md

Files changed (1)
  1. README.md +79 -0
README.md CHANGED
@@ -9,6 +9,85 @@ widget:
tags:
- text-generation-inference
---

---
language: el
---

# el-llama-smol


## Model:
`el-llama-smol` aims to be the first in a series of LLMs trained mostly on Greek corpora. The model is a small (1B-parameter) version of LLaMA with the following configuration:

```json
{
  "architectures": ["LLaMAForCausalLM"],
  "bos_token_id": 0,
  "eos_token_id": 1,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "intermediate_size": 5461,
  "initializer_range": 0.02,
  "max_sequence_length": 1024,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 24,
  "pad_token_id": -1,
  "rms_norm_eps": 1e-06,
  "transformers_version": "4.28.1",
  "use_cache": true,
  "vocab_size": 22000
}
```
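
For reference, here is a minimal sketch of how a model with the same shape could be instantiated with the current `transformers` API (where `max_position_embeddings` plays the role of `max_sequence_length` above); it builds a randomly initialized copy, not the trained checkpoint.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Mirror the configuration above; hidden_act="silu" and
# initializer_range=0.02 are already the LlamaConfig defaults.
config = LlamaConfig(
    vocab_size=22000,
    hidden_size=2048,
    intermediate_size=5461,
    num_hidden_layers=24,
    num_attention_heads=32,
    max_position_embeddings=1024,
    rms_norm_eps=1e-6,
    bos_token_id=0,
    eos_token_id=1,
)
model = LlamaForCausalLM(config)  # randomly initialized
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```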


## Training details:

The current snapshot has been trained for 40 hours on an RTX A6000 GPU (48 GB), using the `galore_adamw8bit_per_layer` optimizer of Zhao et al. [1] and a context size of 1,024 tokens.
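As a rough illustration (not the exact training script), the `galore_torch` package exposes an 8-bit GaLore AdamW optimizer that is applied to the 2-D projection matrices. The per-layer variant used here additionally registers per-parameter gradient hooks so that weight updates happen layer by layer during backpropagation; the `rank`, `update_proj_gap`, `scale`, and learning-rate values below are illustrative placeholders, not the settings used for this model.

```python
from transformers import AutoModelForCausalLM
from galore_torch import GaLoreAdamW8bit

model = AutoModelForCausalLM.from_pretrained("Konstantinos/el_llama_smol")

# GaLore projects the gradients of the 2-D attention/MLP weight matrices
# onto a low-rank subspace; all other parameters get regular AdamW updates.
galore_params = [p for n, p in model.named_parameters()
                 if p.dim() == 2 and ("self_attn" in n or "mlp" in n)]
galore_ids = {id(p) for p in galore_params}
regular_params = [p for p in model.parameters() if id(p) not in galore_ids]

optimizer = GaLoreAdamW8bit(
    [
        {"params": regular_params},
        {
            "params": galore_params,
            "rank": 256,             # placeholder projection rank
            "update_proj_gap": 200,  # steps between projector refreshes
            "scale": 0.25,
            "proj_type": "std",
        },
    ],
    lr=1e-3,  # placeholder learning rate
)
```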

## Dataset:
The model is trained on the Greek subset of the [allenai/c4](https://huggingface.co/datasets/allenai/c4) dataset. Text tokenization is performed with a (heavily unoptimized) tokenizer with a vocabulary of 22,000 tokens, trained with [SentencePiece](https://github.com/google/sentencepiece).
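A rough sketch of these two ingredients, assuming the standard APIs rather than the exact preprocessing scripts used here: the Greek portion of C4 is exposed as the `el` configuration of `allenai/c4`, and SentencePiece can train a 22,000-token model from a plain-text dump of it (the sample size and `unigram` model type below are assumptions).

```python
import sentencepiece as spm
from datasets import load_dataset

# Stream the Greek C4 split and dump a slice of it to plain text.
ds = load_dataset("allenai/c4", "el", split="train", streaming=True)
with open("el_c4_sample.txt", "w", encoding="utf-8") as f:
    for i, row in enumerate(ds):
        f.write(row["text"].replace("\n", " ") + "\n")
        if i >= 200_000:  # placeholder sample size
            break

# Train a tokenizer with the vocabulary size quoted above.
spm.SentencePieceTrainer.train(
    input="el_c4_sample.txt",
    model_prefix="el_llama_sp",
    vocab_size=22000,
    character_coverage=0.9995,
    model_type="unigram",  # assumption; the card does not state the type
)
```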

## Examples

#### Use a 🤗 pipeline
```python
from transformers import pipeline, set_seed

pipe = pipeline("text-generation", model="Konstantinos/el_llama_smol")

set_seed(1)
# Greek prompt: "Japan has a history that begins thousands of years ago.
# Scientists believe that the Japanese as a single people descend from many
# groups, which migrated to the islands from other parts of Asia, including ..."
prompt = """Η Ιαπωνία έχει μια ιστορία που ξεκινά πριν από χιλιάδες χρόνια.
Οι επιστήμονες πιστεύουν πως οι Ιάπωνες ως ενιαίο σύνολο προέρχονται από πολλές ομάδες,
οι οποίες μετανάστευσαν στα νησιά από άλλα σημεία της Ασίας, στα οποία περιλαμβάνονται """

ret = pipe(prompt, do_sample=True, top_k=20, temperature=0.85, max_new_tokens=110)
```

#### Load model directly
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Konstantinos/el_llama_smol")
model = AutoModelForCausalLM.from_pretrained("Konstantinos/el_llama_smol")
```
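
Continuing from the snippet above, generation with the directly loaded model would look roughly like this; the sampling parameters simply mirror the pipeline example.

```python
inputs = tokenizer("Η Ιαπωνία έχει μια ιστορία που ξεκινά ", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    do_sample=True,
    top_k=20,
    temperature=0.85,
    max_new_tokens=110,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```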

## References

[1] Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, and Yuandong Tian (2024). GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.


## Citation

TBD
---
license: odc-by
-