---
base_model: google/gemma-2-2b
pipeline_tag: text-generation
---

# BafoGPT-3B

This model is a continued pretraining of the Gemma2-2b base model on the [ChallengerSpaceShuttle/zulu-pretraining-dataset](https://huggingface.co/datasets/ChallengerSpaceShuttle/zulu-pretraining-dataset) dataset.

It is the first iteration in building IsiZulu models that can attain performance comparable to models that typically cost millions of dollars to train from scratch.

## 🔍 Applications

This is the base model, with a context length of 8K tokens. It generates coherent IsiZulu text and can be fine-tuned on instruction datasets for downstream tasks; a minimal generation example is sketched below.
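
The following is a minimal sketch of loading the checkpoint with the Hugging Face `transformers` library for plain text generation. The repository id `ChallengerSpaceShuttle/BafoGPT-3B` and the sample prompt are assumptions for illustration and may differ from the actual hosted name.

```python
# Minimal sketch: load the continued-pretrained checkpoint and generate IsiZulu text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ChallengerSpaceShuttle/BafoGPT-3B"  # assumed repo id; adjust to the hosted name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bf16-mixed training precision
    device_map="auto",
)

# Base (non-instruct) model: plain text continuation, no chat template.
prompt = "Namuhla sizofunda ngomlando waKwaZulu."  # sample prompt for illustration
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```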

## ⚡ Quantized models

## 🏆 Evaluation

## 🧩 Configuration

The code used to train the model is available in the [BafoGPT](https://github.com/Motsepe-Jr/bafoGPT/tree/main) repository, and training was run with the following configuration.

```yaml
model_name: google/gemma-2-2b
out_dir: pretrained_model/models
precision: bf16-mixed
initial_checkpoint_dir: google/gemma-2-2b
resume: false
data:
  class_path: litgpt.data.LitData
  init_args:
    data_path: data
    seed: 42
    num_workers: 8
train:
  save_interval: 1000
  log_interval: 1
  global_batch_size: 4
  micro_batch_size: 1
  lr_warmup_steps: 2000
  max_tokens: 156800708
  max_seq_length: 2048
  tie_embeddings: false
  max_norm: 1.0
  min_lr: 4.0e-05
eval:
  interval: 1000
  max_iters: 100
  initial_validation: false
  final_validation: true
optimizer: AdamW
devices: auto
num_nodes: 1
tokenizer_dir: google/gemma-2-2b
logger_name: tensorboard
seed: 42
```
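
For a rough sense of the training budget implied by this configuration, the short sketch below (an illustration, not part of the training code) derives the tokens processed per optimizer step and the approximate number of optimizer steps from `global_batch_size`, `max_seq_length`, and `max_tokens`.

```python
# Back-of-the-envelope numbers implied by the training config above.
# Derived values for illustration only; they are not read by the trainer.
global_batch_size = 4      # sequences per optimizer step
micro_batch_size = 1       # sequences per forward/backward pass per device
max_seq_length = 2048      # tokens per sequence
max_tokens = 156_800_708   # total token budget for the run

tokens_per_step = global_batch_size * max_seq_length  # 8,192 tokens per optimizer step
approx_steps = max_tokens // tokens_per_step           # ~19,140 optimizer steps
grad_accum = global_batch_size // micro_batch_size     # 4 accumulation steps on a single device

print(tokens_per_step, approx_steps, grad_accum)
```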

The architecture configuration (`config.json`) is:

```json
{
  "architectures": [
    "Gemma2ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": 50.0,
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "eos_token_id": 1,
  "final_logit_softcapping": 30.0,
  "head_dim": 256,
  "hidden_act": "gelu_pytorch_tanh",
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 2304,
  "initializer_range": 0.02,
  "intermediate_size": 9216,
  "max_position_embeddings": 8192,
  "model_type": "gemma2",
  "num_attention_heads": 8,
  "num_hidden_layers": 26,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "torch_dtype": "float32",
  "transformers_version": "4.42.4",
  "use_cache": true,
  "vocab_size": 288256
}
```
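
As a rough cross-check of the "3B" in the model name, the sketch below estimates the parameter count implied by the config above. The per-block layout (grouped-query attention, gated MLP, four RMSNorms) follows the standard Gemma2 block; whether the embedding matrix is shared with the LM head changes the total, so both variants are shown. These are back-of-the-envelope figures, not official counts.

```python
# Rough parameter count implied by the architecture config above (a sketch, not an official figure).
hidden, inter, layers = 2304, 9216, 26
heads, kv_heads, head_dim = 8, 4, 256
vocab = 288256

attn = hidden * heads * head_dim           # q_proj
attn += 2 * hidden * kv_heads * head_dim   # k_proj + v_proj (grouped-query attention)
attn += heads * head_dim * hidden          # o_proj
mlp = 3 * hidden * inter                   # gate, up, and down projections
norms = 4 * hidden                         # Gemma2 uses four RMSNorms per block
block = attn + mlp + norms

embed = vocab * hidden
total_tied = layers * block + embed + hidden  # embeddings shared with the LM head, plus final norm
total_untied = total_tied + embed             # separate LM head

print(f"per block:      {block / 1e6:.1f}M")        # ~77.9M
print(f"embeddings:     {embed / 1e6:.1f}M")        # ~664.1M
print(f"total (tied):   {total_tied / 1e9:.2f}B")   # ~2.69B
print(f"total (untied): {total_untied / 1e9:.2f}B") # ~3.35B
```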