LH-Tech-AI committed d6dfa66 (verified; parent 99ad209): Create README.md
---
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
tags:
- small
- cpu
- open
- open-source
- crest
- lh-tech
- ai
- llm
- nanoGPT
---

# Welcome to Crest 20M Base
This is a tiny 20.75M-parameter model demonstrating how well small models can perform when trained on a modest amount of data.

## Training data
We trained this model for 5,000 steps on the first 100 million tokens of the 10BT sample of FineWeb-Edu, reaching a final validation loss of 4.1566.

## Training specs
- Architecture: nanoGPT
- Parameters: 20.75M
- Train Steps: 5000 (5k)
- Learning Rate: 5e-4
- Layers: 10
- Heads: 8
- Embedding Dimension: 256
- Block Size (Context Length): 512
- Batch Size: 32
- Gradient Accumulation Steps: 4
- Compile Model: False
- Device / Precision: float16 on a Kaggle T4 16 GB GPU (CUDA)
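
The hyperparameters above pin down the training token budget and, approximately, the parameter count. A minimal back-of-the-envelope sketch, assuming nanoGPT's usual padded GPT-2 vocabulary of 50,304 and weight tying between the token embedding and the output head (both are assumptions about this repo's config, not stated in it):

```python
# Back-of-the-envelope arithmetic from the specs above (nanoGPT-style model).
# Assumptions: padded GPT-2 vocab of 50304, tied input/output embeddings.
n_layer, n_embd = 10, 256
block_size, batch_size, grad_accum = 512, 32, 4
vocab_size = 50304

# Tokens consumed per optimizer step and over the whole 5000-step run:
tokens_per_iter = batch_size * grad_accum * block_size   # 65,536
total_tokens = 5000 * tokens_per_iter                    # 327,680,000
epochs = total_tokens / 100_000_000                      # ~3.28 passes over the data

# Rough parameter count: token embedding + 12*d^2 weights per transformer block
# (QKV 3d^2, attention projection d^2, MLP 8d^2); biases/LayerNorms omitted.
approx_params = vocab_size * n_embd + n_layer * 12 * n_embd**2

print(tokens_per_iter, total_tokens, round(epochs, 2), approx_params)
```

This lands at roughly 20.7M parameters, close to the reported 20.75M, and at ~3.28 passes over the 100M training tokens.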

## Training code
As with all of our models, you can find the full training code in this repo in the files `train.py`, `model.py`, `configurator.py`, and `prepare.py`.

## Model weights
The final model weights are available as `model.pt` in this repo. Use `use.py` to try out the model :D

## Example outputs
**Prompt:** Artificial Intelligence is
<br>**Output:**
```plaintext
Artificial Intelligence is the ability to make intelligent decisions.
It is a process of understanding how to do things. It is designed to understand the principles of intelligence and the skills to be successful.
There are various types of intelligence and the ability to communicate information about the process. They can use more than one or more of these functions.
What is the reason for being successful is that they are successful in one or more of those of the tasks. They must be able to use the knowledge to understand and understand information about the process.
What is the best way to understand how to communicate information.
The simplest way to understand the concept of intelligence is to understand how to communicate information about the process of communication.
In addition to being successful in the process of
```

**Prompt:** The main concept of physics is
<br>**Output:**
```plaintext
The main concept of physics is the energy of the universe, the natural world, and the space in which the universe are determined.
When we are in a universe, there are no other elements to go with, or a sphere or sphere or sphere. The universe of the universe is determined by the universe, which they are based on the laws of nature and the universe.
Since we are in the universe, the universe is not just the universe, but the universe is not just the universe. The universe is determined by the universe in the universe by the universe. In the universe, the universe is determined by the universe.
For the universe, the universe is determined by the universe, because the universe is determined by the universe. The universe is determined by the universe to
```

**Prompt:** Albert Einstein was
<br>**Output:**
```plaintext
Albert Einstein was the first to study the evolution of the universe. The universe of stars in the universe is the same as the universe of stars, which is the same as the universe of stars, which is the one and the other. Astronomers are the smallest universe of stars, which are very different from other stars.
According to Einstein, this means that the universe of stars is the same with the same star, which are the same as the universe of stars. These galaxies are called stars. But if we see the universe of stars, we see the stars of stars, which are the same that are the same. As we see the universe of stars in the universe of stars in the universe of stars in the universe of stars in the universe of stars.
```


## Quick Start
*Please install tiktoken first (`pip install tiktoken`)!*

If you want to train the model yourself, boot up a fresh T4 (or any other GPU with at least 16 GB of VRAM; if you have less, decrease the batch size and increase the gradient accumulation steps) and start by downloading the needed files from this repository:
```bash
mkdir crest_base_20m
cd crest_base_20m
wget https://huggingface.co/LH-Tech-AI/Crest-20M-Base/resolve/main/prepare.py
wget https://huggingface.co/LH-Tech-AI/Crest-20M-Base/resolve/main/model.py
wget https://huggingface.co/LH-Tech-AI/Crest-20M-Base/resolve/main/train.py
wget https://huggingface.co/LH-Tech-AI/Crest-20M-Base/resolve/main/configurator.py
```
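
On the VRAM advice above: halving the batch size while doubling the gradient accumulation steps keeps the effective tokens per optimizer step unchanged, so the training dynamics stay comparable. A quick check (the reduced settings here are a hypothetical example, not a tested configuration):

```python
# The trade-off behind the VRAM advice: batch_size and
# gradient_accumulation_steps only enter training through their product.
block_size = 512

def tokens_per_step(batch_size, grad_accum_steps):
    return batch_size * grad_accum_steps * block_size

original = tokens_per_step(32, 4)   # the settings used for this model
reduced = tokens_per_step(16, 8)    # hypothetical lower-VRAM settings
assert original == reduced == 65536
```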
The next step is to prepare the data, so run:
```bash
python3 prepare.py
```
Once all the data has been prepared, you can start the training:
```bash
python3 train.py
```
Then wait until iteration 5000 is reached (the log will show something like `iter 5000: loss 4.2044, time 50601.67ms, mfu 2.23%`).

## Use the final model
To use your trained model - or ours, available in this repo as `model.pt` - you can run:
```python
import torch
import tiktoken
import os
from model import GPTConfig, GPT

out_dir = 'out'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# point ckpt_path at the downloaded model.pt instead if you skipped training
ckpt_path = os.path.join(out_dir, 'ckpt.pt')
checkpoint = torch.load(ckpt_path, map_location=device)
gptconf = GPTConfig(**checkpoint['model_args'])
model = GPT(gptconf)
state_dict = checkpoint['model']

# strip the '_orig_mod.' prefix that torch.compile adds to parameter names
unwanted_prefix = '_orig_mod.'
for k, v in list(state_dict.items()):
    if k.startswith(unwanted_prefix):
        state_dict[k[len(unwanted_prefix):]] = state_dict.pop(k)
model.load_state_dict(state_dict)
model.to(device)
model.eval()

enc = tiktoken.get_encoding("gpt2")
EOS_TOKEN_ID = 50256

def ask_gpt(prompt, max_new_tokens=150, temperature=0.7, top_k=25):
    start_ids = enc.encode(prompt)
    x = torch.tensor(start_ids, dtype=torch.long, device=device)[None, ...]

    with torch.no_grad():
        y = model.generate(x, max_new_tokens, temperature=temperature, top_k=top_k)

    full_ids = y[0].tolist()
    new_ids = full_ids[len(start_ids):]

    response = enc.decode(new_ids)
    # cut the completion off at the first end-of-text token, if one was generated
    response = response.split('<|endoftext|>')[0]
    return response

print("--- Crest Completion Chat started ---")
while True:
    user_input = input("\nYour Prompt: ")
    if user_input.lower() in ['exit', 'quit']:
        break

    completion = ask_gpt(user_input)

    print(f"\nCrest Completion: {user_input}{completion}")
    print("-" * 30)
```

This will produce something like the following (for the prompt "The climate change is"):
```plaintext
Crest Completion: The climate change is about as much as the global warming is changing. The climate is the result of the climate change.
In the world that is the case with extreme weather conditions and climate change, it makes the world more productive. And it makes the world more productive, like the planet’s climate change.
It’s also why we are interested in climate change, we are interested in climate change, like climate change and climate change. We are interested in climate change and climate change.
The climate change in the world is already underway. It is the next step. The world is going to grow in a world where we live in a world where we live in a global society.
While we are interested in climate change, we are interested
```

## Limitations
- This model can't chat - it's a base model!
- This model is really dumb: it has only seen 100 million tokens, repeated for ~3.28 epochs.
- This model is not GPT-5.4 or Opus-4.7! Definitely not. :D

## Final thoughts
We think this model is a good demonstration of how very small models can perform on general world-knowledge data when trained for multiple epochs.
We are fairly satisfied with these results and wonder what would happen if we fine-tuned this model with SFT to make it chat.