Multi-GPU support is disabled. Using a single GPU.
+-----------------------+----------------------------------------------------+
| Parameter             | Value                                              |
+-----------------------+----------------------------------------------------+
| train data pattern    | dev/data/fineweb10B/fineweb_train_*.bin           |
| val data pattern      | dev/data/fineweb10B/fineweb_val_*.bin             |
| output log dir        | NULL                                               |
| checkpoint_every      | 0                                                  |
| resume                | 0                                                  |
| micro batch size B    | 1                                                  |
| sequence length T     | 1024                                               |
| total batch size      | 1024                                               |
| LR scheduler          | cosine                                             |
| learning rate (LR)    | 0.000000e+00                                       |
| warmup iterations     | 0                                                  |
| final LR fraction     | 1.000000e+00                                       |
| weight decay          | 0.000000e+00                                       |
| skip update lossz     | 0.000000                                           |
| skip update gradz     | 0.000000                                           |
| max_steps             | 1                                                  |
| val_loss_every        | 20                                                 |
| val_max_steps         | 20                                                 |
| sample_every          | 1                                                  |
| genT                  | 256                                                |
| overfit_single_batch  | 0                                                  |
| use_master_weights    | enabled                                            |
| gelu_fusion           | 0                                                  |
| recompute             | 1                                                  |
+-----------------------+----------------------------------------------------+
| device                | NVIDIA A100-SXM4-40GB                              |
| peak TFlops           | 312.0                                              |
| precision             | BF16                                               |
+-----------------------+----------------------------------------------------+
| weight init method    | log124M/model_00015000.bin                         |
| max_sequence_length T | 1024                                               |
| vocab_size V          | 50257                                              |
| padded_vocab_size Vp  | 50304                                              |
| num_layers L          | 12                                                 |
| num_heads NH          | 12                                                 |
| channels C            | 768                                                |
| num_parameters        | 124475904                                          |
+-----------------------+----------------------------------------------------+
| train_num_batches     | 1                                                  |
| val_num_batches       | 20                                                 |
+-----------------------+----------------------------------------------------+
| run hellaswag         | no                                                 |
+-----------------------+----------------------------------------------------+
| Zero Optimization is disabled                                              |
| num_processes         | 1                                                  |
| zero_stage            | 0                                                  |
+-----------------------+----------------------------------------------------+
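The model rows above fully determine the reported parameter count: the vocabulary is padded up to a multiple of 128, and the remaining GPT-2 tensor shapes follow from C, L, and T. A quick sanity check in Python (a sketch with the shapes hard-coded from the table, not read from the checkpoint):

```python
# Reproduce padded_vocab_size and num_parameters from the config table above.
V, T, n_layer, C = 50257, 1024, 12, 768

# llm.c pads the vocabulary up to a multiple of 128 for friendlier matmul shapes
Vp = (V + 127) // 128 * 128
assert Vp == 50304

per_block = (
    2 * C                  # ln1 (weight + bias)
    + C * 3 * C + 3 * C    # attention qkv projection
    + C * C + C            # attention output projection
    + 2 * C                # ln2 (weight + bias)
    + C * 4 * C + 4 * C    # MLP up-projection
    + 4 * C * C + C        # MLP down-projection
)
num_parameters = (
    Vp * C                 # token embedding (weights tied with the lm head)
    + T * C                # position embedding
    + n_layer * per_block  # transformer blocks
    + 2 * C                # final layernorm
)
assert num_parameters == 124475904  # matches the table
```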
HellaSwag eval not found at dev/data/hellaswag/hellaswag_val.bin, skipping its evaluation
You can run `python dev/data/hellaswag.py` to export and use it with `-h 1`.
num_parameters: 124475904 => bytes: 248951808
allocated 237 MiB for model parameters
batch_size B=1 * seq_len T=1024 * num_processes=1 and total_batch_size=1024
=> setting grad_accum_steps=1
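The grad_accum_steps value is just the requested total batch size (in tokens) divided by the tokens one micro-step consumes across all processes; the line above spells out the factors. A minimal sketch of that bookkeeping:

```python
def grad_accum_steps(total_batch_size, B, T, num_processes):
    # tokens processed by a single forward/backward pass across all processes
    tokens_per_fwdbwd = B * T * num_processes
    # the total batch size must divide evenly into micro-steps
    assert total_batch_size % tokens_per_fwdbwd == 0
    return total_batch_size // tokens_per_fwdbwd

print(grad_accum_steps(1024, B=1, T=1024, num_processes=1))  # -> 1, as logged
```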
allocating 237 MiB for parameter gradients
allocating 618 MiB for activations
allocating 474 MiB for AdamW optimizer state m
allocating 474 MiB for AdamW optimizer state v
allocating 474 MiB for master copy of params
device memory usage: 2983 MiB / 40326 MiB
memory per sequence: 618 MiB
-> estimated maximum batch size: 61
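The allocation sizes are easy to reproduce: BF16 weights and gradients cost 2 bytes per parameter, while the fp32 AdamW moments and master weights cost 4 bytes each. One plausible reading of the max-batch line is the remaining device memory divided by the 618 MiB per-sequence activation footprint, plus the sequence already resident (a sketch of that arithmetic, not the exact llm.c code):

```python
MiB = 1024 * 1024
num_parameters = 124475904

print(2 * num_parameters // MiB)  # 237 -> bf16 params (and again for grads)
print(4 * num_parameters // MiB)  # 474 -> each of AdamW m, v, and master copy

# Max-batch estimate: free device memory over the per-sequence activation
# cost, plus the B=1 sequence already accounted for in current usage.
total_mib, used_mib, per_seq_mib, B = 40326, 2983, 618, 1
print(B + (total_mib - used_mib) // per_seq_mib)  # -> 61, as logged
```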
val loss 3.155447
step 1/1 | loss 3.382539 (+nanz)| norm 4.1926 (+nanz)| lr 0.00e+00 | 19.19 ms | 13.7% bf16 MFU | 53369 tok/s
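The MFU percentage relates achieved FLOP throughput to the 312 TFLOPS BF16 peak from the table. A back-of-envelope version using the standard 6N-plus-attention FLOPs-per-token count lands in the same ballpark as the logged 13.7% (llm.c counts the actual matmul FLOPs, so its number differs slightly):

```python
# Rough MFU check for the step line above (approximate by design).
N, n_layer, C, T = 124475904, 12, 768, 1024
tok_per_s = 53369               # ~ 1024 tokens / 19.19 ms, as logged
peak_flops = 312e12             # A100 BF16 tensor-core peak

flops_per_token = 6 * N + 12 * n_layer * C * T  # 6N rule plus attention term
mfu = flops_per_token * tok_per_s / peak_flops
print(f"{100 * mfu:.1f}% MFU")  # ~14.7%, vs the 13.7% llm.c reports
```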
val loss 3.155447
prompt_length: 22
gen_tokens: 818 262 21593 286 262 6186 6290 29623 11 4837 5071 257 4271 6439 14893 326 550 3748 17112 290 3725 546 50256 50256 ... (rest of the buffer is 50256, the <|endoftext|> padding token)
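gen_tokens is the raw sampling buffer: GPT-2 BPE ids, pre-filled with 50256 (<|endoftext|>) and with the prompt occupying the first prompt_length slots before sampling starts. The ids round-trip through the GPT-2 tokenizer; a quick check, assuming the tiktoken package is available:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("gpt2")
prompt_ids = [818, 262, 21593, 286, 262, 6186, 6290, 29623, 11, 4837, 5071,
              257, 4271, 6439, 14893, 326, 550, 3748, 17112, 290, 3725, 546]
print(len(prompt_ids))         # 22, matching prompt_length above
print(enc.decode(prompt_ids))  # should print the prompt shown below
print(enc.eot_token)           # 50256, the filler value in gen_tokens
```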
Prompt: In the depths of the Amazon rainforest, researchers discovered a previously unknown tribe that had unique customs and knowledge about
generating:
---
the black wild turkey and their population. The survey found that black wild turkey was the king of Africa, a representative of a nearly half million people that are ethnically different from the doldrums of African Indians. African women played a leadership role in the tribe, stressing their tribal tradition, including qualities such as keeping privacy safe while they were hunting, preserving the cultural property and historical importance of the indigenous population.

Prepared For World History by New Asgard

Mangrove’s Life In ASTRO Forests Versus Alaska’s Maken Nation

Crap: The Swedish Eel Cormorant’s White Canary Island (Maken)

Swedish Púlys De Noma

Genetic Analysis of the SHOWARfnˈh politica compare 10,000 BCE

Chief Mostar Fishson (Cdew. The Shore of mountains)

Girl Fishing Competences of the Olive Leaf Carpels

Meyer Miðai Mardsson

Mias°loumdrawner et ERVizbrads iðai

brekaævappawtree
---
total average iteration time: -nan ms

Multi-GPU support is disabled. Using a single GPU.
val loss 3.155447
step 1/1 | loss 3.382539 (+nanz)| norm 4.1926 (+nanz)| lr 0.00e+00 | 19.26 ms | 13.7% bf16 MFU | 53174 tok/s
val loss 3.155447
prompt_length: 18
gen_tokens: 464 40455 4687 36789 468 7907 4263 286 12899 27982 11 13477 326 262 6881 318 5901 351 50256 50256 ... (rest of the buffer is 50256, the <|endoftext|> padding token)
Prompt: The Hubble Space Telescope has captured images of distant galaxies, revealing that the universe is filled with
generating:
---
different gas and cosmic rays.

In one image, the projection is in the form of a ray of the Nazca-type atmosphere – a plane of molten water. As the two refer to what looks like the skies the favorable conditions may lead to the formation of interstellar clouds or warm, warm cosmic clouds obscuring Earth itself if the country chose the satellites.

“We’re nowhere near what we thought we might be in some time … we’re just being imaginative,” Maurzan Zhenzhenou said.

Mekhoven, N.D., a former Mikhail Gorbachev and one of wealthier nations of the Soviet Union, where its older members gained renown for their friendly and generous views and wide-ranging views, is a country comprising the hominid moon Europa.

Menesut stated that the dense atmosphere that is the atmosphere of the Europa atmosphere must have been layer upon layer in the two spiral passages on its surface.

The giant Terra Dome includes a number of such space stations with the previous two discoveries about Europa, and is considered a potential recording star that young Europa can stay too near the surface of Europa.

The ATL
---
total average iteration time: -nan ms

Multi-GPU support is disabled. Using a single GPU.
val loss 3.155447
step 1/1 | loss 3.382539 (+nanz)| norm 4.1926 (+nanz)| lr 0.00e+00 | 18.31 ms | 14.4% bf16 MFU | 55940 tok/s
val loss 3.155447
prompt_length: 23
gen_tokens: 464 5524 5215 462 4935 11 5668 287 5816 11 27661 477 262 10812 287 1692 7446 11 3756 284 19304 82 287 50256 50256 ... (rest of the buffer is 50256, the <|endoftext|> padding token)
Prompt: The Human Genome Project, completed in 2003, mapped all the genes in human DNA, leading to breakthroughs in
generating:
---
those areas. Additional techniques are being developed to identify additional genes which could be added to existing DNA databases.

Hermet, Adney’s long serving Charden, a working Professor at MIT’s Department of Bioengineering in the Department of Energy Chemistry and Biology-Baker Chemical Corporation teaching programs for physicists classed in MIT, paired a spring Tester with an Elucidation Map with a NASA NexRF cell phone (an interchangeable water-side cell phone is Futur’s Sponsored Deep Desire formation), to name a few. The Tester is a giant, high altitude solar-powered device that can acutely heat and process water.

Imagenix, developed from Instagens, goes back to pre-1965 to its own time, 1850-1810s. Ten years on, implants are continuing to be manufactured and eventually the Charden line moved into production.

Japan Design Bounty Project

Cunningham family used for her work on “Courtney and the Mustang,” Guadak and Shimaze starting a quest to perfect the earth’s crust and its environments to determine their families’
---
total average iteration time: -nan ms

Multi-GPU support is disabled. Using a single GPU.
val loss 3.155447
step 1/1 | loss 3.382539 (+nanz)| norm 4.1926 (+nanz)| lr 0.00e+00 | 19.11 ms | 13.8% bf16 MFU | 53598 tok/s
val loss 3.155447
prompt_length: 17
gen_tokens: 464 14250 286 262 13570 1803 416 38579 20336 287 262 1315 400 4289 14434 3592 416 50256 50256 ... (rest of the buffer is 50256, the <|endoftext|> padding token)
Prompt: The invention of the printing press by Johannes Gutenberg in the 15th century transformed society by
generating:
---
creating a social enterprise with the potential to modernize our natural resources.

The times during this era are a mixture of the growing influence of the revolutionary course drawn in the extensive changes the world has undergone since its emphasis on the capitalist market made possible the growth of the flourishing style and society, and provides for intrinsic growth for retaining ethical and social engines, characterized in its turn by the highest social goals at all levels. The Glass Mint is an instance of worthwhile things, in is not as orderly and diverse as key giants like Apple Inc., Amazon, Google, Google Groves, General Electric, Uber, BSkyB, Amazon, Google.<|endoftext|>anna ryderre, update 4

1 December 2020

Oops! Please note that The “Succeses de Cookies”: Coin Account Platform has not been accredited to confirm your level of privacy.

2 December 2020

Hi guys. This is Aaron Chatieri’s testimonial for The Price at the End of COVID-19. Thank you.

Well, I’m sorry, as I don’t think you really expect to come back out and say “that was just creepy”
---
total average iteration time: -nan ms

Multi-GPU support is disabled. Using a single GPU.
val loss 3.155447
step 1/1 | loss 3.382539 (+nanz)| norm 4.1926 (+nanz)| lr 0.00e+00 | 19.16 ms | 13.8% bf16 MFU | 53446 tok/s
val loss 3.155447
prompt_length: 27
gen_tokens: 464 5103 286 262 9485 43591 286 402 23638 11 543 2540 1088 1679 1795 11843 11 3793 257 10715 2233 284 262 6156 7605 973 284 50256 50256 ... (rest of the buffer is 50256, the <|endoftext|> padding token)
Prompt: The construction of the Pyramids of Giza, which began around 2580 BC, remains a mystery due to the ancient techniques used to
generating:
---
deal with the volcanic eruptions. The pyramids have been interpreted as a science of meteorology because of the eruptions, which may cover the remains of Persepolis and A.D. 64 departs from Sadrati in ancient Turkey.

Janu Kahli saw the time that brains become flushed from windows marked astro-physes important. She was struck by the idea that Zen Buddhas are pulled back through the rain and have breathed in the air. It was HER last hope that visitors should stop by Tatufurei and have a horse out. "I do not like to spoil my time," she said.<|endoftext|>Despite the fact that the LMP can only be downloaded via KClips, there are still some other dependencies that you’re prone to having. Because we didn’t realise that Leamington fully supports them, an ongoing study was thrown into our aware and enlightening fundamentals course.

Malware 101 is part of a growing security architecture delivered from the now nightly SysGate edition. The two Murphy 101 modules, "night tracking" and "social cybercrime," are currently active
---
total average iteration time: -nan ms

Multi-GPU support is disabled. Using a single GPU.
val loss 3.155447 |
|
step 1/1 | loss 3.382539 (+nanz)| norm 4.1926 (+nanz)| lr 0.00e+00 | 18.00 ms | 14.7% bf16 MFU | 56888 tok/s |
|
val loss 3.155447 |
|
prompt_length: 13 |
|
gen_tokens: 464 640 340 1718 284 1382 262 412 733 417 8765 373 220, then 50256 (<|endoftext|>) repeated to fill the rest of the buffer
|
Prompt: The time it took to build the Eiffel Tower was |
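
The gen_tokens line above is the raw sampling buffer: the BPE ids of the prompt, followed by 50256 (GPT-2's <|endoftext|> token) as filler for positions that have not been generated yet. To round-trip the ids yourself, a minimal sketch (assuming the third-party `tiktoken` package, which is not part of llm.c) could look like:

```python
# Decode the gen_tokens prefix back to text with the GPT-2 BPE vocabulary
# (the 50257-token vocab reported in the config table).
# Assumes `pip install tiktoken`; tiktoken is not part of llm.c itself.
import tiktoken

enc = tiktoken.get_encoding("gpt2")

prompt_ids = [464, 640, 340, 1718, 284, 1382, 262, 412, 733, 417, 8765, 373, 220]
print(enc.decode(prompt_ids))  # expected: the Prompt line above
print(enc.decode([50256]))     # '<|endoftext|>', the filler for the rest of the buffer
```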
|
generating: |
|
--- |
|
____. |
|
Less than a year before I finally sold it and moved to South Australia I also purchased a St Tenant. The building works are complete, undergo a lot of changes, and there are lots of photos already. However, there are more still left.<|endoftext|>Hello boys and gents of helping Buddy here from Corona Vintage! |
|
Have a wicked year and be ready on day 3 of DH with some new toys for Leo a… some new PJs, a new favorite boardgame, and our newest conquerress |
|
Lots of traffic to your spie page and an update to the wonderful all-photographer! I'm rewriting all of the blog images, post editing, and adjusting pictures in the coming days to give you the most bang for your buck!
|
wow! one a dramatic opening photo for a YWC stumper E! thanks for looking! |
|
Thanks for being here from Corona Vintage! Please join my Blast school, Nova Model Project, our China & Korean movies editor Luke Davis, and Queens of the Galaxy citizen who are all keen on challenging the US government on the globalisation of Chinese factories in the mid East and claim that small villages are the money cakes of China. It'
|
--- |
|
total average iteration time: -nan ms |
|
| Prompt | Model completion |
|---|---|
| The ancient manuscript was hidden deep within the library's restricted section. When Sarah finally found it, she couldn't believe her eyes. The text revealed that | the word "Palestinian" was in her family name. |
| While excavating an ancient tomb in Egypt, archaeologist Dr. Sarah Mitchell uncovered a hidden chamber that contained a scroll revealing | the story of Moses' family and who lost their lives |
| The largest desert in the world is the... | front of the Milky Way, and it's getting worse. |
| My grandmother used to tell me stories about the old days when we would sit by the... | fire in the barn with the local kids and stories about spies on the bush. |
| The GitHub project llm.c is a... | project of The Leapfrog Group, which was founded in October 2003 to develop and develop hyper-centralized, distributed software. |
|
[second sampling run: startup banner, parameter table, HellaSwag notice, and memory-allocation log identical to the first run above]
|
val loss 3.155447 |
|
step 1/1 | loss 3.382539 (+nanz)| norm 4.1926 (+nanz)| lr 0.00e+00 | 19.25 ms | 13.7% bf16 MFU | 53207 tok/s |
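
The MFU figure on the step line can be roughly cross-checked from values already printed in this log. A back-of-the-envelope sketch, assuming a flops-per-token estimate of 6*N + 6*L*C*T (an assumption on my part; llm.c's internal accounting may differ):

```python
# Rough MFU sanity check from numbers printed in this log.
# Assumption: flops per token ~= 6*N + 6*L*C*T (dense parameter term plus
# an attention term); not necessarily the exact expression llm.c evaluates.
N = 124_475_904          # num_parameters
L, C, T = 12, 768, 1024  # num_layers, channels, sequence length
peak_flops = 312e12      # A100 BF16 peak TFlops from the config table

flops_per_token = 6 * N + 6 * L * C * T
for tok_per_s in (53_207, 59_390):  # throughputs on the two step lines in this log
    print(f"{flops_per_token * tok_per_s / peak_flops:.1%}")
# prints 13.7% and 15.3%, matching the logged MFU values
```

Under this estimate both step lines reproduce: ~19 ms per step at ~53k tok/s gives 13.7% MFU, and the faster ~17 ms step later in the log gives 15.3%.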
|
val loss 3.155447 |
|
prompt_length: 10 |
|
gen_tokens: 464 1772 286 262 1492 12844 373 3194 416 220, then 50256 (<|endoftext|>) repeated to fill the rest of the buffer
|
Prompt: The author of the book 1984 was written by |
|
generating: |
|
--- |
|
ianidorepalambly/18551940633 of the author. Little was written by ianidorepalambly/18551350 of the author, and one her dad . We recommend to avoid these errors with the book if you dare to imagine them, as those words ain't poetry, and try to charm ianidorepalambly/18551350 years past those words. , and Judy M. is the author of Hunger and Cracker Mickey , now with Cotillion Weivey, and Jane Austen: A Book in the Attraction Store, 2015. ianidinearm
|
I am a CPA with more than 51 years of experience in the insurance industry. I have enjoyed working for a long time knowing the ins-and-outs of the law myself, because it is health related. |
|
CPA is a fraction of other insurance companies' Insurance companies, due to their lower monthly premiums, less effective administrative oversight and more doable de-risk management.
|
CPA comes from a number of steps, which makes it difficult for those with experience and technical knowledge to implement insurance programs. The abbreviation "CPA IS"
|
--- |
|
total average iteration time: -nan ms |
|
[third sampling run: startup banner, parameter table, HellaSwag notice, and memory-allocation log identical to the first run above]
|
val loss 3.155447 |
|
step 1/1 | loss 3.382539 (+nanz)| norm 4.1926 (+nanz)| lr 0.00e+00 | 17.24 ms | 15.3% bf16 MFU | 59390 tok/s |
|
val loss 3.155447 |
|
prompt_length: 8 |
|
gen_tokens: 11964 13 448 13 35235 7203 15496 198, then 50256 (<|endoftext|>) repeated to fill the rest of the buffer
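
Note that prompt_length is 8 even though the visible text encodes to seven ids: the trailing newline is its own BPE token (198), which is also why a blank line appears right after the Prompt line below. A quick encode check, under the same `tiktoken` assumption as earlier:

```python
# Encode the Java-snippet prompt and compare with the gen_tokens prefix above.
# Assumes `pip install tiktoken`; tiktoken is not part of llm.c itself.
import tiktoken

enc = tiktoken.get_encoding("gpt2")
ids = enc.encode('System.out.println("Hello\n')
print(ids, len(ids))
# expected: [11964, 13, 448, 13, 35235, 7203, 15496, 198] 8  (prompt_length: 8)
```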
|
Prompt: System.out.println("Hello |
|
|
|
generating: |
|
--- |
|
x> 4.")".ackivent of working with his kitchen but he has to remove the appliances to the oven to be able to separate the formula. Hitchhiker lost as a result of his elimination with his minerals. Rest were obtained from the addition of the poisons from onto the mains oven as homage to chapter 231 Diego. |
|
Title: Restoration: Chapter 900 |
|
Author: Toyō Kenji |
|
Licensed By: Joji Husha |
|
The cost of an enspnter's roof repair price will be delineated by the chargpter."The so-called shinerd laird like a bat and that is priceless."The shinerd-helmed 'batteryMax - an oddity - did have a jinx on the shinerd boiler-of-modern history," went a notarialist code "Berenished by a thrawny useofwaterweave.Will thou repent?" "Dunnernail that here on the dial I literatinativea-agent, of course soo anis μimbuzz. |
|
The powers of enflamed primekmaken the universe they lived in and turned love into a bed hobby - not one of anticipated mere chubb and |
|
--- |
|
total average iteration time: -nan ms |
|
Aidan Do's note: the inference samples above were generated by the model_00015000.bin checkpoint from the 1x_A100_40GB training run.
|
|