tangled-llama-k-128k-v0.1
Train Tokenizer
python -B train_tokenizer.py
Tokenizer training log:
Resolving data files: 100%|████████████████████████████████████████████████████████████████| 132/132 [00:00<00:00, 266.56it/s]
Loading dataset shards: 100%|█████████████████████████████████████████████████████████████████| 18/18 [00:05<00:00, 3.24it/s]
Resolving data files: 100%|█████████████████████████████████████████████████████████████| 133/133 [00:00<00:00, 306844.02it/s]
[00:21:52] Pre-processing sequences ████████████████████████████████████████████████████████████████ 0 / 0
[00:00:48] Tokenize words ████████████████████████████████████████████████████████████████ 25635525 / 25635525
[00:01:17] Count pairs ████████████████████████████████████████████████████████████████ 25635525 / 25635525
[00:06:07] Compute merges ████████████████████████████████████████████████████████████████ 32066 / 32066
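The "Tokenize words", "Count pairs", and "Compute merges" phases in the log match the output of the Hugging Face tokenizers BPE trainer. Below is a minimal sketch of what train_tokenizer.py might contain, assuming a byte-level BPE setup with the datasets and tokenizers libraries; the dataset id, vocabulary size, and special tokens are placeholders, not the exact settings used for this model.

```python
# Minimal sketch of a train_tokenizer.py (assumptions marked below).
from datasets import load_dataset
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

# Placeholder dataset id; the actual training corpus is not reproduced here.
dataset = load_dataset("wikimedia/wikipedia", "20231101.en", split="train")

def batch_iterator(batch_size=1000):
    # Yield batches of raw text for streaming training.
    for i in range(0, len(dataset), batch_size):
        yield dataset[i : i + batch_size]["text"]

# Byte-level BPE; training runs the "Tokenize words / Count pairs / Compute merges"
# phases shown in the log above.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32768,  # placeholder; roughly consistent with the 32,066 merges in the log
    special_tokens=["<unk>", "<s>", "</s>"],  # placeholder special tokens
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)

tokenizer.train_from_iterator(batch_iterator(), trainer=trainer, length=len(dataset))
tokenizer.save("tokenizer.json")
```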
Pretrain
python -B prepare_pretrain_dataset.py
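prepare_pretrain_dataset.py turns raw text into token ids that the pretraining step can consume. The sketch below illustrates only the generic tokenize-and-pack idea, using the tokenizer trained above; the dataset id, end-of-sequence id, and flat-binary output format are assumptions, and litgpt's real data pipeline writes its own chunked format instead.

```python
# Generic tokenize-and-pack sketch (not litgpt's actual chunk format).
import numpy as np
from datasets import load_dataset
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")

# Placeholder dataset id; only a small slice is taken here for illustration.
dataset = load_dataset("wikimedia/wikipedia", "20231101.en", split="train").select(range(1000))

ids = []
for record in dataset:
    ids.extend(tokenizer.encode(record["text"]).ids)
    ids.append(2)  # placeholder end-of-sequence token id

# One flat array of token ids; uint16 is sufficient for a ~32k vocabulary.
np.array(ids, dtype=np.uint16).tofile("pretrain_tokens.bin")
```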
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-model.yaml
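litgpt pretrain reads its model and training hyperparameters from pretrain-model.yaml. The fragment below is purely illustrative: the field names follow litgpt's pretrain configuration schema, but every value is a placeholder rather than the configuration actually used for this model.

```yaml
# Illustrative fragment only; all values are placeholders.
model_config:
  block_size: 131072      # placeholder; the "128k" in the model name suggests a long context
  vocab_size: 32768
  n_layer: 12
  n_head: 16
  n_embd: 1024
tokenizer_dir: .          # directory containing the trained tokenizer
out_dir: out/pretrain
data:
  class_path: litgpt.data.TextFiles   # placeholder data module
  init_args:
    train_data_path: data/pretrain
train:
  micro_batch_size: 1
  max_tokens: 3000000000  # placeholder token budget
```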
Chat with Pretrained Model
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES="0" litgpt chat out/pretrain/final/