🥦 Pretraining Thread

by burtenshaw - opened 20 days ago

burtenshaw

nanochat students org 20 days ago

edited 20 days ago

2. Pre-training

To pretrain, we need to download a larger slice of the data:

python -m nanochat.dataset -n 240 &

We can then launch training with trackio logging enabled:

export TRACKIO_SPACE_ID="nanochat-students/trackio"
export TRACKIO_PROJECT="nanochat-pretraining"
export TRACKIO_DATASET_ID="nanochat-students/trackio-dataset"
export HF_TOKEN="<your-token>"
torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=20

This starts logging to the CLI like so:

step 14218/21400 (66.44%) | loss: 2.885312 | lrm: 1.00 | dt: 478.80ms | tok/sec: 1,095,011 | mfu: 48.33 | total time: 114.67m
step 14219/21400 (66.44%) | loss: 2.874319 | lrm: 1.00 | dt: 479.48ms | tok/sec: 1,093,459 | mfu: 48.26 | total time: 114.68m
step 14220/21400 (66.45%) | loss: 2.880379 | lrm: 1.00 | dt: 478.54ms | tok/sec: 1,095,590 | mfu: 48.35 | total time: 114.69m

It also reports metrics to the shared trackio space: https://nanochat-students-trackio.hf.space
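If you want to post-process the CLI output yourself, here is a minimal sketch that parses one progress line and estimates the remaining time from the current step time. The regular expression is inferred only from the sample lines above, so it may need adjusting if the format differs elsewhere in the run. `parse_line` and `eta_minutes` are hypothetical helper names, not part of nanochat.

```python
import re

# Pattern inferred from the sample log lines above (an assumption, not a nanochat API).
LINE_RE = re.compile(
    r"step (?P<step>\d+)/(?P<total>\d+) \((?P<pct>[\d.]+)%\)"
    r" \| loss: (?P<loss>[\d.]+)"
    r" \| lrm: (?P<lrm>[\d.]+)"
    r" \| dt: (?P<dt_ms>[\d.]+)ms"
    r" \| tok/sec: (?P<tok_per_sec>[\d,]+)"
    r" \| mfu: (?P<mfu>[\d.]+)"
    r" \| total time: (?P<total_min>[\d.]+)m"
)


def parse_line(line):
    """Return the fields of one progress line as a dict, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    g = m.groupdict()
    return {
        "step": int(g["step"]),
        "total": int(g["total"]),
        "loss": float(g["loss"]),
        "dt_ms": float(g["dt_ms"]),
        "tok_per_sec": int(g["tok_per_sec"].replace(",", "")),  # drop thousands separators
        "mfu": float(g["mfu"]),
        "total_min": float(g["total_min"]),
    }


def eta_minutes(rec):
    """Rough remaining time: steps left times the current step duration."""
    remaining_steps = rec["total"] - rec["step"]
    return remaining_steps * rec["dt_ms"] / 1000 / 60


sample = (
    "step 14218/21400 (66.44%) | loss: 2.885312 | lrm: 1.00 | dt: 478.80ms"
    " | tok/sec: 1,095,011 | mfu: 48.33 | total time: 114.67m"
)
rec = parse_line(sample)
print(rec["step"], rec["loss"], round(eta_minutes(rec), 1))  # 14218 2.885312 57.3
```

The estimate assumes the step time stays constant for the rest of the run, so treat it as a ballpark figure.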

... pretraining is still running, so I'll report back.

burtenshaw

nanochat students org 19 days ago

The weights from pre-training are here: https://huggingface.co/nanochat-students/base-d20

burtenshaw

nanochat students org 19 days ago

The hardest part of this was porting the custom inference code to transformers. But it was fun!

from transformers import AutoConfig, AutoModel, AutoTokenizer
import torch

# Load the model and tokenizer; trust_remote_code runs the custom code shipped with the repo
model_dir = "nanochat-students/base-d20"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True)
model = model.to(device)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# Encode the prompt with the BOS token prepended and add a batch dimension
prompt = "The capital of Belgium is "
input_ids = tokenizer.encode(prompt, prepend=tokenizer.get_bos_token_id())
ids = torch.tensor([input_ids], dtype=torch.long, device=device)

# Greedy decoding: append the highest-scoring token at each step.
# The full sequence is passed to the model again on every iteration.
max_new_tokens = 50
with torch.inference_mode():
    for _ in range(max_new_tokens):
        outputs = model(input_ids=ids)
        # The forward pass may return a dict or an object with a .logits attribute
        logits = outputs["logits"] if isinstance(outputs, dict) else outputs.logits
        next_token = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
        ids = torch.cat([ids, next_token], dim=1)

# Decode prompt plus generated tokens back to text
decoded = tokenizer.decode(ids[0].tolist())
print(decoded)
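The loop above always generates exactly max_new_tokens tokens. Below is a minimal sketch of the same greedy loop with an optional stop token, written over plain Python lists so it runs without a GPU or the model. `greedy_generate` and `toy_next_logits` are hypothetical names for illustration, and which token id should act as a stop token for base-d20 is not stated in this thread.

```python
def greedy_generate(next_logits, ids, max_new_tokens, stop_id=None):
    """Greedy decoding over a list of token ids.

    next_logits is any callable that takes the current ids and returns
    a list of scores, one per vocabulary entry, for the next token.
    """
    ids = list(ids)  # copy so the caller's list is not modified
    for _ in range(max_new_tokens):
        logits = next_logits(ids)
        # argmax over the vocabulary
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if stop_id is not None and next_id == stop_id:
            break  # stop early without appending the stop token
        ids.append(next_id)
    return ids


# Toy stand-in for the model (an assumption for illustration only):
# vocabulary of 5 tokens, always favours (last_id + 1) % 5.
def toy_next_logits(ids):
    target = (ids[-1] + 1) % 5
    return [1.0 if i == target else 0.0 for i in range(5)]


print(greedy_generate(toy_next_logits, [0], max_new_tokens=10, stop_id=4))  # [0, 1, 2, 3]
```

With the real model, next_logits would wrap the forward pass and return the scores at the last position.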

richardprobe

nanochat students org 14 days ago

I wonder if you have tried running the pretraining on 8 A100s instead? Lambda ran out of 8 H100s for now.

stefan-it

10 days ago

Hey @burtenshaw,

thanks for providing the trackio logs! I've just noticed that the logs end at 860 steps, whereas the original training should run for 21,400 steps, so I am a bit confused 🤔

stefan-it

9 days ago

@richardprobe I have trained a German nanochat base model on 8x A100s - no problem, it is definitely working on them!
