Model Name: SmolLM2-135M

Model Description

SmolLM2-135M is a 135M parameter model based on the Llama 3 architecture.
It is trained on the Cosmopedia-2 dataset.
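As a rough sanity check on the 135M figure, the parameter count of a Llama-style decoder can be tallied from its hyperparameters. The sketch below assumes the values from the published HuggingFaceTB/SmolLM2-135M configuration (hidden size 576, 30 layers, 9 query heads, 3 key/value heads, MLP width 1536, vocabulary size 49152, tied input and output embeddings). The `config_smollm2_135M.yaml` used for this model may differ, and `llama_param_count` is an illustrative helper, not part of this repository.

```python
def llama_param_count(vocab_size, hidden, layers, n_heads, n_kv_heads,
                      mlp_hidden, tied_embeddings=True):
    """Count parameters of a Llama-style decoder (no bias terms)."""
    head_dim = hidden // n_heads
    kv_dim = n_kv_heads * head_dim              # grouped-query attention
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o and k, v projections
    mlp = 3 * hidden * mlp_hidden               # gate, up and down projections
    norms = 2 * hidden                          # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = vocab_size * hidden
    lm_head = 0 if tied_embeddings else vocab_size * hidden
    final_norm = hidden
    return embeddings + layers * per_layer + final_norm + lm_head


# Assumed hyperparameters (from the public SmolLM2-135M config, not from this repo)
total = llama_param_count(vocab_size=49152, hidden=576, layers=30,
                          n_heads=9, n_kv_heads=3, mlp_hidden=1536)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the tally comes to 134,515,008 parameters, which is where the rounded "135M" comes from.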
The purpose of this project is to train the SmolLM2 Transformer model from scratch. I trained it for 10 hours on a g5.2xlarge instance (a single A10 GPU with 24 GB of memory).
Training ran for 70,000 steps with a batch size of 16 and a context length of 1024 tokens.
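The training figures above imply a token budget and a throughput, worked out in the short calculation below. It is only an estimate: it assumes no gradient accumulation, that every sequence fills the full 1024-token context, and that the 10 hours were spent entirely on training steps.

```python
steps = 70_000
batch_size = 16
context_length = 1024
train_hours = 10

tokens_per_step = batch_size * context_length   # tokens processed per optimizer step
total_tokens = steps * tokens_per_step          # upper bound on tokens seen in training
seconds = train_hours * 3600

print(f"tokens per step: {tokens_per_step:,}")
print(f"total tokens:    {total_tokens:,}")
print(f"throughput:      {total_tokens / seconds:,.0f} tokens/s")
print(f"step rate:       {steps / seconds:.2f} steps/s")
```

That is 16,384 tokens per step and about 1.15 billion tokens in total, or roughly 8.5 tokens per parameter for a 135M model.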

Base Tokenizer

HuggingFaceTB/cosmo2-tokenizer

Usage Example

import torch
import yaml
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# LlamaModel is the custom model class from the SmolLm3 module, which must be importable
from SmolLm3 import LlamaModel

# Download the trained weights from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="crpatel/SmolLM2-135M-cosmopedia2-70kSteps",
    filename="model.pt"
)

# Build the model from the training config (the YAML file must be available locally)
with open("config_smollm2_135M.yaml", "r") as f:
    config = yaml.safe_load(f)
model = LlamaModel(config["model"])
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()  # switch to inference mode

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")

# Encode the prompt and generate a continuation on the CPU
encoded_text = tokenizer.encode("Once upon a time", return_tensors="pt").to("cpu")
print(encoded_text)
generated_text = model.generate(
    idx=encoded_text,
    max_new_tokens=100,
    context_length=50,
    temperature=0.9,
    top_k=2,
    eos_token=tokenizer.eos_token_id,
    device="cpu",
)
print(tokenizer.decode(generated_text.squeeze(0)))
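The `temperature` and `top_k` arguments in the usage example control how each next token is drawn. The internals of `LlamaModel.generate` are not shown in this card, so the following is only a generic sketch of temperature scaling followed by top-k sampling, written in plain Python. The function `sample_top_k` and the toy logits are hypothetical and are not taken from the model's code.

```python
import math
import random


def sample_top_k(logits, temperature=0.9, top_k=2, rng=random):
    """Sample one token id from the top_k highest logits after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the indices of the top_k largest logits.
    k = min(top_k, len(logits))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [logits[i] / temperature for i in top_ids]
    # Numerically stable softmax weights over the surviving candidates.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top_ids, weights=weights, k=1)[0]


toy_logits = [2.0, 0.5, 1.5, -1.0]  # made-up scores for a 4-token vocabulary
rng = random.Random(0)              # fixed seed so the run is repeatable
samples = [sample_top_k(toy_logits, rng=rng) for _ in range(1000)]
print({token_id: samples.count(token_id) for token_id in sorted(set(samples))})
```

With `top_k=2`, only the two highest-scoring tokens are ever candidates, so generation stays close to greedy decoding while still allowing some variation.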
