Model Name: SmolLM2-135M
Model Description
- SmolLM2-135M is a 135M-parameter model based on the Llama 3 architecture.
- It was trained on the Cosmopedia-2 dataset.
- The purpose of this model is to train the SmolLM2 Transformer from scratch. Training ran for 10 hours on a g5.2xlarge instance (single NVIDIA A10G GPU with 24 GB of memory).
- Trained for 70,000 steps (batch size 16, context length 1024); a sketch of the model configuration is shown below.
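The usage example further down reads the architecture hyper-parameters from config_smollm2_135M.yaml. A minimal sketch of the dictionary the parsed config might contain is shown here; the key names are assumptions and the numbers mirror the published HuggingFaceTB/SmolLM2-135M architecture, so consult the actual YAML file in the repository for the real values.

# Hypothetical sketch of the dictionary that yaml.load(...) could return for
# config_smollm2_135M.yaml. Key names are assumptions; the numbers follow the
# published HuggingFaceTB/SmolLM2-135M architecture and the training setup above.
config = {
    "model": {
        "vocab_size": 49152,        # cosmo2-tokenizer vocabulary size
        "hidden_size": 576,
        "intermediate_size": 1536,
        "num_hidden_layers": 30,
        "num_attention_heads": 9,
        "num_key_value_heads": 3,   # grouped-query attention
        "context_length": 1024,     # matches the training context length
    }
}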
Base Tokenizer: HuggingFaceTB/cosmo2-tokenizer
Usage Example
import torch
import yaml
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

from SmolLm3 import LlamaModel  # LlamaModel definition from the accompanying SmolLm3 module
# Download the model file
model_path = hf_hub_download(
repo_id="crpatel/SmolLM2-135M-cosmopedia2-70kSteps",
filename="model.pt"
)
# Load the model config and build the model
with open("config_smollm2_135M.yaml", "r") as f:
    config = yaml.safe_load(f)
model = LlamaModel(config['model'])
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Load the tokenizer the model was trained with
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")

# Encode a prompt and generate a continuation
encoded_text = tokenizer.encode('Once upon a time ', return_tensors="pt").to('cpu')
print(encoded_text)

generated_ids = model.generate(
    idx=encoded_text,
    max_new_tokens=100,
    context_length=50,
    temperature=0.9,
    top_k=2,
    eos_token=tokenizer.eos_token_id,
    device='cpu',
)
decoded_text = tokenizer.decode(generated_ids.squeeze(0))
print(decoded_text)
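The example above assumes config_smollm2_135M.yaml is already present locally. If the config file is also stored in the model repository (an assumption; check the repo's file list), it can be fetched the same way as the weights:

# Hypothetical: fetch the YAML config from the same repo, assuming it is uploaded there
config_path = hf_hub_download(
    repo_id="crpatel/SmolLM2-135M-cosmopedia2-70kSteps",
    filename="config_smollm2_135M.yaml"
)
with open(config_path, "r") as f:
    config = yaml.safe_load(f)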
Base model: HuggingFaceTB/SmolLM2-135M