How to use alphabot2/alphabot_smolvla_1st_edition with LeRobot:
# See https://github.com/huggingface/lerobot?tab=readme-ov-file#installation for more details
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
# Launch finetuning on your dataset
python lerobot/scripts/train.py \
  --policy.path=alphabot2/alphabot_smolvla_1st_edition \
  --dataset.repo_id=lerobot/svla_so101_pickplace \
  --batch_size=64 \
  --steps=20000 \
  --output_dir=outputs/train/my_smolvla \
  --job_name=my_smolvla_training \
  --policy.device=cuda \
  --wandb.enable=true
# Run the policy using the record function.
# Replace the port, robot id, and cameras with your own, use the same task description
# you used when recording your dataset, and note that --dataset.repo_id will be the
# dataset name on the HF Hub. (Comments cannot follow a trailing backslash, so they live here.)
python -m lerobot.record \
  --robot.type=so101_follower \
  --robot.port=/dev/ttyACM0 \
  --robot.id=my_blue_follower_arm \
  --robot.cameras="{ front: {type: opencv, index_or_path: 8, width: 640, height: 480, fps: 30}}" \
  --dataset.single_task="Grasp a lego block and put it in the bin." \
  --dataset.repo_id=HF_USER/dataset_name \
  --dataset.episode_time_s=50 \
  --dataset.num_episodes=10 \
  --policy.path=alphabot2/alphabot_smolvla_1st_edition
SmolVLA Model - First Edition (alphabot_smolvla_1st_edition)
A fine-tuned Vision-Language-Action (VLA) model trained with LeRobot for robot control tasks.
Model Description
This model is SmolVLA fine-tuned on the alphabot2 robot dataset, enabling it to understand and execute robot control tasks from visual and language input.
Model Architecture
- VLM Backbone: HuggingFaceTB/SmolVLM2-500M-Video-Instruct
- Type: Vision-Language-Action (VLA) Policy
- Total Parameters: 450M
- Learnable Parameters: 100M
- Framework: LeRobot
Training Details
Dataset
- Dataset ID: alphabot2/ai2robot_full_900_episodes_10fps_JC_intern_matched_from_full
- Total Frames: 163,545
- Total Episodes: 831
- Frame Rate: 10 fps
- Chunk Size: 50 actions (5 s of motion at 10 fps)
- Action Horizon: 50 steps
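Chunk size and action horizon interact at inference time roughly as follows. The sketch below assumes the policy predicts a chunk of 50 actions, queues them, and runs the network again only once the queue is empty, which is similar in spirit to how SmolVLA's `select_action` serves actions from a queue. `predict_chunk` is a hypothetical stand-in for the model, and the episode-length figures are simply derived from the dataset statistics above.

```python
from collections import deque
from typing import Callable, List

CHUNK_SIZE = 50   # actions predicted per forward pass (from this card)
CONTROL_HZ = 10   # recording/control rate in fps (from this card)


class ChunkedActionQueue:
    """Serve one action per control step from a queue of predicted action chunks."""

    def __init__(self, predict_chunk: Callable[[int], List[float]], n_action_steps: int = CHUNK_SIZE):
        # predict_chunk is a hypothetical stand-in for the policy's forward pass:
        # it receives the current step index and returns a chunk of actions.
        self.predict_chunk = predict_chunk
        self.n_action_steps = n_action_steps
        self.queue: deque = deque()
        self.num_forward_passes = 0

    def select_action(self, step: int) -> float:
        if not self.queue:
            # Queue exhausted: run the (expensive) model once and enqueue the next actions.
            chunk = self.predict_chunk(step)
            self.queue.extend(chunk[: self.n_action_steps])
            self.num_forward_passes += 1
        return self.queue.popleft()


# Average episode length implied by the dataset statistics.
avg_frames = 163_545 / 831               # about 196.8 frames per episode
avg_seconds = avg_frames / CONTROL_HZ    # about 19.7 s per episode
chunk_seconds = CHUNK_SIZE / CONTROL_HZ  # 5.0 s of motion per chunk

runner = ChunkedActionQueue(lambda step: [float(step + i) for i in range(CHUNK_SIZE)])
actions = [runner.select_action(t) for t in range(200)]
print(runner.num_forward_passes, chunk_seconds, round(avg_seconds, 1))
```

Over a 200-step rollout the model is queried only 4 times, which is the main latency benefit of predicting chunks instead of single actions.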
Training Configuration
- Training Steps: 100,000
- Batch Size: 8 (chosen to fit the 8 GB RTX 3070 Laptop GPU)
- Optimizer: AdamW
- Learning Rate: 1e-4
- Scheduler: Cosine decay with warmup
- Mixed Precision: AMP (Automatic Mixed Precision) enabled
- Training Time: ~48 hours on an NVIDIA RTX 3070 Laptop GPU
- Grad Clip Norm: 1.0
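The learning-rate schedule and gradient clipping above can be sketched in plain Python. This is a generic linear-warmup plus cosine-decay schedule, not LeRobot's exact scheduler implementation; the card does not state the warmup length or the final learning rate, so `WARMUP_STEPS = 1_000` and `FINAL_LR = 0.0` are assumptions. AdamW and AMP are left to the framework and are not reproduced here.

```python
import math
from typing import List

PEAK_LR = 1e-4          # from this card
TOTAL_STEPS = 100_000   # from this card
WARMUP_STEPS = 1_000    # assumption: not stated on this card
FINAL_LR = 0.0          # assumption: not stated on this card
GRAD_CLIP_NORM = 1.0    # from this card


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to FINAL_LR at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads: List[float], max_norm: float = GRAD_CLIP_NORM) -> List[float]:
    """Rescale gradients so that their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


for step in (0, 999, 1_000, 50_500, 100_000):
    print(step, f"{lr_at(step):.2e}")
print(clip_by_global_norm([3.0, 4.0]))
```

Clipping rescales the whole gradient vector rather than each component separately, so the update direction is preserved and only its length is capped.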
Preprocessing
- Video Backend: torchcodec with pyav fallback
- Input Normalization: Running statistics computed from training dataset
- Output Denormalization: Predicted actions are mapped back to robot units using the training-set action statistics
- Resolution: 224x224 (standard for SmolVLM)
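Input normalization and output denormalization are inverses of each other. A minimal sketch, assuming per-dimension mean/std normalization; the actual normalization mode and statistics for this model are stored in the `policy_preprocessor*` and `policy_postprocessor*` files, and the statistics below are made-up placeholder values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureStats:
    mean: List[float]
    std: List[float]
    eps: float = 1e-8  # guards against division by zero for constant dimensions


def normalize(x: List[float], stats: FeatureStats) -> List[float]:
    """Map raw values into the normalized space the policy was trained on."""
    return [(v - m) / (s + stats.eps) for v, m, s in zip(x, stats.mean, stats.std)]


def unnormalize(y: List[float], stats: FeatureStats) -> List[float]:
    """Map normalized policy outputs (e.g. actions) back to robot units."""
    return [v * (s + stats.eps) + m for v, m, s in zip(y, stats.mean, stats.std)]


# Placeholder statistics for a 3-dimensional action; real values come from the training set.
action_stats = FeatureStats(mean=[0.0, 10.0, -5.0], std=[1.0, 2.0, 0.5])

raw_action = [0.5, 12.0, -4.0]
normed = normalize(raw_action, action_stats)
restored = unnormalize(normed, action_stats)
print(normed, restored)
```

Using statistics other than the ones saved with the model would shift every predicted action, which is why the card stresses loading the saved preprocessor and postprocessor state.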
Usage
Loading the Model
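import torch
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
# Load the model from the Hugging Face Hub. Use the concrete SmolVLAPolicy class:
# PreTrainedPolicy is an abstract base class and cannot be instantiated directly.
policy = SmolVLAPolicy.from_pretrained("alphabot2/alphabot_smolvla_1st_edition")
policy.to("cuda" if torch.cuda.is_available() else "cpu")
# Set to evaluation mode (important!) and clear the action queue before each new episode
policy.eval()
policy.reset()
# Run inference. observation_dict must hold the camera images, the proprioceptive state,
# and the task string, already normalized with the preprocessor saved in this repository;
# the returned action must be unnormalized with the saved postprocessor.
with torch.no_grad():
    action = policy.select_action(observation_dict)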
Inference Requirements
- PyTorch with CUDA support (GPU recommended)
- LeRobot library
- torchcodec for video processing
- ~2GB GPU VRAM minimum for inference
- Input observations must include camera images, the proprioceptive state, and a natural-language task description
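The observation passed to the policy is a dictionary. A minimal pre-flight check, assuming LeRobot's usual naming convention (`observation.images.<camera>` for camera frames, `observation.state` for proprioception, and `task` for the language instruction); the exact keys and shapes this model expects are defined in its `config.json`, so the names below, including the camera name `front`, are illustrative.

```python
from typing import Any, Dict, List

IMAGE_PREFIX = "observation.images."   # assumed LeRobot-style key prefix
STATE_KEY = "observation.state"        # assumed proprioceptive-state key
TASK_KEY = "task"                      # assumed language-instruction key


def missing_inputs(observation: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the required inputs are present."""
    problems = []
    if not any(key.startswith(IMAGE_PREFIX) for key in observation):
        problems.append(f"no camera image (expected a key starting with '{IMAGE_PREFIX}')")
    if STATE_KEY not in observation:
        problems.append(f"missing proprioceptive state '{STATE_KEY}'")
    task = observation.get(TASK_KEY)
    if not isinstance(task, str) or not task.strip():
        problems.append(f"missing or empty language instruction '{TASK_KEY}'")
    return problems


# Hypothetical observation with placeholder values instead of real tensors.
obs = {
    "observation.images.front": "<image tensor>",
    "observation.state": [0.0] * 6,
    "task": "Grasp a lego block and put it in the bin.",
}
print(missing_inputs(obs))           # [] because all required inputs are present
print(missing_inputs({"task": ""}))  # three problems reported
```

Running a check like this before the first `select_action` call turns a shape or key error deep inside the model into a readable message.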
Model Card Details
Intended Use
- Primary: Robot control and imitation learning via vision and language
- Supported Tasks: Robot manipulation tasks in the alphabot environment
- Training Data: Demonstrations collected from human operators
Limitations
- Model is specialized for the alphabot2 robot platform
- Performance on out-of-distribution scenarios may be limited
- Requires the same observation preprocessing as in training (e.g., normalization with the saved dataset statistics)
Ethical Considerations
- This is an imitation learning model trained on human demonstrations
- Use only in controlled research/educational environments
- Not intended for autonomous systems without human oversight
- Ensure compliance with local regulations for robot operation
Training Procedure
The model was trained using LeRobot's standard training pipeline:
- Data Loading: Video frames processed with torchcodec backend
- Error Handling: Corrupted samples automatically skipped during training
- Batch Processing: 8 samples per batch with gradient accumulation
- Loss Function: Flow-matching loss on action chunks (SmolVLA's standard training objective)
- Evaluation: Periodic evaluation every 1,000 steps
- Checkpointing: Saved every 5,000 steps
All training artifacts include proper preprocessor/postprocessor configurations for handling input normalization and output denormalization.
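SmolVLA's action expert is trained with a flow-matching objective on action chunks. A minimal, framework-free sketch of that loss for a single chunk, assuming the convention x_t = t * noise + (1 - t) * actions with velocity target u = noise - actions (our reading of LeRobot's SmolVLA code; `modeling_smolvla.py` is the authoritative version). `predict_velocity` is a hypothetical stand-in for the network, and the timestep is drawn uniformly here for simplicity rather than from the distribution the real implementation uses.

```python
import random
from typing import Callable, List

Chunk = List[List[float]]  # shape [chunk_size][action_dim]


def flow_matching_loss(
    actions: Chunk,
    predict_velocity: Callable[[Chunk, float], Chunk],
    rng: random.Random,
) -> float:
    """Mean squared error between predicted and target velocities for one action chunk."""
    t = rng.random()  # simplification: uniform timestep in [0, 1)
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in actions]
    # Interpolate between the clean actions (t = 0) and pure noise (t = 1).
    x_t = [[t * n + (1.0 - t) * a for n, a in zip(nr, ar)] for nr, ar in zip(noise, actions)]
    # The velocity target points from the clean actions toward the noise.
    target = [[n - a for n, a in zip(nr, ar)] for nr, ar in zip(noise, actions)]
    pred = predict_velocity(x_t, t)
    sq_errors = [(p - u) ** 2 for pr, ur in zip(pred, target) for p, u in zip(pr, ur)]
    return sum(sq_errors) / len(sq_errors)


# Placeholder 50-step chunk of 2-D actions; real chunks come from the demonstrations.
demo_chunk = [[0.1 * i, -0.2 * i] for i in range(50)]


def zero_predictor(x_t: Chunk, t: float) -> Chunk:
    """An untrained 'network' that always predicts zero velocity."""
    return [[0.0 for _ in row] for row in x_t]


def oracle_predictor(x_t: Chunk, t: float) -> Chunk:
    """Cheats by using the clean actions: (x_t - actions) / t equals noise - actions."""
    return [[(x - a) / t for x, a in zip(xr, ar)] for xr, ar in zip(x_t, demo_chunk)]


rng = random.Random(0)
untrained_loss = flow_matching_loss(demo_chunk, zero_predictor, rng)
oracle_loss = flow_matching_loss(demo_chunk, oracle_predictor, rng)
print(untrained_loss, oracle_loss)  # the oracle loss is ~0, up to floating-point error
```

At inference the learned velocity field is integrated from noise back to an action chunk, which is why this is an imitation (behavior-cloning) objective rather than a reinforcement-learning one.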
Hardware Requirements
For Inference
- Minimum GPU: 2GB VRAM (e.g., RTX 2060)
- Recommended GPU: 4GB+ VRAM (e.g., RTX 3060 or better)
- CPU: Modern Intel/AMD processor (4+ cores recommended)
- RAM: 8GB minimum
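A back-of-the-envelope check of the weight memory behind the VRAM figures above: the weights alone take roughly parameter count times bytes per parameter. This ignores activations, caches, and CUDA/framework overhead, so real usage is higher; the 450M figure is the total parameter count from the architecture section.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}
TOTAL_PARAMS = 450_000_000  # total parameter count stated on this card


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(TOTAL_PARAMS, dtype):.2f} GiB")
```

This gives about 1.68 GiB for float32 weights and about 0.84 GiB for bfloat16, which helps explain why roughly 2 GB is listed as the minimum once runtime overhead is added.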
For Fine-tuning
- Recommended GPU: 12GB+ VRAM (e.g., RTX 3090, RTX 4090, A100)
- GPU Memory: VRAM usage grows with batch size, since activation memory scales roughly linearly with it
- Storage: ~10GB for dataset + checkpoint files
Files in Repository
- model.safetensors: Trained model weights (1.2GB)
- config.json: Model architecture configuration
- policy_preprocessor.json: Input preprocessing configuration
- policy_postprocessor.json: Output postprocessing configuration
- policy_preprocessor_step_5_normalizer_processor.safetensors: Normalizer state
- policy_postprocessor_step_0_unnormalizer_processor.safetensors: Denormalizer state
- train_config.json: Training configuration metadata
- README.md: This file
License
Apache License 2.0 - See LICENSE file for details
Citation
If you use this model, please cite:
@software{lerobot,
title={LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
author={Cadene, Remi and others},
url={https://github.com/huggingface/lerobot},
year={2024}
}
@article{shukor2025smolvla,
title={SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics},
author={Shukor, Mustafa and others},
journal={arXiv preprint arXiv:2506.01844},
year={2025}
}
Support & Questions
For issues or questions about:
- This Model: Check the LeRobot documentation
- LeRobot Framework: Visit the LeRobot GitHub repository (https://github.com/huggingface/lerobot)
- Hugging Face Hub: See the Hub documentation
Disclaimer
This model is provided as-is. Users are responsible for ensuring its safe and appropriate use. The authors are not liable for any misuse or damage caused by this model.
Model Card Last Updated: 2026-06-15
Training Completed: 2026-06-13
Total Training Steps: 100,000