---
library_name: nanotron
---

# ⚙️ Nano-Mistral

Modeling code for Mistral to use with [Nanotron](https://github.com/huggingface/nanotron/)

Also contains converted pretrained weights for Mistral-7B-v0.1: https://huggingface.co/mistralai/Mistral-7B-v0.1

## 🚀 Quickstart

```bash
# Generate a config file
python config_tiny_mistral.py

# Run training
export CUDA_DEVICE_MAX_CONNECTIONS=1 # important for some distributed operations
torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml
```

## 🚀 Run generation with pretrained Mistral-7B-v0.1

```bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
torchrun --nproc_per_node=1 run_generate.py --ckpt-path ./pretrained/Mistral-7B-v0.1
```

## 🚀 Use your custom model

- Update the `MistralConfig` class in `config_tiny_mistral.py` to match your model's configuration (see the sketch at the end of this section)
- Update the `MistralForTraining` class in `modeling_mistral.py` to match your model's architecture
- Pass both classes to the `DistributedTrainer` in `run_train.py`:

```python
trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
```

- Run training as usual
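
For reference, here is a minimal sketch of what the first step might look like, assuming `MistralConfig` is a plain dataclass of model hyperparameters. The field names below follow common Mistral-style conventions and are assumptions; match them against the actual dataclass in `config_tiny_mistral.py`.

```python
# Hypothetical sketch: editing MistralConfig in config_tiny_mistral.py.
# Field names are assumptions based on typical Mistral hyperparameters,
# not the exact attributes defined in this repo.
from dataclasses import dataclass


@dataclass
class MistralConfig:
    vocab_size: int = 32000
    hidden_size: int = 512             # shrunk for a tiny debug model
    intermediate_size: int = 2048
    num_hidden_layers: int = 4
    num_attention_heads: int = 8
    num_key_value_heads: int = 4       # grouped-query attention
    max_position_embeddings: int = 2048
```

Shrinking the width and depth like this lets you validate the training loop on a single GPU before scaling the config back up to full size.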