metadata
library_name: nanotron

βš™οΈ Nano-Mistral

Modeling code for Mistral to use with Nanotron

🚀 Quickstart

# Generate a config file
python config_tiny_mistral.py

# Run training
export CUDA_DEVICE_MAX_CONNECTIONS=1 # important for some distributed operations
torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml
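
If you prefer to launch training from Python (for example, to vary the GPU count per machine), the command above can be assembled programmatically. This is a minimal sketch: the helper names `build_train_command` and `build_train_env` are hypothetical and not part of this repository; only the command, flag, file names, and environment variable come from the quickstart.

```python
import os
import shlex


def build_train_command(config_file="config_tiny_mistral.yaml", nproc_per_node=8):
    """Return the quickstart torchrun invocation as an argv list.

    Hypothetical helper: it only mirrors the command shown in the quickstart.
    """
    return [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "run_train.py",
        "--config-file",
        config_file,
    ]


def build_train_env(base_env=None):
    """Return a copy of the environment with CUDA_DEVICE_MAX_CONNECTIONS=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"
    return env


# Show the shell command that would be executed.
print(shlex.join(build_train_command()))
```

To actually start a run, pass both results to `subprocess.run(build_train_command(), env=build_train_env(), check=True)` after generating the config file.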

🚀 Use your custom model

  • Update the MistralConfig class in config_tiny_mistral.py to match your model's configuration
  • Update the MistralForTraining class in modeling_mistral.py to match your model's architecture
  • Pass both classes to the DistributedTrainer class in run_train.py:
trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
  • Run training as usual
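
The wiring in the steps above can be pictured with a self-contained sketch. Everything here is a stand-in written for illustration: `ExampleModelConfig`, `ExampleModelForTraining`, and `ExampleTrainer` are hypothetical classes, the field names are common transformer hyperparameters rather than the actual fields of `MistralConfig`, and the stand-in trainer builds the config from defaults, which is not a claim about how Nanotron's `DistributedTrainer` behaves. Only the keyword arguments `model_class` and `model_config_class` are taken from the call shown above.

```python
from dataclasses import dataclass


@dataclass
class ExampleModelConfig:
    """Hypothetical stand-in for a config class such as MistralConfig."""

    hidden_size: int = 16
    num_attention_heads: int = 4
    num_key_value_heads: int = 4
    num_hidden_layers: int = 2
    vocab_size: int = 256

    def __post_init__(self):
        # Catch inconsistent shapes early, before any model is built.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        if self.num_attention_heads % self.num_key_value_heads != 0:
            raise ValueError("num_attention_heads must be divisible by num_key_value_heads")

    @property
    def head_dim(self):
        return self.hidden_size // self.num_attention_heads


class ExampleModelForTraining:
    """Hypothetical stand-in for a model class such as MistralForTraining."""

    def __init__(self, config):
        self.config = config


class ExampleTrainer:
    """Illustrative only: shows how the two classes are handed over together."""

    def __init__(self, config_file, model_class, model_config_class):
        self.config_file = config_file
        # Assumption: defaults are used here; a real trainer would read the file.
        self.model_config = model_config_class()
        self.model = model_class(self.model_config)


trainer = ExampleTrainer(
    "config_tiny_mistral.yaml",
    model_class=ExampleModelForTraining,
    model_config_class=ExampleModelConfig,
)
print(trainer.model.config)
```

Keeping the config and model as a matched pair means a change to the architecture and a change to its hyperparameters travel together, which is why both classes are passed to the trainer at once.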