---
base_model: OuteAI/Lite-Mistral-150M-v2-Instruct
tags:
- SPPO
- alignment-handbook
- generated_from_trainer
datasets:
- UCLA-AGI/data-mistral-7b-instruct-sppo-iter1
- christopherthompson81/sppo-synthetic-dataset-lite-mistral-150m-v2
model-index:
- name: Lite-Mistral-150M-v2-Instruct-SPPO-Iter3
  results: []
license: apache-2.0
---

# Lite-Mistral-150M-v2-Instruct-SPPO-Iter3

This model is iteration 3 of an SPPO process applied to [OuteAI/Lite-Mistral-150M-v2-Instruct](https://huggingface.co/OuteAI/Lite-Mistral-150M-v2-Instruct), using synthetic datasets generated from the prompts of [UCLA-AGI/data-mistral-7b-instruct-sppo-iter1](https://huggingface.co/UCLA-AGI/data-mistral-7b-instruct-sppo-iter1).

One notable lesson was that the prompts used for an SPPO process should be only slightly beyond the capabilities of the base model. If they are too difficult, none of the synthetic outputs will be meaningfully "better" for the autoranker to prefer over the others. Additionally, the autoranker only needs to be good enough to evaluate the target prompts, but its judgments should be spot-checked on those prompts to confirm it is satisfactory.

## Model description

I made this model to practice with the SPPO method. I selected this base model because it is small, and therefore fast, while still producing coherent output.

## Intended uses & limitations

150M-parameter models are generally meant for a data scientist or technology enthusiast to familiarize themselves with a topic. This model is likely best used as a way to understand the results SPPO can deliver, by contrasting outputs between the base model and prior iterations of the SPPO process. Viable output is just a bonus.

## Training procedure

I'm still working on putting my code together in a releasable way.
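At a high level, one SPPO iteration generates several candidate responses per prompt, ranks them with an autoranker, and keeps (chosen, rejected) pairs for the preference-optimization step. The sketch below illustrates only that data-flow with toy stand-in functions; it is not the actual pipeline, and `generate`/`rank` would really call the current policy model and the autoranker:

```python
# Illustrative sketch of the data-generation half of one SPPO iteration.
# "generate" and "rank" are toy stand-ins, not real model calls.

def generate(prompt, n=4):
    # Stand-in for sampling n candidate completions from the current policy.
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def rank(prompt, candidates):
    # Stand-in for the autoranker: returns one score per candidate.
    return [len(c) % 7 for c in candidates]  # dummy scores

def build_preference_pairs(prompts):
    # For each prompt, keep the best- and worst-ranked candidates as a
    # (chosen, rejected) pair for the preference-optimization step.
    pairs = []
    for prompt in prompts:
        candidates = generate(prompt)
        scores = rank(prompt, candidates)
        ranked = sorted(zip(scores, candidates), reverse=True)
        pairs.append({
            "prompt": prompt,
            "chosen": ranked[0][1],
            "rejected": ranked[-1][1],
        })
    return pairs

pairs = build_preference_pairs(["Explain SPPO briefly."])
print(pairs[0])
```

The resulting pairs are what the next training iteration optimizes against, which is why prompt difficulty matters: if every candidate is equally bad, the chosen/rejected distinction carries no signal.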
The code will eventually be accessible here:

* [GitHub - SPPO Generator](https://github.com/christopherthompson81/sppo_generator)

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0

### Training results

Still to come; evals have not been run yet.

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1