---
base_model:
- HuggingFaceTB/cosmo-1b
- Lambent/cosmo-1b-galore-pythontest
- Lambent/cosmo-1b-qlora-pythontest
- Lambent/cosmo-1b-lisa-pythontest
library_name: transformers
tags:
- mergekit
- merge
---
# pythontestmerge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

Training data validation:

* Model Stock 3/4 Loss: 0.451

My hypothesis that the pretraining was dragging down the stock merge's performance on the training data seems inaccurate.

Cosmopedia data validation:

* Model Stock 3/4 Loss: 1.021

On the other hand, it may indeed have pulled it towards forgetfulness. This is a better loss with respect to catastrophic forgetting than the prior Model Stock merge or any of the individual training methods.

My estimate is that using the base model as an anchor point is a strong remedy for catastrophic forgetting when using multiple different training methods on the same dataset. I'm less sure I can say anything about how it affects adaptation to the new dataset; it's possible that, when using this method, you'd want stronger adaptation to start with than you otherwise would.

## Merge Details

### Merge Method

This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, using [HuggingFaceTB/cosmo-1b](https://huggingface.co/HuggingFaceTB/cosmo-1b) as the base.

### Models Merged

The following models were included in the merge:

* [Lambent/cosmo-1b-galore-pythontest](https://huggingface.co/Lambent/cosmo-1b-galore-pythontest)
* [Lambent/cosmo-1b-qlora-pythontest](https://huggingface.co/Lambent/cosmo-1b-qlora-pythontest)
* [Lambent/cosmo-1b-lisa-pythontest](https://huggingface.co/Lambent/cosmo-1b-lisa-pythontest)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Lambent/cosmo-1b-lisa-pythontest
  - model: Lambent/cosmo-1b-qlora-pythontest
  - model: Lambent/cosmo-1b-galore-pythontest
base_model: HuggingFaceTB/cosmo-1b
merge_method: model_stock
parameters:
  filter_wise: false
dtype: float16
```
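
To reproduce a merge like this from the configuration above, mergekit can be driven from Python as well as from its CLI. The sketch below assumes the Python API shown in the mergekit README (`MergeConfiguration`, `MergeOptions`, `run_merge`); the config filename and `./merged` output path are placeholders.

```python
# Minimal sketch of running the merge from Python, assuming mergekit's
# documented API (MergeConfiguration / MergeOptions / run_merge).
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "pythontestmerge.yml"  # the YAML configuration above, saved to disk
OUTPUT_PATH = "./merged"            # arbitrary output directory

with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path=OUTPUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU for the merge if one is available
        copy_tokenizer=True,             # copy the base model's tokenizer into the output
    ),
)
```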
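
The validation losses quoted above come from held-out samples of the Python training data and of Cosmopedia. A rough sketch of how such a loss can be computed with `transformers` is below; the texts and the `./merged` path are placeholders, not the actual evaluation setup.

```python
# Rough sketch of mean cross-entropy loss over held-out texts,
# not the exact evaluation pipeline used for the numbers above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./merged"  # placeholder: the merged model from the previous step
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.eval()

# Placeholder validation samples; substitute held-out Python or Cosmopedia text.
texts = [
    "def add(a, b):\n    return a + b\n",
    "Photosynthesis converts light energy into chemical energy.",
]

losses = []
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt")
        # Labels equal to input_ids give the standard shifted next-token loss.
        out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())

print(f"mean loss: {sum(losses) / len(losses):.3f}")
```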
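
For intuition about why the base model acts as an anchor against forgetting: as described in the linked paper, Model Stock averages the fine-tuned weights and then interpolates that average back towards the pretrained weights, with a ratio derived from the angle between the fine-tuned weight deltas. The sketch below is a simplified paraphrase of that idea, not mergekit's exact implementation; the ratio formula follows the paper, and using the mean pairwise cosine for more than two models is an assumption on my part.

```python
# Simplified sketch of the Model Stock interpolation for one weight tensor.
# A paraphrase of the paper's idea, not mergekit's exact code.
import numpy as np

def model_stock_merge(w_base: np.ndarray, w_finetuned: list[np.ndarray]) -> np.ndarray:
    k = len(w_finetuned)
    assert k >= 2, "needs at least two fine-tuned models"
    deltas = [w - w_base for w in w_finetuned]

    # Assumption: estimate cos(theta) as the mean pairwise cosine similarity
    # between fine-tuned deltas (the paper defines the angle pairwise).
    cosines = []
    for i in range(k):
        for j in range(i + 1, k):
            a, b = deltas[i].ravel(), deltas[j].ravel()
            cosines.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    cos_theta = float(np.mean(cosines))

    # Interpolation ratio from the paper: t = k*cos(theta) / ((k-1)*cos(theta) + 1).
    t = k * cos_theta / ((k - 1) * cos_theta + 1)

    # Pull the average of the fine-tuned weights back towards the base (anchor).
    w_avg = sum(w_finetuned) / k
    return t * w_avg + (1 - t) * w_base
```

The less the fine-tuned deltas agree with each other (smaller cos θ), the smaller t becomes and the harder the merge is pulled back towards the base weights, which is consistent with the better Cosmopedia loss observed here.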