pythontestmerge

This is a merge of pre-trained language models created using mergekit.

Testing training data validation:

Model Stock 3/4 Loss: 0.451

My hypothesis that the pretraining was dragging down the stock merge's performance on the training data appears to be wrong.

Cosmopedia data validation:

Model Stock 3/4 Loss: 1.021

On the other hand, it may indeed have pulled the merge towards forgetting. Measured by Cosmopedia loss, this merge holds up better against catastrophic forgetting than the prior Model Stock merge or any of the individual training methods.
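For a rough sense of scale, the two reported losses can be converted to perplexity. This is only a sketch: it assumes the reported values are mean per-token cross-entropy in nats, which the card does not state, and the `perplexity` helper is a hypothetical name introduced here.

```python
import math


def perplexity(mean_loss_nats: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(mean_loss_nats)


# Losses reported in this card for the Model Stock 3/4 merge.
reported = {
    "training data": 0.451,
    "Cosmopedia": 1.021,
}

for name, loss in reported.items():
    print(f"{name}: loss {loss:.3f} -> perplexity {perplexity(loss):.2f}")
```

If the losses were computed in some other unit or averaged differently, the conversion would not apply as written.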

My estimate is that using the base model as an anchor point is a strong remedy for catastrophic forgetting when merging models trained with several different methods on the same dataset. I am less sure what it does to adaptation to the new dataset. With this method, you may want stronger adaptation in each fine-tune to start with than you otherwise would.

Merge Details

Merge Method

This model was merged using the Model Stock merge method using HuggingFaceTB/cosmo-1b as a base.
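To illustrate why the base model acts as an anchor, here is a minimal sketch of the interpolation rule described in the Model Stock paper, applied to a single flattened weight tensor. It is an illustration under stated assumptions, not mergekit's implementation: the function name `model_stock_merge` is hypothetical, the pairwise cosines are simply averaged when there are more than two fine-tuned models, and one ratio is computed per tensor (which is how I read `filter_wise: false`).

```python
import math
from itertools import combinations


def _dot(a, b):
    """Dot product of two equal-length sequences of floats."""
    return sum(x * y for x, y in zip(a, b))


def model_stock_merge(base, finetuned):
    """Merge one flattened tensor: interpolate between the fine-tuned
    average and the base weights. Returns (merged_weights, t)."""
    n = len(finetuned)
    if n < 2:
        raise ValueError("need at least two fine-tuned models")

    # Offsets of each fine-tuned tensor from the base (the anchor).
    deltas = [[w - b for w, b in zip(ft, base)] for ft in finetuned]

    # Average cosine similarity over all pairs of offsets (assumption:
    # a plain mean over pairs when n > 2).
    cosines = []
    for u, v in combinations(deltas, 2):
        norm = math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v))
        cosines.append(_dot(u, v) / norm if norm > 0 else 0.0)
    cos_theta = sum(cosines) / len(cosines)

    # Interpolation ratio: t = n*cos / (1 + (n-1)*cos).
    # Not guarded against a zero denominator; this is only a sketch.
    t = n * cos_theta / (1 + (n - 1) * cos_theta)

    # Plain average of the fine-tuned tensors.
    avg = [sum(col) / n for col in zip(*finetuned)]

    # t = 1 keeps the average; t = 0 falls back to the base weights.
    merged = [t * a + (1 - t) * b for a, b in zip(avg, base)]
    return merged, t
```

When the fine-tuned offsets disagree in direction, the cosine is small and the result stays close to the base weights. That matches the intuition in this card that anchoring on the base limits forgetting, possibly at some cost to adaptation.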

Models Merged

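The following models were included in the merge:

Lambent/cosmo-1b-lisa-pythontest
Lambent/cosmo-1b-qlora-pythontest
Lambent/cosmo-1b-galore-pythontest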

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: Lambent/cosmo-1b-lisa-pythontest
  - model: Lambent/cosmo-1b-qlora-pythontest
  - model: Lambent/cosmo-1b-galore-pythontest
base_model: HuggingFaceTB/cosmo-1b
merge_method: model_stock
parameters:
  filter_wise: false
dtype: float16
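A configuration like the one above is normally run through mergekit's `mergekit-yaml` command, which takes a config path and an output directory. The sketch below only assembles that command. The file name `config.yml`, the output directory, and the `build_merge_command` helper are hypothetical choices for illustration, and the exact invocation used for this model is not recorded in the card.

```python
import subprocess


def build_merge_command(config_path, output_dir, use_cuda=False):
    """Build the argument list for a mergekit-yaml run."""
    cmd = ["mergekit-yaml", config_path, output_dir]
    if use_cuda:
        # Optional mergekit flag to run the merge on a GPU.
        cmd.append("--cuda")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: save the YAML above as config.yml first.
    cmd = build_merge_command("config.yml", "./cosmo-1b-stock-pythontest")
    print(" ".join(cmd))
    # Uncomment to run the merge (requires mergekit to be installed and
    # downloads the listed models):
    # subprocess.run(cmd, check=True)
```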

Lambent/cosmo-1b-stock-pythontest-0.1
