pythontestmerge

This is a merge of pre-trained language models created using mergekit.
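If you just want to try the result, the merged checkpoint should load like any other transformers causal LM. A minimal sketch follows; the repo id is assumed from this card's name (point it at your local output directory instead if you reproduce the merge yourself).

```python
# Minimal usage sketch. Assumption: the merged weights are published as
# "Lambent/pythontestmerge"; substitute a local path if needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Lambent/pythontestmerge"  # assumption, not confirmed by this card
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "def fizzbuzz(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```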

Training data validation:

  • Model Stock 3/4 Loss: 0.451

My hypothesis that anchoring on the pretrained base was dragging down the stock merge's performance on the training data in any way seems inaccurate.

Cosmopedia data validation:

  • Model Stock 3/4 Loss: 1.021

On the other hand, the fine-tuning does seem to have pulled the model towards forgetfulness, and anchoring on the base appears to counteract it: this is a better loss against catastrophic forgetting than the prior Model Stock merge or any of the individual training methods achieved.

I'm going to estimate that using the base model as an anchor point is a strong remedy for catastrophic forgetting when merging multiple different training methods applied to the same dataset. I'm less sure I can say anything about how it affects adaptation to the new dataset; it's possible that, when using this method, you'd want louder/stronger adaptation to start with than you otherwise would.
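This card doesn't spell out how the losses above were computed. As a rough sketch of the kind of check involved, assuming plain mean per-sample cross-entropy over held-out text (the chunk length and the held-out lists below are hypothetical, not necessarily the exact setup used):

```python
# Held-out loss sketch. Assumptions: mean per-sample cross-entropy, fp16 weights,
# and hypothetical lists of validation strings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Lambent/pythontestmerge"  # assumption
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
).eval()

def mean_loss(texts, max_length=1024):
    losses = []
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True,
                        max_length=max_length).to(model.device)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
    return sum(losses) / len(losses)

# python_val and cosmopedia_val would be lists of held-out strings:
# print(mean_loss(python_val), mean_loss(cosmopedia_val))
```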

Merge Details

Merge Method

This model was merged using the Model Stock merge method, with HuggingFaceTB/cosmo-1b as the base.
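For context, Model Stock interpolates between the average of the fine-tuned weights and the pretrained anchor, choosing the interpolation factor from the angle between the fine-tuned deltas. The numpy sketch below illustrates that idea following the Model Stock paper's formula; it is not mergekit's actual implementation, which can also operate filter-wise depending on configuration.

```python
# Illustrative Model Stock weighting for a single weight tensor (numpy sketch).
# Follows the interpolation-factor formula from the Model Stock paper; mergekit's
# model_stock method may differ in details (filter-wise handling, numerical guards).
import numpy as np

def model_stock_merge(finetuned, base):
    """finetuned: list of k fine-tuned weight arrays; base: pretrained anchor array."""
    k = len(finetuned)
    deltas = [w - base for w in finetuned]
    # average pairwise cosine similarity between the fine-tuned deltas
    cosines = []
    for i in range(k):
        for j in range(i + 1, k):
            a, b = deltas[i].ravel(), deltas[j].ravel()
            cosines.append(float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    cos_theta = float(np.mean(cosines))
    t = k * cos_theta / (1.0 + (k - 1) * cos_theta)  # interpolation factor
    w_avg = np.mean(finetuned, axis=0)
    return t * w_avg + (1.0 - t) * base  # pull the average back toward the anchor
```

When the fine-tuned deltas are well aligned, t stays near 1 and the merge is mostly the fine-tuned average; as they diverge, the merge leans harder on the base anchor.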

Models Merged

The following models were included in the merge:

  • Lambent/cosmo-1b-lisa-pythontest
  • Lambent/cosmo-1b-qlora-pythontest
  • Lambent/cosmo-1b-galore-pythontest

Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Lambent/cosmo-1b-lisa-pythontest
  - model: Lambent/cosmo-1b-qlora-pythontest
  - model: Lambent/cosmo-1b-galore-pythontest
base_model: HuggingFaceTB/cosmo-1b
merge_method: model_stock
parameters:
  filter_wise: false
dtype: float16
```
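To reproduce the merge from this configuration, the usual route is mergekit's mergekit-yaml command. The snippet below just shells out to it from Python; the config filename, output directory, and the --cuda flag are assumptions about your setup.

```python
# Sketch: run the merge from the YAML configuration above via mergekit's CLI.
# Assumes mergekit is installed (pip install mergekit) and the config was saved
# as pythontestmerge.yaml; the output directory name is arbitrary.
import subprocess

subprocess.run(
    [
        "mergekit-yaml",
        "pythontestmerge.yaml",      # path to the configuration above
        "./pythontestmerge-output",  # where the merged model will be written
        "--cuda",                    # drop on a CPU-only machine
    ],
    check=True,
)
```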