merge
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the passthrough merge method.
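As a rough illustration of what a passthrough merge does, the sketch below stacks layer slices from a single source model without averaging or blending them. The `stack_slices` helper, the toy six-layer model, and the string layer labels are hypothetical stand-ins for illustration, not mergekit's API, and the sketch assumes layer ranges are half-open (start inclusive, end exclusive).

```python
from typing import List, Sequence, Tuple


def stack_slices(layers: Sequence[str], ranges: Sequence[Tuple[int, int]]) -> List[str]:
    """Concatenate half-open [start, end) layer slices in the order given.

    Hypothetical helper: layers are copied through unchanged, which is the
    core idea behind a passthrough (layer-stacking) merge.
    """
    stacked: List[str] = []
    for start, end in ranges:
        if not 0 <= start <= end <= len(layers):
            raise ValueError(f"layer range [{start}, {end}) is out of bounds")
        stacked.extend(layers[start:end])
    return stacked


# Toy example: a six-layer "model" represented by layer names only.
toy_layers = [f"layer_{i}" for i in range(6)]

# Two slices that together cover every layer exactly once.
toy_merged = stack_slices(toy_layers, [(0, 2), (2, 6)])
print(toy_merged)
```

Because the two toy ranges are contiguous, the stacked result is the same layer sequence as the input; overlapping ranges would instead repeat layers.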
Models Merged
The following models were included in the merge:
- ClaudioItaly/Underground
Configuration
The following YAML configuration was used to produce this model:
slices:
  - sources:
      - model: ClaudioItaly/Underground
        layer_range: [0, 2]
        parameters:
          attention:
            - filter: o_proj
              value: 1.15
            - filter: q_proj
              value: 1.15
            - filter: v_proj
              value: 1.1
            - filter: down_proj
              value: 1.0
  - sources:
      - model: ClaudioItaly/Underground
        layer_range: [2, 6]
        parameters:
          attention:
            - filter: o_proj
              value: 1.3
            - filter: q_proj
              value: 1.25
            - filter: v_proj
              value: 1.2
            - filter: down_proj
              value: 1.15
  - sources:
      - model: ClaudioItaly/Underground
        layer_range: [6, 12]
        parameters:
          attention:
            - filter: o_proj
              value: 1.5
            - filter: q_proj
              value: 1.4
            - filter: v_proj
              value: 1.35
            - filter: down_proj
              value: 1.3
  - sources:
      - model: ClaudioItaly/Underground
        layer_range: [12, 20]
        parameters:
          attention:
            - filter: o_proj
              value: 1.75
            - filter: q_proj
              value: 1.6
            - filter: v_proj
              value: 1.5
            - filter: down_proj
              value: 1.4
  - sources:
      - model: ClaudioItaly/Underground
        layer_range: [20, 28]
        parameters:
          attention:
            - filter: o_proj
              value: 2.0
            - filter: q_proj
              value: 1.85
            - filter: v_proj
              value: 1.75
            - filter: down_proj
              value: 1.6
  - sources:
      - model: ClaudioItaly/Underground
        layer_range: [28, 36]
        parameters:
          attention:
            - filter: o_proj
              value: 2.2
            - filter: q_proj
              value: 2.0
            - filter: v_proj
              value: 1.9
            - filter: down_proj
              value: 1.8
  - sources:
      - model: ClaudioItaly/Underground
        layer_range: [36, 42]
        parameters:
          attention:
            - filter: o_proj
              value: 2.5
            - filter: q_proj
              value: 2.3
            - filter: v_proj
              value: 2.2
            - filter: down_proj
              value: 2.0
  - sources:
      - model: ClaudioItaly/Underground
        layer_range: [42, 48]
        parameters:
          attention:
            - filter: o_proj
              value: 3.0
            - filter: q_proj
              value: 2.7
            - filter: v_proj
              value: 2.5
            - filter: down_proj
              value: 2.3
merge_method: passthrough
dtype: bfloat16
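A quick sanity check for a configuration like this one is to confirm that the layer ranges are contiguous and to count how many layers they cover. The sketch below hard-codes the `layer_range` and `o_proj` values from the configuration above; the `count_contiguous_layers` helper is hypothetical, and the code assumes each `layer_range` is half-open (start inclusive, end exclusive).

```python
from typing import List, Sequence, Tuple

# (layer_range, o_proj value) pairs copied from the configuration above.
SLICES: List[Tuple[Tuple[int, int], float]] = [
    ((0, 2), 1.15),
    ((2, 6), 1.3),
    ((6, 12), 1.5),
    ((12, 20), 1.75),
    ((20, 28), 2.0),
    ((28, 36), 2.2),
    ((36, 42), 2.5),
    ((42, 48), 3.0),
]


def count_contiguous_layers(ranges: Sequence[Tuple[int, int]]) -> int:
    """Return the number of layers covered, raising on any gap or overlap.

    Hypothetical helper; assumes half-open [start, end) ranges.
    """
    if not ranges:
        raise ValueError("no layer ranges given")
    total = 0
    expected_start = ranges[0][0]
    for start, end in ranges:
        if start != expected_start:
            raise ValueError(f"range starting at {start} does not follow layer {expected_start}")
        if end <= start:
            raise ValueError(f"range [{start}, {end}) is empty or reversed")
        total += end - start
        expected_start = end
    return total


layer_ranges = [layer_range for layer_range, _ in SLICES]
total_layers = count_contiguous_layers(layer_ranges)

# Check that the o_proj value grows strictly from the first slice to the last.
o_proj_values = [value for _, value in SLICES]
is_increasing = all(a < b for a, b in zip(o_proj_values, o_proj_values[1:]))

print(total_layers, is_increasing)  # prints "48 True"
```

Under these assumptions the eight slices cover layers 0 through 47 exactly once, and the `o_proj` value rises with depth from 1.15 to 3.0.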