π¦β¨ BigLlama-3.1-1T-Instruct
This is an experimental self-merge using meta-llama/Meta-Llama-3.1-405B-Instruct and created with mergekit.
This is the direct successor of Meta-Llama-3-120B-Instruct, a self-merge of Llama 3 70B that produced a decent 120B model for tasks like creative writing.
I tweaked the range of duplicated layers to hopefully make a sensible model. Use it at your own risk!
π Applications
I recommend using this model for creative writing with the Llama 3 chat template.
β‘ Quantization
TBD.
π Evaluation
TBD.
𧩠Configuration
This model was merged using the passthrough merge method. The following YAML configuration was used to produce this model:
slices:
- sources:
- layer_range: [0, 105]
model: mlabonne/BigLlama-3.1-681B-Instruct
- sources:
- layer_range: [52, 157]
model: mlabonne/BigLlama-3.1-681B-Instruct
- sources:
- layer_range: [104, 209]
model: mlabonne/BigLlama-3.1-681B-Instruct
merge_method: passthrough
dtype: bfloat16
Here is the code I've used to generate the config and calculate the number of layers/parameters after passthrough:
def generate_yaml_config(range_size, total_layers, nb_parameters):
new_size = total_layers + total_layers - range_size
new_param = (nb_parameters / total_layers) * new_size
print(f"New size = {new_size} layers")
print(f"New parameters = {new_param:.2f}B")
yaml_str = "slices:\n"
for i in range(0, round(total_layers - range_size + 1), range_size // 2):
start = i
end = min(start + range_size, total_layers)
yaml_str += f"- sources:\n"
yaml_str += f" - layer_range: [{start}, {end}]\n"
yaml_str += f" model: meta-llama/Meta-Llama-3.1-405B-Instruct\n"
yaml_str += "merge_method: passthrough\n"
yaml_str += "dtype: bfloat16\n"
print(yaml_str)
return new_size, new_param
# Example usage
new_size, new_param = generate_yaml_config(42, 126, 410)
new_size, new_param = generate_yaml_config(105, new_size, new_param)
- Downloads last month
- 275
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.