IridiumLlama-72B-v0.1

Model Description

IridiumLlama is a 72B parameter language model created through a merge of Qwen2-72B-Instruct, calme2.1-72b, and magnum-72b-v1 using model_stock.

This is converted from leafspark/Iridium-72B-v0.1 (currently private)

Features

  • 72 billion parameters
  • Sharded in 31 files (unlike Iridium, which has 963 shards due to the merging process)
  • Combines Magnum prose with Calam smarts
  • Llamaified for easy use

Technical Specifications

Architecture

  • LlamaForCasualLM
  • Models: Qwen2-72B-Instruct (base), calme2.1-72b, magnum-72b-v1
  • Merged layers: 80
  • Total tensors: 1,043
  • Context length: 32k

Tensor Distribution

  • Attention layers: 560 files
  • MLP layers: 240 files
  • Layer norms: 160 files
  • Miscellaneous (embeddings, output): 162 files

Merging

Custom script utilizing safetensors library.

Usage

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("leafspark/IridiumLlama-72B-v0.1", 
                                             device_map="auto", 
                                             torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("leafspark/IridiumLlama-72B-v0.1")

GGUFs

Find them here: mradermacher/IridiumLlama-72B-v0.1-GGUF

Hardware Requirements

  • At least ~150GB of free space
  • ~150GB VRAM
Downloads last month
6
Safetensors
Model size
72.7B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for leafspark/IridiumLlama-72B-v0.1

Quantizations
2 models

Collection including leafspark/IridiumLlama-72B-v0.1