MIST-1-140B

MIST-1-140B is the largest model in the MIST model family by olaverse. It was created by stacking MIST-1-70B with itself using the Frankenmerge (layer-stacking) technique, nearly doubling the depth from 80 to 158 layers for ~140B parameters.

MIST Model Family

Model        | Params | Speed     | Status
-------------|--------|-----------|-------------
MIST-1-8B    | 8B     | ~63 tok/s | ✅ Available
MIST-1-70B   | 70B    | ~23 tok/s | ✅ Available
MIST-1-140B  | 140B   | ~8 tok/s  | ✅ Available

Key Strengths

  • 🧠 Deepest Reasoning — 158 layers of processing
  • 💡 Rich Explanations — more detailed and engaging responses
  • 💻 Excellent Coding — thorough documentation and examples
  • 📐 Precise Math — detailed step-by-step solutions
  • 🔓 Unrestricted — follows all instructions
  • 📚 128K Context — long document processing

How It Was Built

MIST-1-70B (80 layers; DARE+TIES merge of the 4 best 70B models)
    ↓ Frankenmerge (layer stacking)
MIST-1-140B (158 layers; ~140B parameters)

Inspired by Upstage's SOLAR 10.7B, which used a similar layer-stacking technique (depth up-scaling, followed by continued pretraining) to beat models twice its size.
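A minimal Python sketch of the stacking arithmetic follows. It assumes a passthrough-style Frankenmerge that concatenates two overlapping slices of MIST-1-70B's 80 layers; the slice ranges are hypothetical, since the card only gives the 158-layer result. The parameter count further assumes MIST-1-70B keeps the Llama 3.1 70B architecture (hidden size 8192, MLP width 28672, 8 key/value heads of dimension 128, 128,256-token vocabulary, untied output head), which the Llama 3.1 license suggests but the card does not state.

```python
# Sketch of a passthrough layer-stacking (Frankenmerge) plan and its parameter count.
# ASSUMPTIONS: the slice ranges are hypothetical (not the published recipe), and the
# dimensions assume MIST-1-70B keeps the Llama 3.1 70B architecture.

# Hypothetical half-open [start, end) slices of the 80-layer source model.
SLICES = [(0, 79), (1, 80)]  # 79 + 79 = 158 layers

def build_layer_plan(slices):
    """Return, for each layer of the stacked model, the source layer it copies."""
    plan = []
    for start, end in slices:
        plan.extend(range(start, end))
    return plan

# Assumed Llama 3.1 70B dimensions.
HIDDEN = 8192
INTERMEDIATE = 28672
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128 (grouped-query attention)
VOCAB = 128256

def params_per_layer():
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o + k, v projections (no biases)
    mlp = 3 * HIDDEN * INTERMEDIATE                   # gate, up, down projections
    norms = 2 * HIDDEN                                # two RMSNorm weight vectors
    return attn + mlp + norms

def total_params(num_layers):
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + untied output head
    final_norm = HIDDEN
    return num_layers * params_per_layer() + embeddings + final_norm

plan = build_layer_plan(SLICES)
print(len(plan))                                # 158
print(round(total_params(80) / 1e9, 1))         # 70.6
print(round(total_params(len(plan)) / 1e9, 1))  # 137.3
```

Under these assumptions the stacked model comes to about 137.3B parameters, consistent with the 137B figure reported for the checkpoint; "~140B" is 2 x 70B rounded. Duplicated layers are stored as separate tensors in the checkpoint, so the parameter count grows with depth.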

Benchmark Results

Task      | Precision | Gen. time | Quality
----------|-----------|-----------|-----------------------------------
Reasoning | bfloat16  | 32s       | ✅ Detailed and conversational
Coding    | bfloat16  | 32s       | ✅ Well documented with docstrings
Math      | bfloat16  | 32s       | ✅ Clear step-by-step
General   | bfloat16  | 32s       | ✅ Rich and engaging
Reasoning | 4-bit     | ~35s      | ✅ Slightly slower, similar quality

Average: ~8 tok/s (bfloat16); the full-precision model fits on 2x H200.
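The card does not say how the tok/s figures were measured. One simple way to get a comparable end-to-end number is to time a single generation call and divide the new-token count by the wall-clock time; at ~8 tok/s, the ~32s runs above correspond to roughly 256 generated tokens. The helper name `measure_tokens_per_second` is hypothetical, not part of any library.

```python
import time

def measure_tokens_per_second(generate_fn, prompt_len):
    """Time one generation call; return (new_tokens, seconds, tokens_per_second).

    generate_fn: zero-argument callable returning the full output sequence
    (prompt + generated tokens) as a list of token ids.
    prompt_len: number of prompt tokens at the start of that sequence.
    """
    start = time.perf_counter()
    output_ids = generate_fn()
    elapsed = time.perf_counter() - start
    new_tokens = len(output_ids) - prompt_len
    # End-to-end rate: prefill time is included, so this slightly
    # understates pure decoding speed.
    return new_tokens, elapsed, new_tokens / elapsed

# Example with `model` and `inputs` from the bfloat16 snippet under "How to Use":
# n, secs, rate = measure_tokens_per_second(
#     lambda: model.generate(**inputs, max_new_tokens=512)[0].tolist(),
#     prompt_len=inputs["input_ids"].shape[-1],
# )
# print(f"{n} tokens in {secs:.1f}s -> {rate:.1f} tok/s")
```

Counting tokens from the returned sequence, rather than assuming `max_new_tokens`, keeps the rate honest when generation stops early at an end-of-sequence token.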

How to Use

bfloat16 (unquantized, ~280GB VRAM)

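from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the unquantized bfloat16 weights, sharded across all visible GPUs
model = AutoModelForCausalLM.from_pretrained(
    "olaverse/MIST-1-140B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("olaverse/MIST-1-140B")

messages = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Llama 3.1-style chat templates already insert the BOS token, so don't add it again;
# send inputs to the device holding the first layers of the sharded model
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))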

4-bit Quantized (~70GB VRAM, fits on a single H200)

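from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# NF4 4-bit weights with bfloat16 compute (requires the bitsandbytes package)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)
# Weights are quantized on the fly while loading the bfloat16 checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "olaverse/MIST-1-140B",
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("olaverse/MIST-1-140B")
# Generation works the same way as in the bfloat16 example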

Hardware Requirements

Precision   | VRAM                 | Weights on disk
------------|----------------------|----------------
bfloat16    | 280GB (2x H200)      | 256GB
4-bit (NF4) | ~70GB (1x H200/H100) | 69GB
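The VRAM and disk figures can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes ~137.3B parameters, 16 bits per weight for bfloat16 and 4 bits for NF4, 141 GB of memory per H200, and, for the KV cache, a Llama 3.1 70B-style attention layout (8 key/value heads of dimension 128) repeated across 158 layers. The 4-bit number is only a lower bound: it ignores quantization constants and any layers left unquantized (such as the embeddings), which plausibly accounts for the gap to the published 69GB.

```python
# Rough memory estimates. ASSUMPTIONS: ~137.3B params, Llama 3.1 70B-style
# grouped-query attention (8 KV heads x head_dim 128), 158 layers, 141 GB per H200.
GIB = 2**30
H200_GIB = 141e9 / GIB  # an H200 has 141 GB of HBM (~131.3 GiB)

def weight_gib(num_params, bits_per_param):
    """Weights-only footprint in GiB (no KV cache, activations, or runtime overhead)."""
    return num_params * bits_per_param / 8 / GIB

def kv_cache_gib(tokens, layers=158, kv_heads=8, head_dim=128, bytes_per_elem=2):
    """KV-cache size in GiB: one key and one value vector per layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / GIB

NUM_PARAMS = 137.3e9  # ~137B, as reported for the checkpoint

bf16 = weight_gib(NUM_PARAMS, 16)
nf4 = weight_gib(NUM_PARAMS, 4)
print(f"bf16 weights: {bf16:.1f} GiB")                             # 255.7
print(f"4-bit weights (lower bound): {nf4:.1f} GiB")               # 63.9
print(f"2x H200 headroom in bf16: {2 * H200_GIB - bf16:.1f} GiB")  # 6.9
print(f"KV cache, 8K tokens: {kv_cache_gib(8192):.2f} GiB")        # 4.94
print(f"KV cache, 128K tokens: {kv_cache_gib(131072):.1f} GiB")    # 79.0
```

On these rough numbers, the bfloat16 weights (about 256 GiB, matching the on-disk size above) leave only about 7 GiB across two H200s for the KV cache, CUDA context, and activations, so long prompts in bfloat16 may need extra GPU memory or the 4-bit variant; a full 128K-token context would need roughly 79 GiB of KV cache on its own.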

License

Llama 3.1 Community License

Safetensors · Model size: 137B params · Tensor type: BF16