Blaze.1-27B-Reflection is a Gemma 2-based 27B parameter model. Gemma is a family of lightweight, state-of-the-art open models from Google, built using the same research and technology behind the Gemini models. These models are text-to-text, decoder-only large language models available in English, with open weights for both pre-trained and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Blaze.1-27B-Reflection is fine-tuned on self-reflection and behavioral data, using synthetic datasets for long-chain-of-thought reasoning from models such as DeepSeek and QwQ.

Quickstart Chat Template

The code snippets below show how to get started quickly with running the model. First, install the Transformers library:

pip install -U transformers

Then, copy the snippet from the section that is relevant to your use case.

Running with the pipeline API

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="prithivMLmods/Blaze.1-27B-Reflection",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
# Ahoy, matey! I be Gemma, a digital scallywag, a language-slingin' parrot of the digital seas. I be here to help ye with yer wordy woes, answer yer questions, and spin ye yarns of the digital world.  So, what be yer pleasure, eh? 🦜

Running the model on a single / multi GPU

# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Blaze.1-27B-Reflection")
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/Blaze.1-27B-Reflection",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))

You can ensure the correct chat template is applied by using tokenizer.apply_chat_template as follows:

messages = [
    {"role": "user", "content": "Write me a poem about Machine Learning."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the model-turn header so the model answers the user turn
    return_tensors="pt",
    return_dict=True,
).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
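
To make the effect of the chat template more concrete, here is a minimal sketch of the turn format used by the Gemma 2 family. It assumes this fine-tune keeps the base Gemma 2 template, which the card does not confirm; `render_gemma_prompt` is a hypothetical helper written for illustration, and the tokenizer's own template remains the authoritative source.

```python
# Illustrative only: mimics the Gemma 2 turn format as an assumption about this model.
def render_gemma_prompt(messages, add_generation_prompt=True):
    """Render chat messages into a Gemma-2-style prompt string."""
    parts = ["<bos>"]
    for message in messages:
        # Gemma names the reply role "model" rather than "assistant".
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues as the reply.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Write me a poem about Machine Learning."},
]
prompt = render_gemma_prompt(messages)
print(prompt)
```

Feeding the raw text without these turn markers, as in the plain `tokenizer(input_text, ...)` example, treats the input as text to continue rather than as a user message to answer.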

Running the model on a GPU using different precisions

The native weights of this model were exported in bfloat16 precision.

You can also load the model in float32 by omitting torch_dtype, but this does not improve precision: the bfloat16 weights are simply upcast to float32. See the example below.

  • Upcasting to torch.float32
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Blaze.1-27B-Reflection")
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/Blaze.1-27B-Reflection",
    device_map="auto",
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
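
The practical cost of upcasting is memory. The following back-of-the-envelope sketch estimates the space needed for the weights alone at each precision, using the 27.2B parameter count reported for this checkpoint. The helper name `weight_memory_gb` is hypothetical, and the estimate ignores activations, the KV cache, and framework overhead.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"bfloat16": 2, "float32": 4}

# Parameter count reported for this checkpoint (27.2B).
NUM_PARAMS = 27.2e9


def weight_memory_gb(num_params, dtype):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

Under these assumptions the weights take roughly 54 GB in bfloat16 and 109 GB in float32, so upcasting doubles the footprint without adding information to the weights.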

Intended Use

Blaze.1-27B-Reflection is designed for advanced reasoning tasks that require long-chain-of-thought processing, self-reflection, and behavioral analysis. Its primary applications include:

  1. Question Answering: The model excels in providing detailed, step-by-step answers to complex queries.
  2. Summarization: It can generate concise summaries of large text inputs, maintaining key information and logical flow.
  3. Reasoning and Decision Support: With its fine-tuning on self-reflection data, it can assist in tasks that require thoughtful analysis, such as legal reasoning, policy development, and strategic planning.
  4. Conversational AI: Due to its instruction-tuned nature, it performs well in interactive dialogue systems, offering coherent and context-aware responses.
  5. Creative Writing: The model can be employed in generating high-quality content for creative tasks, including storytelling and content ideation.
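
As a rough illustration of how such tasks can be phrased for the chat interface, the sketch below folds a task instruction and its input into a single user turn. This assumes the model inherits the Gemma 2 chat template, which does not accept a separate system role; `build_messages` is a hypothetical helper, not part of the model's tooling.

```python
def build_messages(instruction, text):
    """Fold a task instruction and its input text into one user turn."""
    content = f"{instruction.strip()}\n\n{text.strip()}"
    return [{"role": "user", "content": content}]


messages = build_messages(
    "Summarize the following passage in two sentences.",
    "Gemma is a family of lightweight, state-of-the-art open models from Google.",
)
print(messages[0]["content"])
```

The resulting list can be passed to the pipeline or to tokenizer.apply_chat_template in the same way as the messages in the Quickstart examples.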

Limitations

  1. Language and Domain Constraints: While the model is effective in English, it may perform poorly with non-English inputs or domain-specific jargon outside its training scope.
  2. Context Retention Issues: In very long conversations or documents, the model may lose track of earlier context, leading to incomplete or off-topic responses.
  3. Over-reliance on Synthetic Data: Since Blaze.1-27B-Reflection is fine-tuned on synthetic datasets, it may exhibit biases or inconsistencies when faced with real-world, nuanced scenarios.
  4. Circular Reasoning: The model may occasionally enter recursive reasoning loops, generating verbose responses without reaching a clear conclusion.
  5. Computational Demand: As a 27B parameter model, it requires substantial computational resources for both inference and fine-tuning, which may limit its accessibility for users with limited hardware.
  6. Hallucinations: Like most large language models, it may confidently generate incorrect information, especially when asked about facts or events outside its training data.
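
One way an application might guard against the circular-reasoning limitation is to check generated text for heavy repetition before showing it to a user. The sketch below is a simple heuristic, not something shipped with the model: the function names, the n-gram size, and the 0.3 threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def repeated_ngram_ratio(text, n=4):
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


def looks_circular(text, n=4, threshold=0.3):
    """Flag text whose repeated n-gram ratio exceeds the (assumed) threshold."""
    return repeated_ngram_ratio(text, n) > threshold


looping = "let me reconsider the problem " * 10
normal = "The quick brown fox jumps over the lazy dog"
print(looks_circular(looping), looks_circular(normal))
```

A flagged response could be regenerated with a smaller max_new_tokens value or truncated, depending on the application.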

Model size: 27.2B params (Safetensors, tensor type BF16)

Base model: google/gemma-2-27b
