Qwen3-1.7B-ReLU

This repository contains a ReLU-fied variant of Qwen3-1.7B, released as part of the ReLU-Tune project.

The model was created by replacing the MLP activation function (SiLU in the base Qwen3 architecture) with ReLU in every decoder layer, and then recovering performance through staged LoRA fine-tuning. The weights in this repository are provided as a merged checkpoint, so they load and evaluate like a standard dense model.
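To show roughly what the activation swap changes, the sketch below uses a toy gated MLP of the form down_proj(act(gate_proj(x)) * up_proj(x)). This is the structure Qwen3's MLP uses, with SiLU as its configured activation. The sketch assumes ReLU replaces the activation on the gate path. The exact modification ReLU-Tune applied is recorded in activation_config.json and is not reproduced here. The weights are random, and the helper names (gated_mlp, zero_fraction) are hypothetical.

```python
import math
import random


def silu(x: float) -> float:
    # SiLU(x) = x * sigmoid(x), written in a numerically stable form
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def matvec(w, x):
    # w is a list of rows; returns the matrix-vector product w @ x
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gated_mlp(x, w_gate, w_up, w_down, act):
    # down_proj(act(gate_proj(x)) * up_proj(x)); biases omitted
    gate = [act(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden), gate


def zero_fraction(values):
    # Fraction of entries that are exactly zero
    return sum(1 for v in values if v == 0.0) / len(values)


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_model, d_ff = 8, 32  # toy sizes, not the real Qwen3-1.7B dimensions
w_gate = random_matrix(d_ff, d_model)
w_up = random_matrix(d_ff, d_model)
w_down = random_matrix(d_model, d_ff)
x = [random.gauss(0.0, 1.0) for _ in range(d_model)]

_, gate_silu = gated_mlp(x, w_gate, w_up, w_down, silu)
_, gate_relu = gated_mlp(x, w_gate, w_up, w_down, relu)
print(f"SiLU gate zero fraction: {zero_fraction(gate_silu):.2f}")
print(f"ReLU gate zero fraction: {zero_fraction(gate_relu):.2f}")
```

SiLU is non-zero for every non-zero input, so its zero fraction stays at zero. ReLU outputs exact zeros for every negative gate pre-activation, and that is the activation sparsity this release is meant to study. In a trained model, the actual fraction depends on the learned weights rather than on random initialization.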

What This Model Is

  • Base model: Qwen/Qwen3-1.7B
  • Architecture change: transformer MLP activations changed to ReLU
  • Modified layers: all decoder MLP activation sites
  • Training method: staged LoRA fine-tuning
  • Training schedule: 2 stages, 900 steps each, 1800 total steps
  • Training data: continued pretraining on tiiuae/falcon-refinedweb
  • Model format: merged dense model
  • Extra metadata: activation_config.json documents the activation modification
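A "merged dense model" means the LoRA updates have been folded into the base weights. Under the standard LoRA formulation, each adapted weight becomes W + (alpha / r) · B A. That formulation is an assumption here: the card does not give ReLU-Tune's rank, alpha, or target modules. The pure-Python sketch below checks that the merged weight produces the same output as the base weight plus the separate adapter path. All sizes and values are made up, and merge_lora is a hypothetical helper.

```python
import random


def matmul(a, b):
    # (n x k) @ (k x m) for matrices stored as lists of rows
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def merge_lora(w, a, b, alpha, r):
    # W_merged = W + (alpha / r) * (B @ A), with B: d_out x r and A: r x d_in
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_out, d_in, r, alpha = 6, 5, 2, 16  # illustrative values only
w = random_matrix(d_out, d_in)
a = random_matrix(r, d_in)
b = random_matrix(d_out, r)  # standard LoRA initializes B to zero; random here so the update is visible
x = [random.gauss(0.0, 1.0) for _ in range(d_in)]

# Unmerged: base path plus scaled low-rank adapter path, W x + (alpha / r) * B (A x)
unmerged = [
    base + (alpha / r) * low
    for base, low in zip(matvec(w, x), matvec(b, matvec(a, x)))
]
# Merged: one dense weight, which is what a merged checkpoint stores
merged = matvec(merge_lora(w, a, b, alpha, r), x)

max_diff = max(abs(u - m) for u, m in zip(unmerged, merged))
print(f"max |unmerged - merged| = {max_diff:.2e}")
```

The difference is only floating-point rounding, so inference needs no adapter code. On real models, peft's `merge_and_unload()` performs this kind of merge. The card does not say whether ReLU-Tune uses it.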

Intended Use

This model is intended for:

  • research on activation functions in large language models
  • experiments on sparse activations and efficiency tradeoffs
  • evaluation and fine-tuning workflows built on top of ReLU-Tune

This is primarily a research release, not a production-optimized instruction model.
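ReLU sparsity matters for efficiency research for one simple reason. In a gated MLP, a neuron whose ReLU gate output is exactly zero contributes nothing, so its up_proj row and down_proj column can be skipped without changing the result. The pure-Python sketch below checks that equivalence on random toy weights. It illustrates the general idea only: this release does not ship a sparse kernel, and any real speedup depends on kernel and hardware support. The function names are hypothetical.

```python
import random


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gated_mlp_dense(x, w_gate, w_up, w_down):
    # Computes every neuron, including those whose ReLU gate is zero
    gate = [relu(dot(row, x)) for row in w_gate]
    up = [dot(row, x) for row in w_up]
    hidden = [g * u for g, u in zip(gate, up)]
    return [dot(row, hidden) for row in w_down]


def gated_mlp_skip_inactive(x, w_gate, w_up, w_down):
    # Computes the gate first, then touches only the up_proj rows and
    # down_proj columns of neurons whose gate is non-zero
    gate = [relu(dot(row, x)) for row in w_gate]
    active = [j for j, g in enumerate(gate) if g > 0.0]
    hidden = {j: gate[j] * dot(w_up[j], x) for j in active}
    out = [sum(row[j] * h for j, h in hidden.items()) for row in w_down]
    return out, len(active)


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(1)
d_model, d_ff = 8, 64  # toy sizes
w_gate = random_matrix(d_ff, d_model)
w_up = random_matrix(d_ff, d_model)
w_down = random_matrix(d_model, d_ff)
x = [random.gauss(0.0, 1.0) for _ in range(d_model)]

dense = gated_mlp_dense(x, w_gate, w_up, w_down)
sparse, n_active = gated_mlp_skip_inactive(x, w_gate, w_up, w_down)
max_diff = max(abs(d - s) for d, s in zip(dense, sparse))
print(f"active neurons: {n_active}/{d_ff}, max |dense - sparse| = {max_diff:.2e}")
```

The skipped work grows with the fraction of zero gates. This is why activation sparsity measurements are the natural companion to efficiency experiments.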

Files

Typical files in this repository include:

  • model.safetensors
  • config.json
  • generation_config.json
  • tokenizer.json
  • tokenizer_config.json
  • chat_template.jinja
  • activation_config.json

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bishmoy/Qwen3-1.7B-ReLU"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
```
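The loading snippet stops before generation. Below is a minimal greedy-decoding sketch. It uses a plain-text prompt rather than the chat template because the model was adapted through continued pretraining on web text and is not positioned as an instruction model. generate_text is a hypothetical helper, and the prompt and max_new_tokens values are illustrative.

```python
def generate_text(
    prompt: str,
    model_id: str = "bishmoy/Qwen3-1.7B-ReLU",
    max_new_tokens: int = 64,
) -> str:
    # Imported inside the function so that defining the helper needs no extra packages
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding; generate() already runs without gradient tracking
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, not the echoed prompt
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Example call: `print(generate_text("ReLU activations produce sparse outputs because"))`. The helper reloads the model on every call to stay self-contained. In practice, load the model once and reuse it.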

Notes

  • activation_config.json is included for reproducibility and to document how activations were modified.
  • This release is subject to the base Qwen model's license and usage terms. Review and comply with the original model terms before use or redistribution.
  • Because this is a merged checkpoint, no adapter loading is required for standard inference.
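A small loader is enough to inspect activation_config.json programmatically. The card does not describe the file's schema, so the sketch below returns the parsed JSON without assuming any keys. When no local path is given, it downloads the file with huggingface_hub's hf_hub_download. load_activation_config is a hypothetical helper name.

```python
import json
from typing import Any, Optional


def load_activation_config(
    local_path: Optional[str] = None,
    repo_id: str = "bishmoy/Qwen3-1.7B-ReLU",
) -> Any:
    # Returns the parsed contents of activation_config.json; no particular keys are assumed
    if local_path is None:
        # Only needed when fetching the file from the Hub
        from huggingface_hub import hf_hub_download

        local_path = hf_hub_download(repo_id=repo_id, filename="activation_config.json")
    with open(local_path, encoding="utf-8") as f:
        return json.load(f)
```

Example: `print(json.dumps(load_activation_config(), indent=2))` prints whatever the file records about the activation change.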

Project

This model was produced with ReLU-Tune, a toolkit for:

  • full or partial ReLU-fication
  • staged LoRA fine-tuning
  • benchmark and perplexity evaluation
  • activation sparsity measurement
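This card does not document how ReLU-Tune measures sparsity. As an independent sketch, the function below attaches PyTorch forward hooks to each decoder layer's MLP activation and reports the fraction of exactly-zero outputs per layer. It assumes two things: the Hugging Face Qwen3 module layout (model.model.layers[i].mlp.act_fn, with act_fn an nn.Module) and a single unpadded input sequence. measure_mlp_activation_sparsity is a hypothetical name.

```python
from typing import Dict


def measure_mlp_activation_sparsity(model, input_ids) -> Dict[int, float]:
    """Fraction of exactly-zero MLP activation outputs per decoder layer.

    Assumes a Qwen3-style layout: model.model.layers[i].mlp.act_fn is an nn.Module.
    """
    import torch

    zeros: Dict[int, int] = {}
    totals: Dict[int, int] = {}
    handles = []

    def make_hook(idx):
        def hook(module, inputs, output):
            # Count exact zeros produced by this layer's activation function
            zeros[idx] = zeros.get(idx, 0) + int((output == 0).sum().item())
            totals[idx] = totals.get(idx, 0) + output.numel()
        return hook

    for idx, layer in enumerate(model.model.layers):
        handles.append(layer.mlp.act_fn.register_forward_hook(make_hook(idx)))
    try:
        with torch.no_grad():
            model(input_ids=input_ids)
    finally:
        # Always detach the hooks, even if the forward pass fails
        for handle in handles:
            handle.remove()
    return {idx: zeros[idx] / totals[idx] for idx in sorted(totals)}
```

With model and tokenizer loaded as in Usage, call `measure_mlp_activation_sparsity(model, tokenizer("Some sample text", return_tensors="pt").input_ids.to(model.device))`. Padding tokens would be counted along with real tokens, which is why the sketch assumes one unpadded sequence.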

Limitations

This is an experimental model release. ReLU-fication changes the base architecture and can alter behavior in ways that differ from the original dense model. Downstream performance, calibration, and robustness should be validated for any intended use.
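One simple validation step, in line with the perplexity evaluation ReLU-Tune lists, is to compare perplexity against Qwen/Qwen3-1.7B on the same held-out text. The sketch below computes token-level perplexity directly from the logits. It is a generic recipe, not ReLU-Tune's evaluation protocol, and the function names are hypothetical. Texts longer than the model's context window would need chunking, which is omitted here.

```python
import math
from typing import Sequence


def perplexity_from_nlls(nlls: Sequence[float]) -> float:
    # Perplexity = exp(mean negative log-likelihood per predicted token)
    return math.exp(sum(nlls) / len(nlls))


def text_perplexity(model, tokenizer, text: str) -> float:
    """Perplexity of a causal LM on one text (must tokenize to at least 2 tokens)."""
    import torch

    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Position t predicts token t + 1, so drop the last position's logits
        logits = model(**enc).logits[0, :-1, :].float()
    targets = enc["input_ids"][0, 1:]
    # Log-probability the model assigned to each actual next token
    log_probs = torch.log_softmax(logits, dim=-1)
    token_nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return perplexity_from_nlls(token_nll.tolist())
```

Running text_perplexity on both this model and the base model with the same text gives a rough like-for-like comparison. A uniform guess over four tokens, for instance, has a per-token NLL of log 4 and a perplexity of exactly 4.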

Checkpoint Details

  • Format: Safetensors
  • Model size: 2B params
  • Tensor type: BF16