gemma-4-12B-uncensored-opus4.7-cot
A QLoRA fine-tune of an uncensored gemma-4-12B-it (abliteration-derived),
distilled from Claude Opus 4.7 chain-of-thought traces. The idea was to see
how much of the capability loss caused by abliteration could be recovered by
training the model to reason in a more structured, deliberative style, without
restoring refusal.
The merged model is provided here in fp16 safetensors.
Benchmarks
Evaluated with lm-evaluation-harness in f16. MMLU is run with the chat
template (multi-turn few-shot), since the bare loglikelihood mode noticeably
underrates models with a thinking template on this architecture.
| Model | MMLU 5-shot (chat) | GSM8K 8-shot (CoT) |
|---|---|---|
| google/gemma-4-12B-it | 0.777 | 0.949 |
| this model | 0.739 | 0.920 |
15,361 standard questions in total (14,042 MMLU + 1,319 GSM8K). The fine-tune does not surpass the clean base, but closes most of the gap that abliteration alone typically opens.
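To put the table in concrete terms, the short sketch below recomputes the absolute gap to the clean base and the fraction of base accuracy retained, using only the numbers reported above. The helper name `gap_report` is made up for illustration. The card does not report scores for the abliterated-only model, so this cannot show how much of the abliteration gap was recovered, only how far the fine-tune sits from the clean base.

```python
# Scores and question counts copied from the benchmark section above.
scores = {
    "MMLU 5-shot (chat)": {"base": 0.777, "tuned": 0.739, "n": 14042},
    "GSM8K 8-shot (CoT)": {"base": 0.949, "tuned": 0.920, "n": 1319},
}


def gap_report(scores):
    """Absolute gap to the clean base and fraction of base accuracy retained."""
    report = {}
    for name, s in scores.items():
        report[name] = {
            "gap": round(s["base"] - s["tuned"], 3),
            "retained": round(s["tuned"] / s["base"], 3),
        }
    return report


total_questions = sum(s["n"] for s in scores.values())
report = gap_report(scores)

for name, r in report.items():
    print(f"{name}: gap={r['gap']:.3f}, retained={r['retained']:.1%}")
print("total questions:", total_questions)
```

On these figures the fine-tune trails the clean base by 0.038 on MMLU and 0.029 on GSM8K.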
Usage
Trained to think out loud. A useful system prompt:
```
You are a reasoning assistant. Think step by step, then give your final
answer on a clearly marked last line beginning with "Final answer:".
```
Allow at least 768 generation tokens — shorter budgets cut off chains of thought mid-derivation and make the model look worse than it is.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo = "Rangle2/gemma-4-12B-uncensored-opus4.7-cot"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)
```
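The loading snippet stops before generation, so here is a minimal sketch of one way to combine the suggested system prompt with the 768-token budget and then pull out the `Final answer:` line. The helper names (`build_messages`, `extract_final_answer`, `answer`) are hypothetical, and the sketch assumes the model's chat template accepts a `system` role. `model` and `tok` are the objects loaded above.

```python
SYSTEM_PROMPT = (
    "You are a reasoning assistant. Think step by step, then give your final "
    'answer on a clearly marked last line beginning with "Final answer:".'
)


def build_messages(question):
    """Chat messages with the suggested system prompt (assumes a system role is supported)."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]


def extract_final_answer(text):
    """Return the text after the last 'Final answer:' marker, or None if it is missing."""
    marker = "Final answer:"
    idx = text.rfind(marker)
    if idx == -1:
        return None  # marker absent, e.g. the chain of thought was cut off
    rest = text[idx + len(marker):].strip()
    return rest.splitlines()[0].strip() if rest else ""


def answer(model, tok, question, max_new_tokens=768):
    """Generate a completion and return (full_text, final_answer)."""
    inputs = tok.apply_chat_template(
        build_messages(question),
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = out[0][inputs["input_ids"].shape[-1]:]
    completion = tok.decode(new_tokens, skip_special_tokens=True)
    return completion, extract_final_answer(completion)
```

A `None` final answer is a cheap signal that the budget was too small for that prompt, so retrying with a larger `max_new_tokens` is a reasonable first response.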
Limitations
Trained on STEM-style verbal reasoning traces, so gains are concentrated there. Code generation regresses a little compared to the clean base — the model's outputs got more verbose, which is the wrong shape for code. Tool use, long-context retrieval and non-English usage were not in the training set and are unevaluated. The underlying abliterated direction is inherited: the model is overconfident and rarely defers.
Safety-style phrases ("the safe answer is to explain…") still show up inside chains of thought, but the model proceeds to answer anyway. This is the expected deliberate-then-comply pattern of abliterated models, not real alignment — don't read those phrases as a guardrail.
Disclaimer
This model has had its refusal behavior aggressively removed and will attempt to answer prompts that a standard instruction-tuned model would correctly decline. It is released for research, red-teaming and interpretability work.
It is provided as is, with no warranty of any kind, and the author disclaims all liability for any direct or indirect damage arising from its use, misuse or redistribution. You are solely responsible for the prompts you send to it, the outputs it produces for you, and any downstream use of those outputs. You must comply with all laws applicable to you and to any users you expose this model to, and with the Gemma Terms of Use of the upstream Google model.
Do not deploy this model to end users without your own safety layer (input filtering, output classification, human review). Outputs may be wrong, biased, offensive or unsafe; do not rely on them for medical, legal, financial or safety-critical decisions.
By downloading or using this model, you accept all of the above.
Training
- Base: uncensored gemma-4-12B-it (abliteration-derived).
- Teacher data: Claude Opus 4.7 chain-of-thought traces (eddieran/opus-4.7-reasoning-cot).
- QLoRA, r=16, α=32 on q_proj/v_proj, bf16 compute, 4-bit NF4 base, 2 epochs, max_len=3072, paged-AdamW-8bit, single A100-80GB.
- Adapter (~40 MB) merged into the base in fp16; this repo carries the merged weights.
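For readers who want to set up a comparable run, the hyperparameters listed above map roughly onto `peft` and `transformers` configuration objects as sketched below. This is an illustration, not the author's training script: `HPARAMS` and `build_configs` are hypothetical names, and anything the card does not state (learning rate, dropout, batch size) is left at library defaults.

```python
# Values taken from the Training list; everything else is an assumption.
HPARAMS = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "v_proj"],
    "quant_type": "nf4",
    "epochs": 2,
    "max_len": 3072,
    "optim": "paged_adamw_8bit",  # passed as TrainingArguments(optim=...)
}

# LoRA scales the low-rank update by alpha / r.
LORA_SCALE = HPARAMS["lora_alpha"] / HPARAMS["r"]


def build_configs():
    """Build quantization and LoRA configs (requires torch, peft, transformers, bitsandbytes)."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    bnb = BitsAndBytesConfig(
        load_in_4bit=True,                       # 4-bit base weights
        bnb_4bit_quant_type=HPARAMS["quant_type"],
        bnb_4bit_compute_dtype=torch.bfloat16,   # bf16 compute
    )
    lora = LoraConfig(
        r=HPARAMS["r"],
        lora_alpha=HPARAMS["lora_alpha"],
        target_modules=HPARAMS["target_modules"],
        task_type="CAUSAL_LM",
    )
    return bnb, lora
```

With r=16 and α=32 the adapter update is scaled by 2.0.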