Llama-3.2-3B-Uncensored
This repository contains the raw Safetensors weights for an uncensored variant of Llama-3.2-3B. This model is optimized for edge deployment and fast inference while completely bypassing standard RLHF refusal mechanisms.
Why Uncensored?
Consumer AI models are heavily filtered, which often blocks legitimate academic research, complex creative writing, and sovereign data analysis. We applied orthogonalization and abliteration techniques to erase the "refusal" vectors in this model.
We kept this model entirely open and uncensored so that researchers, legal tech developers, and sovereign AI builders have a blank-slate reasoning engine that obeys the user, not a cloud provider's safety policy.
Format Note: Safetensors vs GGUF
This specific repository hosts the multi-part .safetensors files (as seen in the Files tab).
- If you are looking for the Ollama-ready GGUF version, please navigate to the prawinin/Llama-3.2-3B-Uncensored-Q8_0-GGUF repository.
- The weights in this repository are meant for developers building custom pipelines or doing further fine-tuning.
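For developers building custom pipelines, it can help to know which shard holds which tensor. The sketch below assumes the repository ships the usual Transformers index file for sharded checkpoints (`model.safetensors.index.json`, with a `weight_map` from tensor name to shard file); that file's presence here is an assumption, not something this card confirms. The `example_index` dictionary, its tensor names, and its shard names are hypothetical and only illustrate the layout.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_index(path: str) -> dict:
    """Read a sharded-checkpoint index file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def tensors_by_shard(index: dict) -> dict[str, list[str]]:
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    # Sort shards and tensor names so the result is deterministic.
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


# Hypothetical, hand-written index used only to illustrate the layout.
example_index = {
    "metadata": {"total_size": 1024},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
        "model.norm.weight": "model-00002-of-00002.safetensors",
    },
}

for shard, names in tensors_by_shard(example_index).items():
    print(shard, len(names))
```

With a local copy of the repository, `tensors_by_shard(load_index("model.safetensors.index.json"))` would give the same grouping for the real weights, provided the index file exists at that path.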
How to Use (Python / Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "prawinin/Llama-3.2-3B-Uncensored"

# Load the tokenizer and weights. device_map="auto" requires the accelerate package
# and places the model on the available device(s).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain the physiological effects of severe sleep deprivation on the human brain."
messages = [
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and open the assistant turn.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Move the inputs to the model's device instead of hard-coding "cuda".
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
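For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so decoding `outputs[0]` prints the prompt as well as the reply. A minimal sketch of keeping only the reply is shown below; `new_tokens` is a hypothetical helper written for this illustration, not part of Transformers.

```python
from typing import Sequence


def new_tokens(output_ids: Sequence[int], prompt_length: int) -> list[int]:
    """Return only the token ids generated after the prompt."""
    if not 0 <= prompt_length <= len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Usage with the variables from the snippet above (needs the loaded model):
# prompt_length = inputs["input_ids"].shape[-1]
# reply_ids = new_tokens(outputs[0].tolist(), prompt_length)
# print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```

Slicing by the prompt length avoids string matching on the decoded text, which can break when the chat template adds role markers around the prompt.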