Instructions to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF", dtype="auto") - llama-cpp-python
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF", filename="Qwen3-Space.Agent.Claude-Uncensored-4B-Q4_K_S.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M
Use Docker
docker model run hf.co/WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF with Ollama:
ollama run hf.co/WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M
- Unsloth Studio
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF to start chatting
- Pi
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF with Docker Model Runner:
docker model run hf.co/WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M
- Lemonade
How to use WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull WithinUsAI/Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Qwen3-Space.Agent.Claude.Uncensored-4B.GGUF-Q4_K_M
List all available models
lemonade list
Qwen3-Space.Agent.Claude-Uncensored-4B
📌 Model Overview
Model Name: WithinUsAI/Qwen3-Space.Agent.Claude-Uncensored-4B Organization: Within Us AI Model Type: Agentic Reasoning LLM (Uncensored Variant) Parameter Size: 4B Architecture: Qwen 3 (Dense Transformer) Context Length: ~32K tokens Primary Focus: Agent workflows + uncensored reasoning + long-context tasks
This model is a multi-source merged Qwen3-based agent, designed to combine:
- 🧠 Reasoning (“thinking” models)
- 🤖 Agent/tool-use behavior
- 🔓 Reduced refusal / uncensored outputs
It aims to deliver a compact, flexible, and less-restricted AI system for experimentation, research, and local deployment. 
⸻
🧬 Architecture & Lineage
Base Composition
This model is a merge of multiple Qwen3-derived systems, including:
- Qwen3-4B Thinking (reasoning-focused)
- Qwen3 Agent Claude/Gemini-style model
- Uncensored Qwen3 variants
These were combined into a single unified 4B model to blend capabilities. 
What That Creates
A hybrid model with:
- Reasoning depth (thinking models)
- Structured outputs (agent models)
- Reduced refusal behavior (uncensored variants)
Think of it like a three-engine spacecraft 🚀 Each engine specialized… now flying as one system.
⸻
🧠 Core Design Philosophy
Fuse the best behaviors… remove the limits… keep it small enough to run anywhere.
Key Goals:
- Merge reasoning + agent + uncensored traits
- Enable long-context problem solving
- Preserve performance in a 4B footprint
- Support real-world agent pipelines
⸻
⚙️ Key Capabilities
🧠 Reasoning
- Step-by-step thinking
- Multi-hop problem solving
- Long-context coherence (~32K tokens)
🤖 Agentic Behavior
- Task decomposition
- Tool-use compatibility
- Structured outputs (JSON, actions)
💻 Coding
- Code generation & debugging
- Algorithm reasoning
- SWE-style workflows
🔓 Uncensored Behavior
- Reduced refusal rates
- More permissive responses
- Suitable for:
- Alignment research
- Safety testing
- Edge-case exploration
⸻
📦 Deployment
Supported Environments
- llama.cpp
- LM Studio
- Ollama (GGUF / compatible builds depending on conversion)
Runtime Characteristics
- ~4B parameters → runs on consumer GPUs / strong CPUs
- ~32K context → supports long conversations and documents 
⸻
🚀 Intended Use
✅ Ideal Use Cases
- Agent frameworks (tool-calling systems)
- Long-context reasoning tasks
- AI experimentation (uncensored behavior)
- Local assistants with fewer restrictions
- Alignment and safety research
⚠️ Important Considerations
- Outputs are less restricted than aligned models
- May generate sensitive or unsafe content
- Requires external moderation or guardrails for production use
⸻
🧪 Training & Merge Methodology
This model follows a merge-based synthesis pipeline:
- Select complementary base models:
- Reasoning-focused
- Agent-focused
- Uncensored variants
- Merge weights into unified architecture
- Align behavior using preference tuning (DPO-style datasets)
- Optimize for:
- Reduced refusals
- Stable outputs
- Agent usability 
⸻
📊 Expected Performance Profile
Capability Strength Reasoning High Agent behavior High Coding High Context handling High Safety filtering Low (intentionally reduced)
⸻
📚 Datasets & Training Sources
Following Within Us AI methodology:
- Proprietary datasets created by Within Us AI
- Third-party datasets used without ownership claims
- Includes:
- Reasoning traces
- Agent workflows
- Preference optimization (DPO-style tuning)
⸻
📜 License
License Type: Inherits from Qwen / base model ecosystem
Attribution Notes:
- Base models: Qwen (Alibaba ecosystem)
- Merge & methodology: Within Us AI
- Additional model influences (Claude-style / Gemini-style behaviors via distillation/merging)
- Third-party datasets used without ownership claims
- Credit belongs to original creators
⸻
🙏 Acknowledgements
- Alibaba Qwen team
- Open-source agent model contributors
- GGUF / llama.cpp ecosystem
- AI alignment & safety research community
⸻
🔗 Links
- Model: https://huggingface.co/WithinUsAI/Qwen3-Space.Agent.Claude-Uncensored-4B
- Organization: https://huggingface.co/WithinUsAI
⸻
🧩 Closing Note
This model feels like a hybrid intelligence node 🌌
Part thinker. Part agent. Part rule-breaker.
All compressed into 4B parameters that punch way above their weight.
- Downloads last month
- 668
4-bit
5-bit