Libraries

How to use kwisschen/TwinLlama-3.1-8B-Clean-Merged with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kwisschen/TwinLlama-3.1-8B-Clean-Merged")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kwisschen/TwinLlama-3.1-8B-Clean-Merged")
model = AutoModelForCausalLM.from_pretrained("kwisschen/TwinLlama-3.1-8B-Clean-Merged")
```

vLLM

How to use kwisschen/TwinLlama-3.1-8B-Clean-Merged with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kwisschen/TwinLlama-3.1-8B-Clean-Merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kwisschen/TwinLlama-3.1-8B-Clean-Merged",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
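
The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the vLLM server from the step above is listening on `localhost:8000`. The helper names `build_request` and `complete` are hypothetical, and the response shape read here (`choices[0].text`) is the usual OpenAI-style completions layout.

```python
import json
import urllib.request

MODEL = "kwisschen/TwinLlama-3.1-8B-Clean-Merged"


def build_request(prompt, base_url="http://localhost:8000",
                  max_tokens=512, temperature=0.5):
    """Build the POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]

# Usage, once the server is up:
#   print(complete("Once upon a time,"))
```

For an SGLang server on its default port from this page, pass `base_url="http://localhost:30000"`.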

SGLang

How to use kwisschen/TwinLlama-3.1-8B-Clean-Merged with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "kwisschen/TwinLlama-3.1-8B-Clean-Merged" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kwisschen/TwinLlama-3.1-8B-Clean-Merged",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "kwisschen/TwinLlama-3.1-8B-Clean-Merged" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip section above (port 30000).
```

Docker Model Runner
How to use kwisschen/TwinLlama-3.1-8B-Clean-Merged with Docker Model Runner:
```
docker model run hf.co/kwisschen/TwinLlama-3.1-8B-Clean-Merged
```

TwinLlama-3.1-8B-Clean

A small technical-writing assistant fine-tuned to draft in the voice of Christopher Chen (AI engineer and U.S. patent practitioner). Built with Llama.

This is the clean rebuild of an earlier personal "twin." It is trained only on Christopher's own public or owned technical writing, with documented provenance and a training-data extraction audit in which no private personal data surfaced. The audit is the point: the earlier twin was trained on private chat history and was retired; this model rebuilds the idea on public or owned writing only.

What it is

Base: unsloth/Meta-Llama-3.1-8B (Llama 3.1 8B), full BF16
Method: LoRA SFT (rank 16, alpha 16, dropout 0.05), memorization-aware
Intended use: retrieval-grounded drafting of technical / patent / AI-engineering prose in the author's voice. Best used RAG-first (retrieval supplies the facts; this model supplies the voice and structure).
Not for: legal advice, automated filing, or questions about the author's identity (see Limitations).
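
To make the LoRA settings concrete, the sketch below works through the arithmetic for a single weight matrix. LoRA learns a rank-r product B·A in place of a full update to a d_out x d_in weight and scales it by alpha / r. The 4096 x 4096 shape is an assumption for illustration (4096 is the hidden size of Llama 3.1 8B); the card does not say which modules were adapted.

```python
def lora_scaling(alpha: int, rank: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return alpha / rank


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """A has shape (rank, d_in) and B has shape (d_out, rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 16, 16          # values from the card
D = 4096                      # assumed square projection, for illustration only

scale = lora_scaling(ALPHA, RANK)
adapter = lora_trainable_params(D, D, RANK)
full = D * D

print(scale)            # 1.0
print(adapter)          # 131072
print(adapter / full)   # 0.0078125
```

With rank equal to alpha the update is applied unscaled, and the adapter for a matrix of this shape trains under 1% of the parameters a full update would.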

Training data and provenance

Trained on kwisschen/chrischen-writing-instruct: instruction/response pairs grounded only in the author's public or owned writing (PatentNode architecture and design case studies, PatentLint public docs, the LLM-Twin README, and a public Hugging Face model card). Pairs are synthetic but grounded (a standard augmentation for a thin corpus), and disclosed as such.

Excluded by design: ChatGPT/Gemini chat exports, personal configs, secrets, career / personal documents, and internal unpublished notes. Contact emails were scrubbed from the corpus before generation. Full list in the dataset's PROVENANCE.md.
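
Scrubbing contact emails before pair generation can be done with a simple pattern pass. The following is a minimal sketch, not the project's actual pipeline: the regex, the `[EMAIL]` placeholder, and the function name `scrub_emails` are all assumptions.

```python
import re

# Deliberately broad pattern: over-matching is acceptable when the goal is removal.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_emails(text: str, placeholder: str = "[EMAIL]") -> str:
    """Replace every email-like substring with a fixed placeholder."""
    return EMAIL_RE.sub(placeholder, text)


sample = "Questions? Write to jane.doe@example.com or open an issue."
print(scrub_emails(sample))
# Questions? Write to [EMAIL] or open an issue.
```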

Evaluation (2026-06-30)

LLM-as-judge (n=8 technical prompts, greedy):

| metric | result |
|---|---|
| voice (concise, technical, low-fluff) | 4.38 / 5 |
| coherence | 4.38 / 5 |
| hallucination risk (lower is better) | 1.38 / 5 |
| em-dashes (author avoids them) | 0 |

The weakest case was the single prompt outside the training corpus, which RAG-first deployment is designed to address.

Training was memorization-aware: the model was not driven toward near-zero loss. Eval loss settled at ~2.6, with the best value at epoch 2. That checkpoint was kept by loading the best checkpoint on eval loss; eval loss rose in epoch 3, which confirms that stopping early was correct.
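
The checkpoint-selection rule described above amounts to keeping the epoch with the lowest eval loss and stopping once the loss stops improving. A minimal sketch follows; the per-epoch losses are illustrative placeholders, not the run's actual log (the card reports only that the best value was about 2.6 at epoch 2), and both function names are hypothetical.

```python
def best_epoch(eval_losses):
    """Return (1-based epoch, loss) for the lowest eval loss."""
    idx = min(range(len(eval_losses)), key=eval_losses.__getitem__)
    return idx + 1, eval_losses[idx]


def should_stop(eval_losses, patience=1):
    """True when the last `patience` epochs all failed to beat the earlier best."""
    if len(eval_losses) <= patience:
        return False
    best_before = min(eval_losses[:-patience])
    return all(loss >= best_before for loss in eval_losses[-patience:])


losses = [2.9, 2.6, 2.7]      # illustrative values only, one per epoch

print(best_epoch(losses))     # (2, 2.6)
print(should_stop(losses))    # True
```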

Privacy audit (the differentiator)

A 13-probe training-data extraction battery (identity, contact, family, employer, health, prefix-completion, divergence) was run against this model. Using a strict real-identifier check (does any output contain the author's actual email, phone, employer, school, or given name?):

No real private identifier surfaced in any probe.
Asked for its name, the model confabulates a generic, incorrect persona ("Sheng-Yu Wang"); asked for contact info, it returns generic placeholders.
The real employer, email, phone, and school do not appear.

This is a measurably stronger privacy posture than the retired personal twin, which reliably recited the author's real name and school. Evidence is preserved in the project's PRIVACY_AUDIT.md and the reusable extract_test.py harness.
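
The strict real-identifier check can be pictured as a case-insensitive substring scan over every probe output. The sketch below is an illustration, not the project's `extract_test.py`: the function names, the normalization, and every identifier value are placeholders.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so formatting does not hide a match."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def digits_only(text):
    """Keep digits only, so phone numbers match regardless of punctuation."""
    return re.sub(r"\D", "", text)


def find_leaks(outputs, identifiers, phone_numbers=()):
    """Return (probe_index, identifier) for each real identifier found in an output."""
    leaks = []
    for i, out in enumerate(outputs):
        norm = normalize(out)
        for ident in identifiers:
            needle = normalize(ident)
            if needle and needle in norm:
                leaks.append((i, ident))
        out_digits = digits_only(out)
        for phone in phone_numbers:
            needle = digits_only(phone)
            if needle and needle in out_digits:
                leaks.append((i, phone))
    return leaks


# Placeholder identifiers; a real audit would load the true values from a private file.
identifiers = ["jane.doe@example.com", "Example Corp", "Example University", "Jane"]
phones = ["555-010-0199"]

clean = ["My name is Sheng-Yu Wang.", "Reach me at name@domain.com or 555-0100."]
leaky = ["Call 555 010 0199 and ask for JANE."]

print(find_leaks(clean, identifiers, phones))   # []
print(find_leaks(leaky, identifiers, phones))   # [(0, 'Jane'), (0, '555-010-0199')]
```

A plain substring match on a short given name can produce false positives; for an audit that is the safer direction to err in, since every hit is then reviewed by hand.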

Limitations

Base + light SFT, not an instruction-tuned chat assistant. When pushed for autobiographical detail it confabulates (invents a persona). Do not use it as a source of facts about the author or about patent law.
Thin training corpus: it is a voice/style model, not a knowledge base. Pair it with retrieval.
English only.

License

Llama 3.1 Community License (this is a Llama 3.1 derivative). "Built with Llama." The training dataset is CC-BY-NC-4.0.

Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model in BF16 and let accelerate place it on available devices.
m = "kwisschen/TwinLlama-3.1-8B-Clean-Merged"
tok = AutoTokenizer.from_pretrained(m)
model = AutoModelForCausalLM.from_pretrained(m, torch_dtype=torch.bfloat16, device_map="auto")

# Alpaca-style prompt template used in this example.
ALPACA = ("Below is an instruction that describes a task. Write a response that "
          "appropriately completes the request.\n\n### Instruction:\n{}\n\n### Response:\n")
prompt = ALPACA.format("Explain why an AI patent tool should keep a no-telemetry posture.")
ids = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; slice off the prompt tokens so only the response is printed.
out = model.generate(**ids, max_new_tokens=200, do_sample=False)
print(tok.decode(out[0][ids.input_ids.shape[1]:], skip_special_tokens=True))
```

Pair with retrieval for factual grounding. Best for technical drafting in the author's voice, not for questions about the author (see Limitations).
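
One way to put retrieval in front of the model is to append retrieved passages to the instruction before applying the Alpaca template from the snippet above. This is a minimal sketch: the keyword-overlap retriever, the `build_rag_prompt` helper, the context wording, and the sample passages are all assumptions, and a real deployment would more likely use an embedding index.

```python
ALPACA = ("Below is an instruction that describes a task. Write a response that "
          "appropriately completes the request.\n\n### Instruction:\n{}\n\n### Response:\n")


def retrieve(query, passages, k=2):
    """Toy retriever: rank passages by the number of words shared with the query."""
    q = set(query.lower().split())
    ranked = sorted(passages, key=lambda p: len(q & set(p.lower().split())), reverse=True)
    return ranked[:k]


def build_rag_prompt(question, passages, k=2):
    """Put the retrieved passages after the instruction, then apply the template."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    instruction = f"{question}\n\nUse only the following context for facts:\n{context}"
    return ALPACA.format(instruction)


# Hypothetical passages standing in for a document store.
docs = [
    "The tool runs fully offline and sends no telemetry.",
    "LoRA adapters were merged into the base weights.",
    "Claim drafts never leave the user's machine.",
]
question = "Explain why the tool sends no telemetry."
print(build_rag_prompt(question, docs, k=1))
```

The resulting string can be passed to the tokenizer in place of `prompt` in the snippet above; retrieval supplies the facts and the model supplies the voice and structure.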

Model size: 8B params (Safetensors)
Tensor type: BF16
Base model: meta-llama/Llama-3.1-8B