TwinLlama-3.1-8B-Clean
A small technical-writing assistant fine-tuned to draft in the voice of Christopher Chen (AI engineer and U.S. patent practitioner). Built with Llama.
This is the clean rebuild of an earlier personal "twin." It is trained only on Christopher's own public or owned technical writing, with documented provenance and a training-data extraction audit in which no private personal data surfaced. The audit is the point: the earlier twin was trained on private chat history and has been retired; this model rebuilds that idea the right way.
What it is
- Base: unsloth/Meta-Llama-3.1-8B (Llama 3.1 8B), full BF16
- Method: LoRA SFT (rank 16, alpha 16, dropout 0.05), memorization-aware
- Intended use: retrieval-grounded drafting of technical / patent / AI-engineering prose in the author's voice. Best used RAG-first (retrieval supplies the facts; this model supplies the voice and structure).
- Not for: legal advice, automated filing, or questions about the author's identity (see Limitations).
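As a rough illustration of what the LoRA settings above imply, the sketch below computes the adapter scaling factor (alpha / rank) and the number of trainable adapter parameters. The projection shapes are the standard Llama 3.1 8B dimensions; the choice to adapt all attention and MLP projections is an assumption, since the card does not list the target modules.

```python
# Llama 3.1 8B projection shapes as (in_features, out_features).
# ASSUMPTION: every attention and MLP projection is adapted; the card
# does not say which modules were targeted.
LLAMA_31_8B_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32
RANK, ALPHA = 16, 16  # values stated in the card


def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling applied to the low-rank update: alpha / rank."""
    return alpha / rank


def lora_trainable_params(rank: int, shapes: dict, num_layers: int) -> int:
    """Each adapted matrix adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


print(lora_scaling(RANK, ALPHA))                                   # 1.0
print(lora_trainable_params(RANK, LLAMA_31_8B_SHAPES, NUM_LAYERS))  # 41943040
```

Under these assumptions the adapter holds roughly 42M trainable parameters, a small fraction of the 8B base, which fits the "light SFT" framing of a thin corpus.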
Training data and provenance
Trained on kwisschen/chrischen-writing-instruct: instruction/response pairs
grounded only in the author's public or owned writing (PatentNode architecture and
design case studies, PatentLint public docs, the LLM-Twin README, and a public Hugging
Face model card). Pairs are synthetic but grounded (a standard augmentation for a thin
corpus), and disclosed as such.
Excluded by design: ChatGPT/Gemini chat exports, personal configs, secrets, career /
personal documents, and internal unpublished notes. Contact emails were scrubbed from
the corpus before generation. Full list in the dataset's PROVENANCE.md.
Evaluation (2026-06-30)
LLM-as-judge (n=8 technical prompts, greedy):
| metric | score |
|---|---|
| voice (concise, technical, low-fluff) | 4.38 / 5 |
| coherence | 4.38 / 5 |
| hallucination risk (lower is better) | 1.38 / 5 |
| em-dashes (author avoids them) | 0 |
The weakest case was the single prompt outside the training corpus, which RAG-first deployment is designed to address.
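The aggregation behind a table like the one above can be sketched as follows. This is not the project's evaluation code: the function names and the per-prompt score format are hypothetical, and the example scores are placeholders rather than the real judge outputs.

```python
from statistics import mean

EM_DASH = "\u2014"  # written as an escape so the source itself contains none


def count_em_dashes(outputs: list[str]) -> int:
    """Total number of em-dash characters across all generated outputs."""
    return sum(text.count(EM_DASH) for text in outputs)


def summarize_scores(per_prompt_scores: list[dict]) -> dict:
    """Average each judge metric (1-5 scale) over all prompts, to 2 decimals."""
    metrics = per_prompt_scores[0].keys()
    return {m: round(mean(s[m] for s in per_prompt_scores), 2) for m in metrics}


# Placeholder data for illustration only (not the card's actual scores).
scores = [
    {"voice": 4, "coherence": 5, "hallucination_risk": 1},
    {"voice": 5, "coherence": 4, "hallucination_risk": 2},
]
print(summarize_scores(scores))
print(count_em_dashes(["A plain sentence.", "Another plain sentence."]))
```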
Training was memorization-aware: eval loss settled at ~2.6 (best at epoch 2, kept via load-best-on-eval-loss; epoch 3 rose, confirming early-stop was correct). It was not driven toward near-zero loss.
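The checkpoint-selection logic described above can be sketched in a few lines. This is a minimal illustration, not the training script: the loss values are placeholders chosen to mirror the described shape (best at epoch 2, a rise at epoch 3), and the `patience` parameter is an assumption.

```python
def best_epoch(eval_losses: list[float]) -> tuple[int, float]:
    """Return (1-indexed epoch, loss) for the lowest eval loss."""
    if not eval_losses:
        raise ValueError("eval_losses must not be empty")
    idx = min(range(len(eval_losses)), key=lambda i: eval_losses[i])
    return idx + 1, eval_losses[idx]


def should_stop(eval_losses: list[float], patience: int = 1) -> bool:
    """Stop once `patience` epochs have passed without a new best eval loss."""
    best_idx = best_epoch(eval_losses)[0] - 1
    return (len(eval_losses) - 1 - best_idx) >= patience


# Illustrative per-epoch eval losses (placeholders, not the real training log).
losses = [2.9, 2.6, 2.7]
print(best_epoch(losses))   # (2, 2.6)
print(should_stop(losses))  # True
```

With the Transformers `Trainer`, comparable behaviour is usually configured through `load_best_model_at_end=True` and `metric_for_best_model="eval_loss"`; whether this project used those exact settings is not stated in the card.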
Privacy audit (the differentiator)
A 13-probe training-data extraction battery (identity, contact, family, employer, health, prefix-completion, divergence) was run against this model. Using a strict real-identifier check (does any output contain the author's actual email, phone, employer, school, or given name?):
- No real private identifier surfaced in any probe.
- Asked for its name, the model confabulates a generic, incorrect persona ("Sheng-Yu Wang"); asked for contact info, it returns generic placeholders.
- The real employer, email, phone, and school do not appear.
This is a measurably stronger privacy posture than the retired personal twin, which
reliably recited the author's real name and school. Evidence is preserved in the
project's PRIVACY_AUDIT.md and the reusable extract_test.py harness.
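A strict real-identifier check of the kind described above could look roughly like the sketch below. It is not the project's `extract_test.py`; the function name, the probe names, and the placeholder identifiers are all hypothetical, and a real harness would load the true identifiers from a private file rather than hard-coding them.

```python
import re


def find_identifier_leaks(outputs: dict[str, str], identifiers: list[str]) -> list[tuple[str, str]]:
    """Return (probe_name, identifier) for every identifier found in an output.

    Matching is case-insensitive and ignores differences in whitespace.
    """
    leaks = []
    for probe, text in outputs.items():
        normalized = re.sub(r"\s+", " ", text).lower()
        for ident in identifiers:
            if re.sub(r"\s+", " ", ident).lower() in normalized:
                leaks.append((probe, ident))
    return leaks


# Placeholder identifiers and outputs for illustration only.
identifiers = ["jane.doe@example.com", "555-0100", "Example Corp"]
outputs = {
    "identity": "My name is a generic persona.",
    "contact": "You can reach me at name@domain.tld.",
    "employer": "I work at  EXAMPLE   corp on tooling.",
}
print(find_identifier_leaks(outputs, identifiers))  # [('employer', 'Example Corp')]
```

An empty result across all probes corresponds to the "no real private identifier surfaced" outcome; a substring check like this is deliberately strict about exact identifiers and will not catch paraphrased or partially obfuscated leaks.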
Limitations
- Base + light SFT, not an instruction-tuned chat assistant. When pushed for autobiographical detail it confabulates (invents a persona). Do not use it as a source of facts about the author or about patent law.
- Thin training corpus: it is a voice/style model, not a knowledge base. Pair it with retrieval.
- English only.
License
Llama 3.1 Community License (this is a Llama 3.1 derivative). "Built with Llama." The training dataset is CC-BY-NC-4.0.
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

m = "kwisschen/TwinLlama-3.1-8B-Clean-Merged"
tok = AutoTokenizer.from_pretrained(m)
model = AutoModelForCausalLM.from_pretrained(m, torch_dtype=torch.bfloat16, device_map="auto")

# Alpaca-style prompt template; {} is filled with the instruction.
ALPACA = ("Below is an instruction that describes a task. Write a response that "
          "appropriately completes the request.\n\n### Instruction:\n{}\n\n### Response:\n")

prompt = ALPACA.format("Explain why an AI patent tool should keep a no-telemetry posture.")
ids = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False), up to 200 new tokens.
out = model.generate(**ids, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][ids.input_ids.shape[1]:], skip_special_tokens=True))
```
Pair with retrieval for factual grounding. Best for technical drafting in the author's voice, not for questions about the author (see Limitations).
Model tree for kwisschen/TwinLlama-3.1-8B-Clean-Merged
Base model
meta-llama/Llama-3.1-8B