Instructions for using xlr8harder/talkie-web-13b-base-tf with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use xlr8harder/talkie-web-13b-base-tf with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="xlr8harder/talkie-web-13b-base-tf", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("xlr8harder/talkie-web-13b-base-tf", trust_remote_code=True, dtype="auto")
```
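As a quick check, the pipeline object can be called directly; the prompt and generation length below are illustrative.

```python
# Generate a short base completion with the pipeline created above.
result = pipe("Once upon a time,", max_new_tokens=64)
print(result[0]["generated_text"])
```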
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use xlr8harder/talkie-web-13b-base-tf with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "xlr8harder/talkie-web-13b-base-tf"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "xlr8harder/talkie-web-13b-base-tf",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
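Because the server exposes an OpenAI-compatible API, it can also be queried from Python with the official openai client. The base URL matches the serve command above; the API key is a placeholder, since vLLM does not check it unless the server is started with one.

```python
# Query the OpenAI-compatible vLLM server started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder key

completion = client.completions.create(
    model="xlr8harder/talkie-web-13b-base-tf",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```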
- SGLang
How to use xlr8harder/talkie-web-13b-base-tf with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "xlr8harder/talkie-web-13b-base-tf" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "xlr8harder/talkie-web-13b-base-tf",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "xlr8harder/talkie-web-13b-base-tf" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "xlr8harder/talkie-web-13b-base-tf",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use xlr8harder/talkie-web-13b-base-tf with Docker Model Runner:
```shell
docker model run hf.co/xlr8harder/talkie-web-13b-base-tf
```
talkie-web-13b-base-tf (BF16 Transformers + safetensors conversion)
This repository is a Transformers-compatible conversion of
talkie-lm/talkie-web-13b-base, the original Talkie base completion model.
The upstream model is a 13B language model trained on 260B tokens of FineWeb. The original model card describes it as architecturally identical to talkie-lm/talkie-1930-13b-base and intended for controlled comparisons between vintage and modern language models.
The original base checkpoint is FP32. This repository stores a BF16 conversion of those weights, packaged for Transformers with custom trust_remote_code modules and sharded safetensors.
This is not an official Talkie release; refer to the upstream model card for the author-provided provenance and usage notes.
Source Model
- Original model: talkie-lm/talkie-web-13b-base
- Talkie report: talkie-lm.com
- Reference code: github.com/talkie-lm/talkie
Conversion Details
- Weight dtype: BF16
- Weight format: sharded safetensors
- Context length: 2048 tokens
- Architecture: custom Talkie code loaded with trust_remote_code=True
- Tokenizer: Talkie tiktoken-compatible tokenizer exposed through AutoTokenizer
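For reference, the packaging step described above looks roughly like the sketch below. It is only a sketch: how the original FP32 weights were first loaded into the custom modules is specific to this conversion, so the converted repository itself stands in as the source here, and the output path and shard size are illustrative.

```python
# Illustrative sketch of the BF16 + sharded-safetensors packaging step.
# The converted repo stands in for the FP32 source; the real conversion
# loaded the original Talkie checkpoint into these modules first.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "xlr8harder/talkie-web-13b-base-tf"  # stand-in source (illustrative)
dst = "talkie-web-13b-base-tf-bf16"        # local output directory (illustrative)

model = AutoModelForCausalLM.from_pretrained(src, trust_remote_code=True, dtype=torch.float32)
model = model.to(torch.bfloat16)  # cast weights FP32 -> BF16

# safe_serialization=True writes sharded .safetensors files instead of pickle .bin
model.save_pretrained(dst, safe_serialization=True, max_shard_size="5GB")

AutoTokenizer.from_pretrained(src, trust_remote_code=True).save_pretrained(dst)
```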
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "xlr8harder/talkie-web-13b-base-tf"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map={"": "cuda"},
    use_safetensors=True,
)
```
For base completions:
```python
inputs = tokenizer("The latest discoveries in physics suggest that", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
vLLM
The included remote-code model implements the Transformers attention-interface
hooks expected by vLLM's Transformers modeling backend. For compatibility with
that backend, the original single-scalar lm_head_gain is folded into
lm_head.weight during conversion; the other Talkie gain parameters remain
explicit model parameters. vLLM's logit_scale-style approach was not used because
it applies the scale after the output matmul, while Talkie applies the gain to
the head weight before the matmul. In BF16 this can introduce small rounding
differences and, in smoke tests, changed one near-tied top-token ordering.
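The rounding effect is easy to reproduce in isolation; the shapes and gain value in this sketch are illustrative, not the model's actual dimensions.

```python
# Sketch: folding a scalar gain into the head weight before the matmul
# (as done in this conversion) vs. scaling the logits after the matmul
# (logit_scale-style). Equal in exact arithmetic, not in BF16.
import torch

torch.manual_seed(0)
g = 1.7                                        # stand-in for lm_head_gain
W = torch.randn(32, 64, dtype=torch.bfloat16)  # stand-in head weight
x = torch.randn(64, dtype=torch.bfloat16)      # stand-in final hidden state

folded = (W.float() * g).bfloat16() @ x        # gain baked into the weight
scaled = ((W @ x).float() * g).bfloat16()      # gain applied after the matmul

print((folded.float() - scaled.float()).abs().max())  # small, nonzero rounding gap
```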
```shell
vllm serve xlr8harder/talkie-web-13b-base-tf \
  --task generate \
  --model-impl transformers \
  --trust-remote-code \
  --dtype bfloat16 \
  --max-model-len 2048
```
Validation
The Transformers safetensors model was compared against the original Talkie web FP32 checkpoint on a forward-pass smoke test. The top-10 next-token ordering matched exactly; observed max absolute logit difference was 0.03125.
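A minimal version of that check looks like the sketch below. The prompt is arbitrary, and reference_fp32_logits.pt is a hypothetical file of last-token logits assumed to have been saved from a separate FP32 run of the original Talkie code.

```python
# Sketch of the forward-pass smoke test: compare last-token logits from the
# BF16 Transformers model against FP32 reference logits saved separately.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "xlr8harder/talkie-web-13b-base-tf"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, trust_remote_code=True, dtype=torch.bfloat16, device_map={"": "cuda"}
)

inputs = tokenizer("Once upon a time,", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1].float().cpu()

ref = torch.load("reference_fp32_logits.pt")  # hypothetical FP32 reference

print("max abs logit diff:", (logits - ref).abs().max().item())
print("top-10 ordering matches:",
      torch.equal(logits.topk(10).indices, ref.topk(10).indices))
```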
Model tree for xlr8harder/talkie-web-13b-base-tf
- Base model: talkie-lm/talkie-web-13b-base