Telos Llama-3.1-8B (init)

A Llama-3.1-8B base model in which eleven of the reserved special tokens have been seeded with embeddings derived from semantically related content tokens, in preparation for fine-tuning on the Telos agent trajectory format.

This is not a fine-tuned agent model. It is the base model with embedding initialization applied. Behavior on any task is identical or near-identical to vanilla Llama-3.1-8B-base; the only difference is that the eleven Telos reserved tokens now have non-zero embeddings in both the input and output matrices.

Model details

Base model: meta-llama/Llama-3.1-8B
Modification: in-place initialization of eleven reserved-token rows in embed_tokens and lm_head
Initialization method: for each Telos marker, the mean of the input/output embeddings of 2-3 semantically related content tokens
Tokenizer: unchanged from the base model
Vocabulary size: unchanged (128 256)
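
The initialization method listed above can be sketched on a toy embedding matrix. This is a minimal illustration under stated assumptions, not the script used to produce this checkpoint: the function name `seed_row_with_mean`, the 2-dimensional toy vectors, and the toy row indices are all hypothetical, and plain Python lists stand in for the real embed_tokens and lm_head weight tensors.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def seed_row_with_mean(matrix: Matrix, target_id: int, seed_ids: Sequence[int]) -> None:
    """Overwrite row `target_id` in place with the mean of the rows in `seed_ids`."""
    if not seed_ids:
        raise ValueError("at least one seed token is required")
    dim = len(matrix[target_id])
    matrix[target_id] = [
        sum(matrix[s][d] for s in seed_ids) / len(seed_ids) for d in range(dim)
    ]


# Toy vocabulary: rows 0-2 stand in for ordinary content tokens, row 3 for an
# untrained reserved token whose embedding is all zeros.
toy_embed = [
    [1.0, 0.0],  # hypothetical content token, e.g. "goal"
    [0.0, 1.0],  # hypothetical content token, e.g. "objective"
    [2.0, 2.0],  # hypothetical content token, e.g. "purpose"
    [0.0, 0.0],  # reserved token to be seeded
]

seed_row_with_mean(toy_embed, target_id=3, seed_ids=[0, 1, 2])
print(toy_embed[3])  # [1.0, 1.0]
```

Because the modification covers both embed_tokens and lm_head, the same averaging would presumably be run once per matrix, each time using that matrix's own rows for the seed tokens.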

Token mapping

The Telos format aliases these eleven reserved tokens to frame markers at the string level. The tokenizer in this repo is unchanged from the base; aliasing is done by the Telos SDK at encode/decode time.

| Telos marker | Reserved token | Token ID | Seed words |
| --- | --- | --- | --- |
| `<\|goal\|>` | `<\|reserved_special_token_0\|>` | 128002 | goal, objective, purpose |
| `<\|mission\|>` | `<\|reserved_special_token_1\|>` | 128003 | mission, task, instruction |
| `<\|obs\|>` | `<\|reserved_special_token_2\|>` | 128005 | observation, context, environment |
| `<\|belief\|>` | `<\|reserved_special_token_3\|>` | 128011 | belief, state, knowledge |
| `<\|plan\|>` | `<\|reserved_special_token_4\|>` | 128012 | plan, strategy, approach |
| `<\|think\|>` | `<\|reserved_special_token_5\|>` | 128013 | think, reasoning, thought |
| `<\|action\|>` | `<\|reserved_special_token_6\|>` | 128014 | action, call, tool |
| `<\|end\|>` | `<\|reserved_special_token_7\|>` | 128015 | end, stop, done |
| `<\|result\|>` | `<\|reserved_special_token_8\|>` | 128016 | result, output, response |
| `<\|feedback\|>` | `<\|reserved_special_token_9\|>` | 128017 | feedback, update, progress |
| `<\|reward\|>` | `<\|reserved_special_token_10\|>` | 128018 | reward, score |
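
The string-level aliasing described above could look roughly like the following. This is a hedged sketch and not the Telos SDK: the names `MARKER_TO_RESERVED`, `to_reserved`, and `to_markers` are hypothetical, the frame layout in the demo string is made up for illustration, and only the marker/token pairs and IDs come from the table.

```python
# Marker -> (reserved token string, token ID), copied from the table above.
MARKER_TO_RESERVED = {
    "<|goal|>": ("<|reserved_special_token_0|>", 128002),
    "<|mission|>": ("<|reserved_special_token_1|>", 128003),
    "<|obs|>": ("<|reserved_special_token_2|>", 128005),
    "<|belief|>": ("<|reserved_special_token_3|>", 128011),
    "<|plan|>": ("<|reserved_special_token_4|>", 128012),
    "<|think|>": ("<|reserved_special_token_5|>", 128013),
    "<|action|>": ("<|reserved_special_token_6|>", 128014),
    "<|end|>": ("<|reserved_special_token_7|>", 128015),
    "<|result|>": ("<|reserved_special_token_8|>", 128016),
    "<|feedback|>": ("<|reserved_special_token_9|>", 128017),
    "<|reward|>": ("<|reserved_special_token_10|>", 128018),
}
RESERVED_TO_MARKER = {
    token: marker for marker, (token, _token_id) in MARKER_TO_RESERVED.items()
}


def to_reserved(text: str) -> str:
    """Rewrite Telos markers as reserved tokens before tokenization."""
    for marker, (token, _token_id) in MARKER_TO_RESERVED.items():
        text = text.replace(marker, token)
    return text


def to_markers(text: str) -> str:
    """Rewrite reserved tokens back into Telos markers after decoding."""
    for token, marker in RESERVED_TO_MARKER.items():
        text = text.replace(token, marker)
    return text


frame = "<|goal|>book a flight<|end|>"  # made-up frame, for illustration only
encoded = to_reserved(frame)
print(encoded)  # <|reserved_special_token_0|>book a flight<|reserved_special_token_7|>
print(to_markers(encoded) == frame)  # True
```

Since the substitution happens on strings, the unchanged base tokenizer then maps each reserved token string to its single existing token ID.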

Why initialization was needed

In the base Llama-3.1-8B model, all 248 reserved special tokens have all-zero embeddings in both embed_tokens and lm_head. They were registered as vocabulary entries but never received any pretraining gradient.

For Telos, this is degenerate: the model cannot read the markers as input (a zero embedding carries nothing that distinguishes one marker from another) and cannot emit them as output (a zero lm_head row gives a logit of exactly 0, which sits far below the logits of likely tokens, so the probability after softmax is negligible). Empirically, prompting the base model with a Telos-formatted trajectory causes it to ignore the markers entirely and loop on prose content.
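
The output-side argument is simple arithmetic and can be checked with toy numbers. The sketch below assumes an lm_head without a bias term; the hidden state, the weight rows, and the helper names `softmax` and `logit` are made up for illustration and are not taken from the model.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit(hidden: List[float], head_row: List[float]) -> float:
    """Logit of one token: dot product of the hidden state with its lm_head row."""
    return sum(h * w for h, w in zip(hidden, head_row))


hidden = [3.0, -1.0, 2.0]  # made-up final hidden state
head_rows = [
    [2.0, -1.0, 2.5],  # trained token A
    [1.5, 0.5, 2.0],   # trained token B
    [0.0, 0.0, 0.0],   # untrained reserved token: all-zero row
]

logits = [logit(hidden, row) for row in head_rows]
probs = softmax(logits)
print(logits)    # [12.0, 8.0, 0.0]
print(probs[2])  # on the order of 1e-6
```

Whatever the hidden state is, the zero row always yields a logit of 0, so the reserved token can only win if every other logit is lower still.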

Mean-of-related-tokens initialization seeds each marker with a sensible starting representation. The model still does not understand the Telos format (that requires fine-tuning), but the markers now contribute meaningful signal to the forward pass and have non-zero output logits.

Intended use

This checkpoint is intended as the starting point for fine-tuning on Telos-formatted trajectories. Use it the same way you would use the plain Llama-3.1-8B base.

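from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer (unchanged from the base model) and the seeded weights.
tokenizer = AutoTokenizer.from_pretrained("kosiasuzu/agenticml-agent-llama-3.1-8b-init")
model = AutoModelForCausalLM.from_pretrained(
    "kosiasuzu/agenticml-agent-llama-3.1-8b-init",
    torch_dtype="bfloat16",  # load the weights in BF16
    device_map="auto",       # requires the accelerate package
)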
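
After loading, one way to confirm that the eleven marker rows are no longer zero is to look at their L2 norms. The helpers below are a hypothetical sketch (`row_norms` and `zero_rows` are not part of any library); they are exercised on toy rows so they run without downloading weights, and the commented lines show how rows might be pulled from the loaded model.

```python
import math
from typing import Dict, List, Sequence

# Token IDs of the eleven Telos markers, copied from the token mapping table.
TELOS_TOKEN_IDS = [128002, 128003, 128005, 128011, 128012, 128013,
                   128014, 128015, 128016, 128017, 128018]


def row_norms(token_ids: Sequence[int], rows: Sequence[Sequence[float]]) -> Dict[int, float]:
    """Map each token ID to the L2 norm of its embedding row."""
    return {
        tid: math.sqrt(sum(x * x for x in row)) for tid, row in zip(token_ids, rows)
    }


def zero_rows(norms: Dict[int, float]) -> List[int]:
    """Return the token IDs whose row is exactly zero."""
    return [tid for tid, norm in norms.items() if norm == 0.0]


# With the model loaded as in the snippet above, the rows could be read with:
#   rows = model.get_input_embeddings().weight[TELOS_TOKEN_IDS].float().tolist()
#   print(zero_rows(row_norms(TELOS_TOKEN_IDS, rows)))  # expected to be empty
# and likewise with model.get_output_embeddings() for lm_head.

# Toy stand-in: one seeded row and one row that is still all zeros.
toy_rows = [[0.3, -0.4], [0.0, 0.0]]
toy_norms = row_norms([128002, 128003], toy_rows)
print(zero_rows(toy_norms))  # [128003]
```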

Out-of-scope use

Not an agent yet. This checkpoint has not been trained on any agent trajectories. Do not expect it to follow the Telos format correctly.
Not an instruction-tuned model. It inherits all the base-model limitations of Llama-3.1-8B (looping under greedy decoding, no instruction following).
All limitations and biases of Llama-3.1-8B base apply unchanged.

License

Inherits the Llama 3.1 Community License from the base model. Use of this model is subject to that license's terms.

Citation

If you build on this, please cite the Telos project and the underlying Llama-3.1 model.
