Telos Llama-3.1-8B (init)
A Llama-3.1-8B base model in which eleven reserved special tokens have been seeded with embeddings derived from semantically related content tokens, in preparation for fine-tuning on the Telos agent trajectory format.
This is not a fine-tuned agent model. It is the base model with embedding initialization applied. Behavior on any task is identical or near-identical to vanilla Llama-3.1-8B-base; the only difference is that the eleven Telos reserved tokens now have non-zero embeddings in both the input and output matrices.
Model details
- Base model: `meta-llama/Llama-3.1-8B`
- Modification: in-place initialization of eleven reserved-token rows in `embed_tokens` and `lm_head`
- Initialization method: for each Telos marker, the mean of the input/output embeddings of 2-3 semantically related content tokens
- Tokenizer: unchanged from the base model
- Vocabulary size: unchanged (128,256)
Token mapping
The Telos format aliases these eleven reserved tokens to frame markers at the string level. The tokenizer in this repo is unchanged from the base; aliasing is done by the Telos SDK at encode/decode time.
| Telos marker | Reserved token | Token ID | Seed words |
|---|---|---|---|
| `<\|goal\|>` | `<\|reserved_special_token_0\|>` | 128002 | goal, objective, purpose |
| `<\|mission\|>` | `<\|reserved_special_token_1\|>` | 128003 | mission, task, instruction |
| `<\|obs\|>` | `<\|reserved_special_token_2\|>` | 128005 | observation, context, environment |
| `<\|belief\|>` | `<\|reserved_special_token_3\|>` | 128011 | belief, state, knowledge |
| `<\|plan\|>` | `<\|reserved_special_token_4\|>` | 128012 | plan, strategy, approach |
| `<\|think\|>` | `<\|reserved_special_token_5\|>` | 128013 | think, reasoning, thought |
| `<\|action\|>` | `<\|reserved_special_token_6\|>` | 128014 | action, call, tool |
| `<\|end\|>` | `<\|reserved_special_token_7\|>` | 128015 | end, stop, done |
| `<\|result\|>` | `<\|reserved_special_token_8\|>` | 128016 | result, output, response |
| `<\|feedback\|>` | `<\|reserved_special_token_9\|>` | 128017 | feedback, update, progress |
| `<\|reward\|>` | `<\|reserved_special_token_10\|>` | 128018 | reward, score |
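The string-level aliasing described above can be sketched as a plain lookup table. This is only an illustration, not the Telos SDK: the names `TELOS_ALIASES`, `to_reserved`, and `to_telos` are hypothetical, and the real SDK may handle escaping or edge cases differently.

```python
# Hypothetical sketch of string-level aliasing; not the Telos SDK's actual API.
# Marker names, reserved tokens, and token IDs are copied from the table above.
TELOS_ALIASES = {
    "<|goal|>": ("<|reserved_special_token_0|>", 128002),
    "<|mission|>": ("<|reserved_special_token_1|>", 128003),
    "<|obs|>": ("<|reserved_special_token_2|>", 128005),
    "<|belief|>": ("<|reserved_special_token_3|>", 128011),
    "<|plan|>": ("<|reserved_special_token_4|>", 128012),
    "<|think|>": ("<|reserved_special_token_5|>", 128013),
    "<|action|>": ("<|reserved_special_token_6|>", 128014),
    "<|end|>": ("<|reserved_special_token_7|>", 128015),
    "<|result|>": ("<|reserved_special_token_8|>", 128016),
    "<|feedback|>": ("<|reserved_special_token_9|>", 128017),
    "<|reward|>": ("<|reserved_special_token_10|>", 128018),
}


def to_reserved(text: str) -> str:
    """Rewrite Telos markers as reserved tokens before tokenization."""
    for marker, (reserved, _token_id) in TELOS_ALIASES.items():
        text = text.replace(marker, reserved)
    return text


def to_telos(text: str) -> str:
    """Rewrite reserved tokens back to Telos markers after decoding."""
    for marker, (reserved, _token_id) in TELOS_ALIASES.items():
        text = text.replace(reserved, marker)
    return text


framed = "<|goal|>book a flight<|end|>"
encoded = to_reserved(framed)   # text that the unchanged tokenizer can split on
restored = to_telos(encoded)    # round-trips back to the Telos spelling
```

Because the tokenizer already treats the reserved tokens as single special tokens, a rewrite of this kind is enough for each marker to reach the model as one token ID.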
Why initialization was needed
In the base Llama-3.1-8B model, all 248 reserved special tokens have all-zero embeddings in both `embed_tokens` and `lm_head`. They were registered as vocabulary entries but never received any pretraining gradient.
For Telos, this is degenerate: the model cannot read the markers as input (zero embedding contributes nothing) and cannot emit them as output (zero `lm_head` row → near-zero logit → near-zero probability after softmax). Empirically, prompting the base model with a Telos-formatted trajectory causes the model to ignore the markers entirely and loop on prose content.
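To see why a zero output row effectively silences a token, consider softmax over a handful of logits. The values below are illustrative assumptions, not measurements from the model; the only fixed point is that a bias-free output projection with an all-zero row produces a logit of 0 for that token.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed logits for three ordinary content tokens.
content_logits = [12.0, 10.5, 9.0]
# A reserved token with an all-zero output row gets a logit of 0.
reserved_logit = 0.0

probs = softmax(content_logits + [reserved_logit])
reserved_prob = probs[-1]
print(f"reserved-token probability: {reserved_prob:.2e}")
```

With these assumed numbers the reserved token ends up with a probability below 1e-5, so neither greedy decoding nor typical sampling would be expected to emit it.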
Mean-of-related-tokens initialization seeds each marker with a sensible starting representation. The model still does not understand the Telos format (that requires fine-tuning), but the markers now contribute meaningful signal to the forward pass and have non-zero output logits.
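A minimal sketch of mean-of-related-tokens initialization on a toy embedding table follows. The matrix values, the row indices, and the helper name `seed_row_with_mean` are assumptions for illustration; the script actually used for this checkpoint, and how seed words were mapped to token IDs, are not shown in this card.

```python
def seed_row_with_mean(matrix, target_id, source_ids):
    """Overwrite matrix[target_id] in place with the element-wise mean of the source rows."""
    dim = len(matrix[target_id])
    mean = [
        sum(matrix[i][d] for i in source_ids) / len(source_ids)
        for d in range(dim)
    ]
    matrix[target_id] = mean
    return mean


# Toy 5-token, 4-dimensional embedding table (assumed values).
# Row 4 plays the role of an untrained reserved token.
toy_embeddings = [
    [0.2, -0.1, 0.4, 0.0],
    [0.6, 0.3, -0.2, 0.8],
    [0.1, 0.5, 0.1, -0.4],
    [-0.3, 0.2, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.0],
]

# Seed the reserved row from three "related" rows.
seeded = seed_row_with_mean(toy_embeddings, target_id=4, source_ids=[0, 1, 2])
```

In a real model the same operation would presumably be applied once to the input embedding matrix and once to the output projection, since the card states that both were initialized.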
Intended use
This checkpoint is intended as the starting point for fine-tuning on Telos-formatted trajectories. Use it the same way you would use the plain Llama-3.1-8B base.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kosiasuzu/telos-llama-3.1-8b-init")
model = AutoModelForCausalLM.from_pretrained(
    "kosiasuzu/telos-llama-3.1-8b-init",
    torch_dtype="bfloat16",  # load weights in bfloat16 rather than float32
    device_map="auto",       # place layers on available devices (requires accelerate)
)
```
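One way to sanity-check the checkpoint after loading is to confirm that the eleven marker rows are no longer all-zero. The helper below is a sketch: `all_zero_rows` is a hypothetical name, and the toy matrix stands in for a real weight matrix.

```python
# Token IDs of the eleven Telos markers, copied from the token mapping table.
TELOS_TOKEN_IDS = [
    128002, 128003, 128005, 128011, 128012, 128013,
    128014, 128015, 128016, 128017, 128018,
]


def all_zero_rows(matrix, row_ids):
    """Return the ids in row_ids whose row in matrix contains only zeros."""
    return [i for i in row_ids if all(float(v) == 0.0 for v in matrix[i])]


# Toy stand-in for a weight matrix (assumed values): row 1 is untrained.
toy = [[0.1, -0.2], [0.0, 0.0], [0.3, 0.0]]
print(all_zero_rows(toy, [0, 1, 2]))  # prints [1]
```

With the model loaded as above, and assuming the weights are resident in memory, calling `all_zero_rows(model.get_input_embeddings().weight, TELOS_TOKEN_IDS)` and the same on `model.get_output_embeddings().weight` should return an empty list if the initialization described in this card was applied.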
Out-of-scope use
- Not an agent yet. This checkpoint has not been trained on any agent trajectories. Do not expect it to follow the Telos format correctly.
- Not an instruction-tuned model. It inherits all the base-model limitations of Llama-3.1-8B (looping on greedy decoding, no instruction following).
- All limitations and biases of Llama-3.1-8B base apply unchanged.
License
Inherits the Llama 3.1 Community License from the base model. Use of this model is subject to that license's terms.
Citation
If you build on this, please cite the Telos project and the underlying Llama-3.1 model.