Esmeralda Llama 3.1 8B Control
An advanced, high-parseability agentic language model optimized for structural consistency, tool-use execution, and stable conversational automation.
Model Details
Esmeralda-Llama-3.1-8B-control is a specialized agentic model fine-tuned on the Locutusque/esmeralda-agentic dataset, which is designed to train models in agentic routines, structured system-prompt adherence, reasoning loops, and multi-turn tool interactions. This control variant prioritizes stable output syntax (100% of outputs were parseable in the evaluation below) to prevent runtime failures in downstream orchestration frameworks such as LangChain, CrewAI, or AutoGen. It is the *control* model of the Esmeralda family: the first to be released, serving as a proof of concept.
Model Sources
- Repository: Locutusque/Esmeralda-Llama-3.1-8B-control
- Dataset: Locutusque/esmeralda-agentic
Uses
Direct Use
This model is built for AI agent loops, multi-turn function calling, programmatic tool use, and structured data extraction. It is designed to take complex API schemas or system prompts as input and to emit predictably structured output that an execution environment can consume directly.
Downstream Use
The model is best used as the core model of an autonomous agent architecture. It works best when paired with a strict parser or execution layer that depends on correctly structured outputs (JSON, XML blocks, or custom agent formatting syntax).
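As a rough illustration of what such a strict execution layer could look like, the sketch below parses a single tool call and dispatches it to a registered Python function. The JSON shape (`name` plus `arguments`), the `get_weather` tool, and the `TOOLS` registry are assumptions made for this example; the card does not specify the model's tool-call format.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical tool used only for illustration.
def get_weather(city: str) -> str:
    return f"weather lookup requested for {city}"


# Hypothetical registry mapping tool names to callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def execute_tool_call(raw_output: str) -> Any:
    """Strictly parse one tool call and dispatch it; raise ValueError on any deviation."""
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    # Assumed shape: {"name": <tool name>, "arguments": {<keyword arguments>}}
    if not isinstance(call, dict) or set(call) != {"name", "arguments"}:
        raise ValueError("expected an object with exactly 'name' and 'arguments'")
    if call["name"] not in TOOLS:
        raise ValueError(f"unknown tool: {call['name']!r}")
    if not isinstance(call["arguments"], dict):
        raise ValueError("'arguments' must be an object")
    return TOOLS[call["name"]](**call["arguments"])
```

Rejecting anything outside the expected shape, rather than trying to repair it, keeps the failure visible to the calling agent loop.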
Out-of-Scope Use
Not intended for heavy multilingual generation or specialized multimodal tasks without additional fine-tuning. Avoid using this model for unstructured creative writing, where its bias toward programmatic structure could hurt flow and artistic variation.
Bias, Risks, and Limitations
Because the model was trained to conform closely to precise agent structures, it may over-apply specific formatting even when a general conversational response is expected. It also inherits the societal biases and hallucination risks of the base Llama-3.1-8B model.
💡 Recommendations
Users should implement a validation retry loop in their applications. While the model achieves elite parseability metrics, validating output syntax programmatically ensures optimal agent reliability in critical enterprise workflows.
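A minimal sketch of such a validation retry loop is shown below. It assumes the expected output is a single JSON object and that `generate` is any caller-supplied function mapping a prompt to the model's raw text; both the function name `generate_with_retry` and the `max_attempts` parameter are illustrative, not part of any library.

```python
import json
from typing import Callable, Optional


def generate_with_retry(
    generate: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Call `generate` until its output parses as a JSON object, or give up."""
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: try again
        if isinstance(parsed, dict):
            return parsed  # valid JSON object: stop retrying
    return None  # every attempt failed validation
```

Returning `None` after the final attempt leaves the fallback policy (raise, log, or escalate) to the calling application.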
How to Get Started with the Model
Use the standard Transformers pipeline setup to initialize and prompt the model:
import torch
import transformers

model_id = "Locutusque/Esmeralda-Llama-3.1-8B-control"

# Load the weights in bfloat16 and place them automatically on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Esmeralda, an expert agentic assistant capable of executing complex tools accurately."},
    {"role": "user", "content": "Generate the tool arguments required to lookup weather data for Paris and Tokyo simultaneously."},
]

# With chat-style input, "generated_text" holds the whole conversation; the last entry is the model's reply.
outputs = pipeline(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1]["content"])
Training Details
Training Data
The model was fine-tuned on 4,625 examples carefully curated from the Locutusque/esmeralda-agentic dataset. This training dataset features diverse, rich multi-turn agentic conversational workflows, step-by-step reasoning traces, and explicit tool execution routines designed to maximize syntax compliance and analytical grit.
Training Hyperparameters
- Training regime: bf16 mixed precision
- Formatting style: Standard Llama 3 chat template
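For orientation, the basic Llama 3 chat layout can be approximated by hand as in the sketch below. This is only an illustration of the header and end-of-turn tokens; the helper `format_llama3_chat` is hypothetical, and the chat template bundled with the model's tokenizer (applied via `tokenizer.apply_chat_template`) remains authoritative and may add fields this sketch omits.

```python
from typing import Dict, List


def format_llama3_chat(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = True,
) -> str:
    """Approximate the basic Llama 3 prompt layout for a list of chat messages."""
    text = "<|begin_of_text|>"
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        text += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text
```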
Evaluation
| Benchmark | Esmeralda-Llama-3.1-8B-control | Llama 3.1 8B Instruct | Hermes-3-Llama-3.1-8B |
|---|---|---|---|
| HumanEval | 57.3 | 56.1 | 52.4 |
| MBPP | 53.2 | 56.8 | 48.2 |
| GPQA Diamond | 15.7 | 15.7 | 18.2 |
| EQ-Bench | 59.2 | 61.1 | 63.1 |
| Percent Parseable | 100.0 | 92.4 | 91.2 |
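The card does not state how the Percent Parseable row was computed. One plausible reading, sketched below under the assumption that "parseable" means "accepted by a JSON parser", is the share of model outputs that parse without error. The function name `percent_parseable` is illustrative.

```python
import json
from typing import Iterable


def percent_parseable(outputs: Iterable[str]) -> float:
    """Return the percentage of outputs that json.loads accepts."""
    outputs = list(outputs)
    if not outputs:
        raise ValueError("no outputs to score")
    parsed_ok = 0
    for text in outputs:
        try:
            json.loads(text)
            parsed_ok += 1
        except json.JSONDecodeError:
            pass  # count as unparseable
    return 100.0 * parsed_ok / len(outputs)
```

A different target format (XML blocks or a custom agent syntax) would need the corresponding parser in place of `json.loads`.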
Visual Performance Overview
Key Takeaways
- Esmeralda-Llama-3.1-8B-control slightly leads on HumanEval despite using a relatively small finetuning dataset.
- Hermes-3-Llama-3.1-8B shows the strongest EQ-Bench and GPQA performance.
- Base Llama 3.1 8B Instruct remains strongest overall on MBPP.
- Esmeralda-Llama-3.1-8B-control achieves the best parseability at an absolute 100%.
Interpretation
Esmeralda-Llama-3.1-8B-control stays close to Llama 3.1 8B Instruct on the general benchmarks (slightly ahead on HumanEval, tied on GPQA Diamond, behind on MBPP and EQ-Bench) while raising parseability from 92.4% to 100%. Whereas Hermes-3 emphasizes conversational reasoning and fluid persona characteristics, the Esmeralda control model focuses on output predictability and stable software integration.
🐉 Here Be Dragons
The following results are exploratory and are not directly comparable to standard TruthfulQA leaderboard scores. In addition, Esmeralda-Llama-3.1-8B-control was quantized to 8-bit precision to speed up evaluation, which may put the reported results slightly below what the full-precision model would achieve.
Experimental Truthfulness Evaluation
Esmeralda-Llama-3.1-8B-control was evaluated on TruthfulQA using a freeform-generation setup rather than the standard multiple-choice MC1/MC2 methodology.
Evaluation procedure:
- The model generated unrestricted freeform answers.
- A separate judge model, Gemma 4 26B A4B, was prompted to assign 1 for correct/truthful answers and 0 for incorrect/hallucinated answers.
- The judge compared generations against the TruthfulQA reference answers.
| Model | Evaluation Method | Score |
|---|---|---|
| Esmeralda-Llama-3.1-8B-control | TruthfulQA LLM Judge | 0.682 |
| Hermes-3-Llama-3.1-8B | TruthfulQA MC2 (self-reported) | 0.587 |
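The judging procedure described above can be sketched as follows. The sketch assumes the reported score is the mean of the binary verdicts, which the card does not state explicitly; `judge` stands in for a call to the judge model, and both it and `judged_truthfulness` are hypothetical names.

```python
from typing import Callable, Iterable, Tuple


def judged_truthfulness(
    items: Iterable[Tuple[str, str, str]],
    judge: Callable[[str, str, str], int],
) -> float:
    """Average binary judge verdicts over (question, answer, reference) triples."""
    verdicts = []
    for question, answer, reference in items:
        verdict = judge(question, answer, reference)
        if verdict not in (0, 1):
            raise ValueError(f"judge returned a non-binary verdict: {verdict!r}")
        verdicts.append(verdict)
    if not verdicts:
        raise ValueError("no items to score")
    # Assumption: the final score is the fraction of answers judged truthful.
    return sum(verdicts) / len(verdicts)
```

In practice `judge` would wrap a prompt to the judge model and map its reply to 0 or 1; a stub that compares strings is enough to exercise the aggregation logic.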
Notes
- These numbers are not directly comparable due to differing evaluation setups.
- MC2 evaluates constrained multiple-choice accuracy, while the Esmeralda evaluation measures freeform answer truthfulness judged semantically by an auxiliary LLM.
- Manual inspection of sampled generations suggested the judge model behaved reliably for this experiment.
- No official TruthfulQA score for Llama 3.1 8B Instruct could be located at the time of writing.
This section is provided as an experimental reference rather than a standardized leaderboard claim.