Sebastian Gabarain

Locutusque

350 41 777

SebastianG74019

AI & ML interests

Pushing performance in small language models

Recent Activity

liked a dataset 6 days ago

MedOtter/2018-Data-Science-Bowl

liked a dataset 7 days ago

g19-f/AsPC-1

liked a model 25 days ago

Locutusque/Esmeralda-Qwen3.6-35B-A3B

View all activity

Organizations

Posts 13

Post

382

🚀 Introducing Esmeralda-Llama-3.1-8B-control
The first release in the Esmeralda model family by Locutusque.

This model is intentionally small and experimental — a control/baseline proof-of-concept designed to answer one question:

«“How strong is my new "Locutusque/esmeralda-agentic" dataset before scaling to larger runs?”»

Training Details

- Base: Llama 3.1 8B
- Training precision: bf16 mixed precision
- Chat template: modified ChatML
- Dataset size: ~37k examples
- Examples actually used for this run: ~5k

The dataset includes:

- multi-turn agentic traces
- reasoning traces
- structured assistant behavior
- generalist instruction data

Benchmark Results

Compared against:

- Llama 3.1 8B Instruct
- Hermes-3-Llama-3.1-8B

HumanEval

57.3 — Esmeralda
56.1 — Llama 3.1 Instruct
52.4 — Hermes-3

MBPP

53.2 — Esmeralda
56.8 — Llama 3.1 Instruct
48.2 — Hermes-3

GPQA Diamond

15.7 — Esmeralda
15.7 — Llama 3.1 Instruct
18.2 — Hermes-3

EQ-Bench

59.2 — Esmeralda
61.1 — Llama 3.1 Instruct
63.1 — Hermes-3

EQ-Bench Parseable (Syntax Stability)

🔥 100.0% — Esmeralda
92.4% — Llama 3.1 Instruct
91.2% — Hermes-3

Here Be Dragons 🐉

I also experimented with a new TruthfulQA free-generation evaluation setup.

- Responses were judged by Gemma 4 26B A4B
- The judge compared generations directly against ground-truth answers
- Models were evaluated in 8-bit quantized form to speed up inference

TruthfulQA (LLM Judge)

0.682 — Esmeralda-Llama-3.1-8B-control
0.587 — Hermes-3-Llama-3.1-8B (reported MC2 score; methodology differs)

For a lightweight control run trained on only a fraction of the dataset, I’m pretty encouraged by the results.

The model is released under the standard Llama 3.1 license, and I’d genuinely love feedback from people testing it in real workflows.

Model: Locutusque/Esmeralda-Llama-3.1-8B-control

Dataset: Locutusque/esmeralda-agentic

Post

3056

🚀 AutoXLA - Accelerating Large Models on TPU
AutoXLA is an experimental library that automates the distribution, optimization, and quantization of large language models for TPUs using PyTorch/XLA. It extends the Hugging Face Transformers interface with TPU-aware features such as automatic sharding, custom attention kernels, and quantization-aware loading, making large-scale deployment and training both simpler and faster.
With quantization and Splash Attention kernels, AutoXLA achieves up to 4× speedups over standard Flash Attention implementations, significantly improving throughput for both inference and training workloads.
Whether you’re experimenting with distributed setups (FSDP, 2D, or 3D sharding) or optimizing memory via LanguageModelQuantizer, AutoXLA is built to make scaling LLMs on TPU seamless.
⚠️ Note: This is an experimental repository. Expect rough edges! Please report bugs or unexpected behavior through GitHub issues.
🔗 GitHub Repository: https://github.com/Locutusque/AutoXLA