Sigmoid Head for TowerInstruct-7B-v0.2
This repo hosts a sigmoid quality-estimation (QE) head trained on top of
Unbabel/TowerInstruct-7B-v0.2.
It is the model from the paper "Sigmoid Head for Quality Estimation under Language Ambiguity". Unlike the usual softmax LM head, this head applies a sigmoid to each token's logit independently, so several equally valid tokens can receive high scores at the same time. This yields a more reliable per-token quality (confidence) score in settings with language ambiguity, such as machine translation.
- Base model: Unbabel/TowerInstruct-7B-v0.2 (frozen during training)
- Head type: new unembedding head, a torch.nn.Embedding(vocab_size, hidden_size) applied to the last hidden state
- Activation: sigmoid (per-token, not normalized over vocab)
- Shape: [32007, 4096]
- Trained with: ambiguity-aware negative sampling
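Read together, the bullets above suggest the following per-token score. This is a sketch inferred from the description (a single weight tensor, no bias term, a plain sigmoid); the symbols $\mathbf{h}_t$, $\mathbf{w}_v$ and $s_t(v)$ are notation introduced here, and sigmoid_head.py remains the authoritative definition.

```latex
s_t(v) \;=\; \sigma\!\left(\mathbf{w}_v^{\top}\mathbf{h}_t\right)
       \;=\; \frac{1}{1 + \exp\!\left(-\mathbf{w}_v^{\top}\mathbf{h}_t\right)},
\qquad \mathbf{W} \in \mathbb{R}^{32007 \times 4096},\quad \mathbf{h}_t \in \mathbb{R}^{4096}
```

Here $\mathbf{w}_v$ is row $v$ of the head weight and $\mathbf{h}_t$ is the base model's last hidden state at position $t$. Because nothing normalizes over $v$, the scores at one position need not sum to 1.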
Files
- model.safetensors: the trained head weights (single tensor weight).
- config.json: SigmoidHeadConfig (vocab/hidden sizes + auto_map).
- sigmoid_head.py: SigmoidHead(PreTrainedModel) definition; auto-loaded by transformers via trust_remote_code=True.
Usage
The head is loaded with transformers.AutoModel. Pass trust_remote_code=True
so transformers downloads sigmoid_head.py from this repo automatically.
1. Score an existing output (teacher forcing)
Given a (source-prompt, hypothesis) pair, compute a per-token confidence for the hypothesis. Useful for QE on outputs from any MT system.
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer
BASE = "Unbabel/TowerInstruct-7B-v0.2"
HEAD = "tuanh23/SigmoidHead-TowerInstruct-7B-v0.2"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16).to(device).eval()
head = AutoModel.from_pretrained(HEAD, trust_remote_code=True).to(device).eval()
# Same chat-template format the head was trained on (see prepare_data.py).
src_lang, tgt_lang = "English", "German"
src = "The cat sat on the mat."
hypothesis = "Die Katze saß auf der Matte."
user_msg = {"role": "user", "content": f"Translate the following text from {src_lang} into {tgt_lang}.\n{src_lang}: {src}.\n{tgt_lang}: "}
asst_msg = {"role": "assistant", "content": " " + hypothesis}
# Full conversation -> input_ids for the model
input_ids = tokenizer.apply_chat_template(
    [user_msg, asst_msg], tokenize=True, add_generation_prompt=False, return_tensors="pt"
).to(device)
# Same encoding but with the generation prompt added after the user turn -> tells us
# where the assistant content begins inside `input_ids`.
prompt_len = tokenizer.apply_chat_template(
    [user_msg], tokenize=True, add_generation_prompt=True, return_tensors="pt"
).shape[1]
with torch.no_grad():
    out = base_model(input_ids, output_hidden_states=True)
    last_hidden = out.hidden_states[-1].float()  # [1, T, hidden]
    conf_full = head.score(last_hidden)  # [1, T, vocab] in (0, 1)
# Per-token confidence for the actual next token at each position (shifted by 1)
target_ids = input_ids[:, 1:]
conf = conf_full[:, :-1, :].gather(-1, target_ids.unsqueeze(-1)).squeeze(-1) # [1, T-1]
# Confidence over just the assistant span (hypothesis + closing chat tokens):
hyp_conf = conf[0, prompt_len - 1:]
hyp_tokens = tokenizer.convert_ids_to_tokens(input_ids[0, prompt_len:].tolist())
print("Hypothesis:", hypothesis)
for tok, s in zip(hyp_tokens, hyp_conf.tolist()):
    print(f" {tok!r:>20s} conf={s:.4f}")
print(f"Sentence-level (mean): {hyp_conf.mean().item():.4f}")
# Expected output:
# Hypothesis: Die Katze saß auf der Matte.
# '▁Die' conf=0.9999
# '▁Kat' conf=0.9995
# 'ze' conf=0.9992
# '▁sa' conf=0.9993
# 'ß' conf=1.0000
# '▁auf' conf=1.0000
# '▁der' conf=0.9983
# '▁Mat' conf=0.9992
# 'te' conf=0.9999
# '.' conf=0.9897
# '<|im_end|>' conf=1.0000
# '▁' conf=1.0000
# '<0x0A>' conf=1.0000
# Sentence-level (mean): 0.9988
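The mean printed above also averages over the closing chat tokens ('<|im_end|>', '▁', '<0x0A>'), which score close to 1.0 and can mask a weak content token. The helper below is a minimal sketch of one way to aggregate over content tokens only. The function name `sentence_scores`, the `CLOSING_TOKENS` set, and the choice of mean and minimum are all assumptions made for illustration, not the aggregation used in the paper.

```python
from statistics import mean

# Assumption: these are the chat-template closing tokens seen in the expected
# output above; adjust the set for a different tokenizer or template.
CLOSING_TOKENS = frozenset({"<|im_end|>", "▁", "<0x0A>"})


def sentence_scores(tokens, confs, skip=CLOSING_TOKENS):
    """Aggregate per-token confidences over content tokens only.

    Returns the mean and the minimum confidence. Both aggregations are
    illustrative choices, not necessarily the ones used in the paper.
    """
    kept = [c for t, c in zip(tokens, confs) if t not in skip]
    if not kept:
        raise ValueError("no content tokens left after filtering")
    return {"mean": mean(kept), "min": min(kept)}


# Values copied from the expected output above.
tokens = ["▁Die", "▁Kat", "ze", "▁sa", "ß", "▁auf", "▁der", "▁Mat", "te", ".",
          "<|im_end|>", "▁", "<0x0A>"]
confs = [0.9999, 0.9995, 0.9992, 0.9993, 1.0000, 1.0000, 0.9983, 0.9992,
         0.9999, 0.9897, 1.0000, 1.0000, 1.0000]

scores = sentence_scores(tokens, confs)
print(f"mean={scores['mean']:.4f} min={scores['min']:.4f}")
# -> mean=0.9985 min=0.9897
```

The minimum is often worth reporting next to the mean, since a single low-confidence token can indicate a localized error that averaging hides.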
2. Generate and score
The sigmoid head only needs the last-layer hidden states, which the base model's generate() method
returns when called with output_hidden_states=True and return_dict_in_generate=True. So you can
generate with the base LM and score with the sigmoid head from the same generation pass, without re-decoding.
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer
BASE = "Unbabel/TowerInstruct-7B-v0.2"
HEAD = "tuanh23/SigmoidHead-TowerInstruct-7B-v0.2"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16).to(device).eval()
head = AutoModel.from_pretrained(HEAD, trust_remote_code=True).to(device).eval()
src_lang, tgt_lang = "English", "German"
src = "The cat sat on the mat."
messages = [{"role": "user", "content": f"Translate the following text from {src_lang} into {tgt_lang}.\n{src_lang}: {src}.\n{tgt_lang}: "}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(device)
with torch.no_grad():
    gen = base_model.generate(
        input_ids=input_ids,
        max_new_tokens=64,
        do_sample=False,  # greedy
        output_hidden_states=True,
        return_dict_in_generate=True,
    )
# Stitch together per-step last-layer hidden states into [B, gen_len, hidden].
# Step 0 returns hidden states for the whole prompt — keep only the last position.
last_hidden = [step[-1] for step in gen.hidden_states]
last_hidden[0] = last_hidden[0][:, -1:, :]
last_hidden = torch.cat(last_hidden, dim=1).float() # [B, gen_len, hidden]
gen_ids = gen.sequences[:, input_ids.shape[1]:] # [B, gen_len]
conf_full = head.score(last_hidden) # [B, gen_len, vocab] in (0, 1)
conf = conf_full.gather(-1, gen_ids.unsqueeze(-1)).squeeze(-1) # [B, gen_len]
translation = tokenizer.decode(gen_ids[0], skip_special_tokens=True)
print("Translation:", translation)
for tok, s in zip(tokenizer.convert_ids_to_tokens(gen_ids[0].tolist()), conf[0].tolist()):
    print(f" {tok!r:>20s} conf={s:.4f}")
print(f"Sentence-level (mean): {conf[0].mean().item():.4f}")
# Expected output:
# Translation: Die Katze saß auf der Matte.
# '▁Die' conf=0.9999
# '▁Kat' conf=0.9994
# 'ze' conf=0.9991
# '▁sa' conf=0.9993
# 'ß' conf=1.0000
# '▁auf' conf=1.0000
# '▁der' conf=0.9983
# '▁Mat' conf=0.9992
# 'te' conf=0.9999
# '.' conf=0.9900
# '<|im_end|>' conf=1.0000
# Sentence-level (mean): 0.9986
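Scores come out per subword token, while QE annotations are usually word-level. A possible way to bridge the two is to merge SentencePiece pieces into words and keep the lowest piece score. The sketch below assumes that a leading '▁' marks a word start (as in the output above); the function name `word_level_confidence`, the `skip` set, and the min aggregation are hypothetical choices, not part of this repo or the paper.

```python
# Assumption: tokens to ignore when building words (chat closing tokens).
SKIP_TOKENS = frozenset({"<|im_end|>", "▁", "<0x0A>"})


def word_level_confidence(tokens, confs, skip=SKIP_TOKENS):
    """Group SentencePiece tokens into words, keeping the minimum confidence.

    A token starting with "▁" opens a new word; any other token extends the
    current word. Punctuation without "▁" therefore attaches to the
    preceding word.
    """
    words = []  # each entry is [text, confidence]
    for tok, c in zip(tokens, confs):
        if tok in skip:
            continue
        if tok.startswith("▁") or not words:
            words.append([tok.lstrip("▁"), c])
        else:
            words[-1][0] += tok
            words[-1][1] = min(words[-1][1], c)
    return [(text, conf) for text, conf in words]


# Values copied from the expected output above.
tokens = ["▁Die", "▁Kat", "ze", "▁sa", "ß", "▁auf", "▁der", "▁Mat", "te", ".",
          "<|im_end|>"]
confs = [0.9999, 0.9994, 0.9991, 0.9993, 1.0000, 1.0000, 0.9983, 0.9992,
         0.9999, 0.9900, 1.0000]

words = word_level_confidence(tokens, confs)
for text, c in words:
    print(f"{text:>10s} conf={c:.4f}")
```

Taking the minimum treats a word as only as reliable as its weakest piece; a mean or product over pieces would be equally easy to swap in.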
Why sigmoid?
A standard softmax head forces the probability mass to sum to 1 across the vocabulary, so when several outputs are equally valid, the mass is split among them and each valid token can look low-confidence. The sigmoid head scores each token independently, so all valid options can score high simultaneously, which makes the score a better proxy for quality.
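A toy calculation makes the mass-splitting effect concrete. The logits below are hypothetical and the four-token vocabulary is invented for illustration; in practice the sigmoid head is trained separately, so its logits differ from the softmax head's and this comparison only shows the shape of the effect.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function applied to a single logit."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical logits for a 4-token vocabulary: the first two tokens are
# equally valid continuations, the last two are clearly wrong.
logits = [4.0, 4.0, -4.0, -4.0]

softmax_scores = softmax(logits)
sigmoid_scores = [sigmoid(x) for x in logits]

for i, (p, s) in enumerate(zip(softmax_scores, sigmoid_scores)):
    print(f"token {i}: softmax={p:.3f} sigmoid={s:.3f}")
# -> token 0: softmax=0.500 sigmoid=0.982
# -> token 1: softmax=0.500 sigmoid=0.982
# -> token 2: softmax=0.000 sigmoid=0.018
# -> token 3: softmax=0.000 sigmoid=0.018
```

Under softmax, neither valid token can exceed 0.5 here no matter how large its logit is, whereas the sigmoid scores of both stay near 1.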
Citation
@article{dinh2026sigmoid,
title = {Sigmoid Head for Quality Estimation under Language Ambiguity},
author = {Dinh, Tu Anh and Niehues, Jan},
journal = {arXiv preprint arXiv:2601.00680},
year = {2026}
}
Accepted to ACL 2026 (Main); proceedings not yet released.
Code
Training and evaluation code: https://github.com/tuanh23/sigmoid-head-qe.