Pwicke
/

logprobs_for_CausalLMs


---
license: cc-by-2.0
tags:
- logprobs
- logits
- CausalLM
---


The OpenAI API can return log-probabilities per token (for both prompt and completion tokens) through the ``logprobs`` return argument. Currently, the ``CausalLM`` models only provide ``logits`` return values, which are the prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
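To make the relation between logits and log-probabilities concrete, here is a minimal sketch in plain Python. The function name ``log_softmax`` and the toy scores are illustrative assumptions, not part of any library API; the point is only that a log-probability is a logit minus the log of the normalizing sum.

```python
import math


def log_softmax(logits):
    """Turn raw scores (logits) into log-probabilities."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


# Hypothetical scores over a three-token vocabulary (not real model output).
toy_logits = [2.0, 1.0, 0.1]
toy_logprobs = log_softmax(toy_logits)

# Exponentiating the log-probabilities recovers a distribution that sums to 1.
total_prob = sum(math.exp(lp) for lp in toy_logprobs)
```

Every log-probability is negative, and the ordering of the logits is preserved.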

The following code shows how to retrieve per-token log-probabilities from a ``CausalLM`` with the Hugging Face ``transformers`` API:

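```python
import torch
import torch.nn.functional as F


def logprobs_from_prompt(prompt, tokenizer, model):
    """Return a list of (token, log-probability) pairs for every token in `prompt`."""
    encoded = tokenizer(prompt, return_tensors="pt").to("cpu")
    input_ids = encoded["input_ids"]
    with torch.no_grad():  # inference only, no gradients needed
        output = model(input_ids=input_ids)
    # The logits at position i predict the token at position i + 1,
    # so drop the first label and the last logit to align them.
    shift_labels = input_ids[..., 1:].contiguous()
    shift_logits = output.logits[..., :-1, :].contiguous()
    # The first token has no preceding context, hence no log-probability.
    log_probs = [(tokenizer.decode(input_ids[0].tolist()[0]), None)]
    for label_id, logit in zip(shift_labels[0].tolist(), shift_logits[0]):
        logprob = F.log_softmax(logit, dim=0)[label_id].item()
        log_probs.append((tokenizer.decode(label_id), logprob))
    return log_probs
```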
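The per-token loop above can also be written without a Python loop. The sketch below is one possible vectorized variant, not the author's method: the helper name ``gather_token_logprobs`` is hypothetical, and random tensors stand in for real model logits so the snippet runs without downloading a checkpoint.

```python
import torch
import torch.nn.functional as F


def gather_token_logprobs(logits, input_ids):
    """logits: (batch, seq, vocab); input_ids: (batch, seq).

    Returns a (batch, seq - 1) tensor with the log-probability of each
    token given the tokens before it.
    """
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    all_logprobs = F.log_softmax(shift_logits, dim=-1)
    # Pick, at every position, the log-probability of the token that actually follows.
    return all_logprobs.gather(-1, shift_labels.unsqueeze(-1)).squeeze(-1)


# Random stand-ins for model output (assumed shapes: 1 sequence, 5 tokens, vocabulary of 11).
torch.manual_seed(0)
demo_logits = torch.randn(1, 5, 11)
demo_ids = torch.randint(0, 11, (1, 5))
demo_logprobs = gather_token_logprobs(demo_logits, demo_ids)
```

With a real model, ``output.logits`` and ``input_ids`` would take the place of the random tensors.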

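An example call would be:
```python
from transformers import GPT2Tokenizer, OPTForCausalLM

# "facebook/opt" alone is not a full checkpoint id on the Hub;
# "facebook/opt-125m" is assumed here, any OPT checkpoint size should work.
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-125m")
model = OPTForCausalLM.from_pretrained("facebook/opt-125m")
prompt = "The horse raced past the barn fell."
logprobs = logprobs_from_prompt(prompt, tokenizer, model)
```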
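The returned list of ``(token, log-probability)`` pairs can be reduced to a score for the whole prompt. The following is a small sketch under stated assumptions: ``sequence_stats`` is a hypothetical helper, and the numbers in ``example`` are placeholder values chosen for easy arithmetic, not actual model output.

```python
import math


def sequence_stats(log_probs):
    """Return (total log-probability, perplexity) for a list of (token, logprob) pairs."""
    # The first token carries None because it has no preceding context; skip it.
    scored = [lp for _, lp in log_probs if lp is not None]
    total = sum(scored)
    perplexity = math.exp(-total / len(scored))
    return total, perplexity


# Placeholder values in the same shape as the function's return value.
example = [("The", None), (" horse", -2.0), (" raced", -4.0)]
total, ppl = sequence_stats(example)
```

A lower perplexity means the model found the prompt less surprising on average.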

For the derivation and an explanation, see this [discussion](https://huggingface.co/bigscience/bloom/discussions/89#6321dcc9b97c618f9a5e3dac).