Pwicke committed eca6ced (parent: 96dc616)

Create README.md

The *OpenAI* API allows you to retrieve per-token log-probabilities (for both prompt and completion tokens) through the ``logprobs`` return argument. Currently, Hugging Face ``CausalLM`` models only provide a ``logits`` return value, which holds the prediction scores of the language modeling head (scores for each vocabulary token before softmax).
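
The two are related by a log-softmax over the vocabulary dimension, so the log-probabilities can be recovered from the returned logits. A minimal, self-contained sketch of that relationship (the toy tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# Toy logits with the shape a CausalLM returns: (batch, seq_len, vocab_size).
logits = torch.randn(1, 4, 10)

# Normalizing over the vocabulary dimension turns raw scores into log-probabilities.
log_probs = F.log_softmax(logits, dim=-1)

# Each position now holds a proper distribution: the probabilities sum to one.
assert torch.allclose(log_probs.exp().sum(dim=-1), torch.ones(1, 4))
```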

The following code shows how to retrieve the per-token log-probabilities of a ``CausalLM`` through the Hugging Face API:

```python
import torch
import torch.nn.functional as F

def logprobs_from_prompt(prompt, tokenizer, model):
    encoded = tokenizer(prompt, return_tensors="pt").to("cpu")
    input_ids = encoded["input_ids"]
    with torch.no_grad():
        output = model(input_ids=input_ids)
    # Shift so that the logits at position i are scored against the token at i + 1.
    shift_labels = input_ids[..., 1:].contiguous()
    shift_logits = output.logits[..., :-1, :].contiguous()
    # The first token has no preceding context, so it gets no log-probability.
    log_probs = [(tokenizer.decode(input_ids[0, 0].item()), None)]
    for label_id, logit in zip(shift_labels[0].tolist(), shift_logits[0]):
        # Normalize over the vocabulary and pick out the actual next token's entry.
        logprob = F.log_softmax(logit, dim=0)[label_id].item()
        log_probs.append((tokenizer.decode(label_id), logprob))
    return log_probs
```
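
Looping token by token keeps the indexing explicit, but computing a full log-softmax only to read out a single entry is wasteful for large vocabularies. A vectorized sketch of the same computation (reusing ``shift_logits`` and ``shift_labels`` from the loop above):

```python
# All next-token log-probabilities in one call, shape (batch, seq_len - 1).
token_logprobs = F.log_softmax(shift_logits, dim=-1).gather(
    -1, shift_labels.unsqueeze(-1)
).squeeze(-1)
```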

An example call would be:
```python
from transformers import GPT2Tokenizer, OPTForCausalLM

# Any OPT checkpoint works here; the 125M variant keeps the example lightweight.
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-125m")
model = OPTForCausalLM.from_pretrained("facebook/opt-125m")
prompt = "The horse raced past the barn fell."
logprobs = logprobs_from_prompt(prompt, tokenizer, model)
```
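
The returned ``logprobs`` is a list of ``(token, log_probability)`` pairs covering every prompt token, with ``None`` for the first token since nothing precedes it.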

For the derivation and a fuller explanation of this approach, see this [discussion](https://huggingface.co/bigscience/bloom/discussions/89#6321dcc9b97c618f9a5e3dac).