zhihan1996/DNABERT-2-117M · Inference fails with output_all_encoded

I am trying to extract hidden layer output from all the layers in the model. As per the documentation, the output_all_encoded_layers: boolean which controls the content of the encoded_layers output as described below. Default: True.. However Line 586 (https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/bert_layers.py#L586) has this set to False, which I was expecting to be the case in contrast to what documentation says because only last layer was returned in the output. However, when I set it to True the inference fails. The traceback is as follows:

RuntimeError Traceback (most recent call last)
Cell In[60], line 1
----> 1 output = model(**b, output_all_encoded_layers=True)

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None

File /lustre/scratch124/casm/team113/users/pg20/data/supporting/huggingface_models/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py:616, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
614 if masked_tokens_mask is None:
615 sequence_output = encoder_outputs[-1]
--> 616 pooled_output = self.pooler(
617 sequence_output) if self.pooler is not None else None
618 else:
619 # TD [2022-03-01]: the indexing here is very tricky.
620 attention_mask_bool = attention_mask.bool()

File /lustre/scratch124/casm/team113/users/pg20/data/supporting/huggingface_models/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py:501, in BertPooler.forward(self, hidden_states, pool)
495 def forward(self,
496 hidden_states: torch.Tensor,
497 pool: Optional[bool] = True) -> torch.Tensor:
498 # We "pool" the model by simply taking the hidden state corresponding
499 # to the first token.
500 first_token_tensor = hidden_states[:, 0] if pool else hidden_states
--> 501 pooled_output = self.dense(first_token_tensor)
502 pooled_output = self.activation(pooled_output)
503 return pooled_output

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x5 and 768x768)

Steps to reproduce the error:
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
b = tokenizer('ATCG', return_tensors='pt', return_attention_mask=True)
output = model(**b, output_all_encoded_layers=True)

P.S. I am not using triton since it was failing in another step.

zhihan1996
/

DNABERT-2-117M

Inference fails with output_all_encoded_layers=True.