Assertion Error / Implementation Error

#16
by 8497prashant

Following is the code I used:
## Attempt 1 (GPU)

import torch
from transformers import AutoTokenizer, AutoModel
dna_sequence = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M")
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
encoded_sequence = tokenizer(dna_sequence, return_tensors="pt")  # tokenize the DNA sequence
if torch.cuda.is_available():
  print('Moving to GPU')
  encoded_sequence = encoded_sequence.to('cuda')  #### USING CUDA
with torch.no_grad():
  hidden_states = model(**encoded_sequence)[0]

With the above code I get the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
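Looking at the message, the tokenized inputs end up on cuda:0 while the model weights stay on the CPU. A minimal sketch of a fix (assuming a CUDA device is available) would be to move the model to the GPU as well, not just the inputs:

if torch.cuda.is_available():
  print('Moving to GPU')
  model = model.to('cuda')                        # move the weights too
  encoded_sequence = encoded_sequence.to('cuda')  # inputs and model now share one device

with torch.no_grad():
  hidden_states = model(**encoded_sequence)[0]

At the time, though, I took the other route and moved everything to the CPU instead.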


Then I moved the inputs to the CPU instead.
## Attempt 2 (CPU)

import torch
from transformers import AutoTokenizer, AutoModel
dna_sequence = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M")
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
encoded_sequence = tokenizer(dna_sequence, return_tensors="pt")  # tokenize the DNA sequence
if torch.cuda.is_available():
  print('Keeping tensors on CPU')
  encoded_sequence = encoded_sequence.to('cpu')  #### USING CPU
with torch.no_grad():
  hidden_states = model(**encoded_sequence)[0]

With the above code I get the following error:


AssertionError                            Traceback (most recent call last)
in <cell line: 9>()
      7 encoded_sequence = encoded_sequence.to('cpu')
      8
----> 9 hidden_states = model(**encoded_sequence)[0]

17 frames
~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/dd10f74f0e90735d02a27603e56467761893e8f9/flash_attn_triton.py in _flash_attn_forward(q, k, v, bias, causal, softmax_scale)
    779     assert q.dtype in [torch.float16,
    780                        torch.bfloat16], 'Only support fp16 and bf16'
--> 781     assert q.is_cuda and k.is_cuda and v.is_cuda
    782     softmax_scale = softmax_scale or 1.0 / math.sqrt(d)
    783

AssertionError:

I solved it by downloading the model and loading it from a local directory ("zhihan1996/DNABERT-2-117M" replaced with the local path). The bundled flash_attn_triton.py kernel only accepts CUDA tensors, which is why CPU inference trips this assertion. I then manually edited DNABERT-2-117M/flash_attn_triton.py and added the following lines just before the assertion that fails:
q = q.to("cuda")
k = k.to("cuda")
v = v.to("cuda")
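An alternative to patching the downloaded flash_attn_triton.py is to keep everything on the GPU from the start. This is only a sketch, assuming a CUDA-capable device is available (the bundled flash-attention kernel asserts CUDA tensors, so pure CPU inference will not pass it):

import torch
from transformers import AutoTokenizer, AutoModel

dna_sequence = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M")
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

# Put both the weights and the tokenized inputs on the same CUDA device.
model = model.to('cuda')
encoded_sequence = tokenizer(dna_sequence, return_tensors='pt').to('cuda')

with torch.no_grad():
  hidden_states = model(**encoded_sequence)[0]  # last hidden states: [batch, seq_len, hidden]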
