Token recast to torch.LongTensor

#1
by dyoung - opened

Hello,

I was looking through the model card, and in the quick-use code example I noticed something in the generate_text function that caught my curiosity.
After the input prompt headed for the model on the GPU is tokenized ("tokens = tokenizer.encode(instruction)"), the tokens are recast as a LongTensor (a 64-bit signed integer tensor) at "tokens = torch.LongTensor(tokens).unsqueeze(0)".
I haven't seen many others doing this so far in my AI journey, and I was curious about the reasoning. I can speculate on several reasons, but I figured it wouldn't hurt to ask directly. I'll also be looking online. If you could point me to any material that explains why it's smart to recast a tensor before sending it off to the GPU, that would be appreciated. If you can't or don't want to, that's understandable.

Thank you for your time.

I think I may have answered my own question. I'm seeing that "tokens = tokenizer.encode(instruction)" from the example code returns a plain built-in Python list, which obviously can't be sent to the GPU as is (there's no ".to("cuda")" method on the list class). So the recast to a PyTorch LongTensor is done so the prompt's tokens can be copied over to the GPU, and that tensor is also the data type the model expects for inference.
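For anyone else who lands here, this is a minimal sketch of the two-step pattern as I understand it (the checkpoint name is just a placeholder, not the actual model from the card):

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint

instruction = "Write a haiku about tensors."
tokens = tokenizer.encode(instruction)   # plain Python list of ints, e.g. [1234, 567, ...]
# tokens.to("cuda")                      # AttributeError: 'list' object has no attribute 'to'

tokens = torch.LongTensor(tokens).unsqueeze(0)  # int64 tensor with shape (1, seq_len)
tokens = tokens.to("cuda")                      # now it can be copied to the GPU
```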
What I see more commonly is people using 'tokens = tokenizer(sentence, return_tensors="pt").to(device)', where a single statement does multiple steps in one line. So it looks like the example code is just breaking the parts into separate statements rather than packing them into one line. Seeing it done differently is what caught my attention, and I was curious why. I'm happy with the explanation I've found for myself.
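And here's a sketch of the equivalent one-liner version (the call returns a BatchEncoding, which also supports .to(device)):

```python
inputs = tokenizer(instruction, return_tensors="pt").to("cuda")
input_ids = inputs["input_ids"]  # the same int64 tensor, shape (1, seq_len)
```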
This thread could be considered closed, and I'd be OK with that.
