## Model max length #2

by victor-roris

I tried your code with long sentences, and the automatic truncation to the model's max length fails:

```
encoded_input = tokenizer(really_long_sentence, truncation=True, max_length=None, return_tensors='pt')
```

It raises an error about the tensor dimensions:

```
RuntimeError: The expanded size of the tensor (5227) must match the existing size (514) at non-singleton dimension 1. Target sizes: [1, 5227]. Tensor sizes: [1, 514]
```

So I tried setting the max length to 514:

```
encoded_input = tokenizer(really_long_sentence, truncation=True, max_length=514, return_tensors='pt')
```

But it still fails:

```
2041 # remove once script supports set_grad_enabled
2042 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2043 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
2044
2045
IndexError: index out of range in self
```

Can you tell me if there is some way to obtain the appropriate model max length from the model/tokenizer configuration?
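For what it's worth, both limits are exposed by the library. A minimal sketch of where to read them, assuming a RoBERTa-style checkpoint (`roberta-base` here is just an illustrative name; use whichever checkpoint you actually loaded):

```
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

# The longest sequence the tokenizer will truncate/pad to
print(tokenizer.model_max_length)            # 512 for roberta-base

# The size of the position-embedding table; for RoBERTa this is 514
# (512 usable positions plus 2 reserved for the padding offset)
print(model.config.max_position_embeddings)  # 514 for roberta-base
```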

I had a similar issue. I found (through trial and error) that if you set the `max_length` to `511`, it seems to work.

I'd like to understand why that's the case, though.

I had the same issue, but if you choose a `max_length` well below the existing size **514**, it solves the problem. Try `max_length=500` or `max_length=400` and see if it works.
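A minimal sketch of the resulting call. Reading the cap from `tokenizer.model_max_length` instead of hard-coding a number is my assumption, not something suggested above; note that some tokenizers report a very large sentinel value when no limit is configured, so sanity-check the number first:

```
# Let the tokenizer's own reported limit drive the truncation
# (hard-coding max_length=500 as suggested above also works)
encoded_input = tokenizer(
    really_long_sentence,
    truncation=True,
    max_length=tokenizer.model_max_length,  # 512 for roberta-base
    return_tensors='pt',
)
```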