`long_factor` is never used? #22
opened by J22

`long_factor` is never used, since `inv_freq` is likely to always be initialized by `short_factor`. Is there anything wrong?
```python
@torch.no_grad()
def forward(self, x, position_ids, seq_len=None):
    position_ids_expanded = position_ids[:, None, :].float()
    if position_ids_expanded.shape[-1] > self.original_max_position_embeddings:
        ext_factors = torch.tensor(self.long_factor, dtype=torch.float32, device=x.device)
    else:
        ext_factors = torch.tensor(self.short_factor, dtype=torch.float32, device=x.device)
    if self.inv_freq is None:
        self.inv_freq = 1.0 / (
            ext_factors
            * self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64, device=x.device).float() / self.dim)
        )
    inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
```
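To make the issue concrete, here is a minimal, self-contained sketch (a toy class with made-up factor values, not the actual modeling code) that mimics only the factor selection and caching above: once `inv_freq` has been built from `short_factor` on a short first call, a later long call selects `long_factor` but never uses it.

```python
import torch

class ToyScaledRotaryEmbedding:
    """Toy stand-in that reproduces only the factor-selection and caching logic quoted above."""

    def __init__(self, dim=8, base=10000.0, original_max_position_embeddings=4096):
        self.dim = dim
        self.base = base
        self.original_max_position_embeddings = original_max_position_embeddings
        # Illustrative rescale factors; real checkpoints ship their own values.
        self.short_factor = [1.0] * (dim // 2)
        self.long_factor = [4.0] * (dim // 2)
        self.inv_freq = None

    @torch.no_grad()
    def forward(self, x, position_ids):
        position_ids_expanded = position_ids[:, None, :].float()
        if position_ids_expanded.shape[-1] > self.original_max_position_embeddings:
            ext_factors = torch.tensor(self.long_factor, dtype=torch.float32, device=x.device)
        else:
            ext_factors = torch.tensor(self.short_factor, dtype=torch.float32, device=x.device)
        if self.inv_freq is None:  # only ever true once, so the first choice sticks
            self.inv_freq = 1.0 / (
                ext_factors
                * self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64, device=x.device).float() / self.dim)
            )
        return self.inv_freq

rope = ToyScaledRotaryEmbedding()
x = torch.zeros(1)

# First call with a short prompt: inv_freq is built from short_factor and cached.
after_short = rope.forward(x, torch.arange(10)[None, :]).clone()

# Later call with a long sequence: long_factor is selected, but inv_freq is already
# set, so the cached short_factor frequencies are returned unchanged.
after_long = rope.forward(x, torch.arange(5000)[None, :])

print(torch.equal(after_short, after_long))  # True -> long_factor never takes effect
```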
Sorry about that, we are addressing the required changes for fully integrating into transformers and we missed this. It is fixed now, and it will later be improved so that the result is cached.
gugarosa changed discussion status to closed
This is still confusing. Suppose that when this is called for the first time, 5000 tokens are passed in; then `long_factor` is also used for the first 4096 tokens. Is this intentional?
```python
@torch.no_grad()
def forward(self, x, position_ids, seq_len=None):
    seq_len = torch.max(position_ids) + 1
    if seq_len > self.original_max_position_embeddings:
        ext_factors = torch.tensor(self.long_factor, dtype=torch.float32, device=x.device)
    else:
        ext_factors = torch.tensor(self.short_factor, dtype=torch.float32, device=x.device)
```
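A minimal sketch of what I mean, with made-up factor values rather than the shipped checkpoint values: since `seq_len` is computed once for the whole call, a single 5000-token forward picks `long_factor` once and applies it to every position, so positions 0..4095 get the `long_factor` scaling as well.

```python
import torch

dim = 8
base = 10000.0
original_max_position_embeddings = 4096
short_factor = torch.ones(dim // 2)          # illustrative values only
long_factor = torch.full((dim // 2,), 4.0)   # illustrative values only

position_ids = torch.arange(5000)[None, :]
seq_len = torch.max(position_ids) + 1

# The branch is taken once per call, so one set of factors covers every position.
if seq_len > original_max_position_embeddings:
    ext_factors = long_factor
else:
    ext_factors = short_factor

inv_freq = 1.0 / (
    ext_factors
    * base ** (torch.arange(0, dim, 2, dtype=torch.int64).float() / dim)
)

# Shape (5000, dim // 2): rows 0..4095 use the long_factor-scaled frequencies too.
freqs = position_ids[0, :, None].float() * inv_freq[None, :]
print(freqs.shape)
```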