Error "index out of range in self" in model vietnamese-embedding when training with sentence-transformers

#2
by minhquy1624 - opened

```
Cell In[23], line 1
----> 1 model.fit(
      2     train_objectives=[(train_dataloader, train_loss)],
      3     evaluator=dev_evaluator,
      4     epochs=30,
      5     warmup_steps=1000,
      6     output_path=model_save_path,
      7     save_best_model=True,
      8     show_progress_bar=True
      9 )

File /workspace/nlplab/nmq/env_nmq/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py:1035, in SentenceTransformer.fit(self, train_objectives, evaluator, epochs, steps_per_epoch, scheduler, warmup_steps, optimizer_class, optimizer_params, weight_decay, evaluation_steps, output_path, save_best_model, max_grad_norm, use_amp, callback, show_progress_bar, checkpoint_path, checkpoint_save_steps, checkpoint_save_total_limit)
   1033     skip_scheduler = scaler.get_scale() != scale_before_step
   1034 else:
-> 1035     loss_value = loss_model(features, labels)
   1036 loss_value.backward()
   1037 torch.nn.utils.clip_grad_norm_(loss_model.parameters(), max_grad_norm)

File /workspace/nlplab/nmq/env_nmq/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
...
   2208 # remove once script supports set_grad_enabled
   2209     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

IndexError: index out of range in self
```

@minhquy1624 Please set `max_seq_length` to at most 512 tokens; you should segment your text first.
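For reference, a minimal sketch of capping the sequence length in sentence-transformers (the checkpoint id `dangvantuan/vietnamese-embedding` is an assumption here; substitute the model you are actually fine-tuning):

```python
from sentence_transformers import SentenceTransformer

# Assumed checkpoint id; replace with your own model.
model = SentenceTransformer("dangvantuan/vietnamese-embedding")

# Cap inputs so the tokenizer truncates anything longer than 256 tokens.
model.max_seq_length = 256
```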

I have segmented my text and configured the input to 256 tokens, but the error above still occurs. I am fine-tuning with sentence-transformers.
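If the error persists at 256 tokens, note that `IndexError: index out of range in self` raised from `torch.embedding` usually means some input id exceeds the size of an embedding table: either a token id beyond the word-embedding matrix (e.g. added tokens without resizing) or a position beyond `max_position_embeddings`. A diagnostic sketch, assuming the standard sentence-transformers module layout (`model` is the loaded `SentenceTransformer`):

```python
# Compare tokenizer size and embedding-table sizes on the underlying HF model.
auto_model = model._first_module().auto_model
tokenizer = model.tokenizer

word_emb = auto_model.get_input_embeddings()
print("tokenizer size:", len(tokenizer))
print("word embedding rows:", word_emb.num_embeddings)

# Every token id must stay below the number of embedding rows.
if len(tokenizer) > word_emb.num_embeddings:
    # Grow the word-embedding matrix to cover added tokens.
    auto_model.resize_token_embeddings(len(tokenizer))

# Position limit: RoBERTa-style models reserve two position slots,
# so the usable sequence length is max_position_embeddings - 2.
print("max positions:", auto_model.config.max_position_embeddings)
```

If the base model is PhoBERT-style, the usable length may be shorter than 256 even with `max_seq_length = 256`; printing `max_position_embeddings` shows the real limit.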
