Integration in transformers lib.

#27
by sudhir2016 - opened

When do you plan to integrate this into the transformers library as a pipeline function?

Microsoft org
•
edited Sep 22, 2023

On behalf of the transformers team, we'd be happy to help with the integration within the library if there is desire from @gugarosa or @suriyagunasekar 🤗
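In the meantime, the model can already be driven through the standard `pipeline` API by loading the custom modeling code shipped with the repository. A minimal sketch (the checkpoint id and prompt below are illustrative assumptions, not part of this thread):

```python
from transformers import pipeline

# Until the architecture ships natively in transformers, trust_remote_code=True
# loads the modeling code that lives in the model repository on the Hub.
generator = pipeline(
    "text-generation",
    model="microsoft/phi-1_5",  # assumption: any phi checkpoint on the Hub works the same way
    trust_remote_code=True,
)

result = generator("def fibonacci(n):", max_new_tokens=64)
print(result[0]["generated_text"])
```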

Hi, I am currently working on this integration; see the PR. :)

Thank you!!

Will it support fine-tuning these models, such as phi-1 and phi-1.5?

Currently, during my fine-tuning, I encountered this warning:

```
`attention_mask` is not supported during training. Using it might lead to unexpected results.
{'loss': 1.3228, 'learning_rate': 1.999875577156579e-05, 'epoch': 0.02}
  1%|▍         | 300/59745 [06:19<20:47:29,  1.26s/it]`attention_mask` is not supported during training. Using it might lead to unexpected results.
  1%|▍         | 301/59745 [06:20<20:48:14,  1.26s/it]`attention_mask` is not supported during training. Using it might lead to unexpected results.
  ...
  1%|▍         | 309/59745 [06:30<20:49:49,  1.26s/it]`attention_mask` is not supported during training. Using it might lead to unexpected results.
{'loss': 1.5263, 'learning_rate': 1.9998671442394832e-05, 'epoch': 0.02}
```

Hi @SinclairWang, yes, it will support `attention_mask`, so you won't get this warning.

Microsoft org

Hello @SinclairWang! Until phi is fully integrated into transformers, we have added support for training/fine-tuning with `attention_mask` in the files located in this repository.

You should not get the warning anymore if using the latest revision.
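For anyone hitting the same warning, a quick way to check the fix is to reload the model so the updated files are pulled from the Hub, then run a padded batch through a training forward pass. This is only a minimal sketch; the checkpoint name, prompts, and label masking are assumptions, not part of this thread:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"  # assumption: same family of checkpoints discussed above
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# The tokenizer has no pad token by default; reuse EOS so padded batches work.
tokenizer.pad_token = tokenizer.eos_token

# A padded batch produces an attention_mask, which the updated modeling code
# should now accept during training without emitting the warning.
batch = tokenizer(
    ["def add(a, b):", "print('hello world')"],
    return_tensors="pt",
    padding=True,
)
# Ignore the padded positions in the loss.
batch["labels"] = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)

model.train()
outputs = model(**batch)
print(outputs.loss)
```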

gugarosa changed discussion status to closed
