Handle Batch Sizes (v0.2)

#32

Original Post (#29 v0.1)
I hope I'm not overstepping boundaries. I tried the fix in #16 to handle batch sizes larger than 1 but ran into a few issues. For example, a single string passed as the text input was not processed correctly, and padding was not handled when the input texts had different lengths.

I tried to address all the issues I had. Hope this can be of use.
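
To illustrate the single-string issue mentioned above, here is a minimal sketch of input normalization. It is not the code from this PR; the helper name `normalize_texts` is hypothetical and only shows the general idea of always handing a list to the rest of the pipeline.

```python
from typing import List, Union


def normalize_texts(text: Union[str, List[str]]) -> List[str]:
    """Return the text input as a list of strings (hypothetical helper).

    A bare string is wrapped in a one-element list so that downstream
    code can always treat the input as a batch.
    """
    if isinstance(text, str):
        return [text]
    return list(text)


# A single prompt and a batch of prompts now take the same code path.
single = normalize_texts("describe the image")
batch = normalize_texts(["describe the image", "count the objects"])
print(len(single), len(batch))
```

Without a check like this, iterating over a bare string would walk over its characters rather than over prompts, which is one way a batch size of 1 can go wrong.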

@gugarosa feel free to have a look and let me know if I'm overlooking something.

Best,
William

Update (v0.2)
Added self.tokenizer.padding_side = 'left' on line 55 to handle only-text inputs.
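
As a rough illustration of what left padding does to a batch of different-length inputs, here is a pure-Python sketch. It is a stand-in for the tokenizer's behavior, not its actual implementation; the function name `left_pad` and the pad id of 0 are assumptions made for the example.

```python
from typing import List, Tuple


def left_pad(
    sequences: List[List[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id sequences on the left to a common length (hypothetical helper).

    Returns the padded ids and an attention mask in which 0 marks padding
    and 1 marks real tokens.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = left_pad([[5, 6, 7], [8]], pad_id=0)
print(ids)   # [[5, 6, 7], [0, 0, 8]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```

With padding on the left, the real tokens of every row end at the same position, so all rows in the batch line up at their final token.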

I'll try to tag the users who seemed interested in the batch size issue:
@haipingwu
@sebbyjp
@reichenbachian (saw your comment on #16, hope this helps)

Hey, is this tested somewhere?
