Commit History

fix(modeling_phi): Fixes cached generation when above maximum context length.
ecfe56e

gugarosa committed

Fixes exceeding maximum sequence length when using generate().
759d148

gugarosa committed

Uses native torch decorator for disabling autocast.
5819d04

gugarosa committed

Adds disable_autocast support for different device types.
67ecc75

gugarosa committed
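The two entries above concern disabling autocast around numerically sensitive code. A minimal sketch of the pattern using torch's native decorator; the function name and device type here are chosen only for illustration and are not taken from this repo:

    import torch

    # torch.autocast works as a context manager or as a decorator; enabled=False
    # switches mixed precision off for the wrapped function on the given device type.
    @torch.autocast(device_type="cpu", enabled=False)
    def full_precision_op(x: torch.Tensor) -> torch.Tensor:
        # With autocast disabled, this runs in the tensor's own dtype
        # rather than the autocast dtype.
        return x * 2.0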

Fixes any potential overflow when calculating attention weights.
b5c5161

gugarosa committed
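A common way to avoid overflow when forming attention weights is to upcast the logits to float32 before the softmax and cast back afterwards. A minimal sketch of that general technique, not necessarily the exact change made in this commit:

    import torch

    def attention_weights(q: torch.Tensor, k: torch.Tensor, scale: float) -> torch.Tensor:
        # Upcast to float32 so large dot products cannot overflow in fp16/bf16
        scores = torch.matmul(q, k.transpose(-2, -1)).to(torch.float32) * scale
        return torch.softmax(scores, dim=-1).to(q.dtype)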

Delete modeling_mixformer_sequential.py
470e18a

gugarosa committed

Delete configuration_mixformer_sequential.py
bd98e4e

gugarosa committed

Upload pytorch_model.bin
34b22f4

gugarosa committed

Update to new model interface.
bbace88

gugarosa committed

Improves type hinting on configuration arguments.
8d2c4ce

gugarosa committed

Fixes flash-attn import with a try/except statement.
9ed5987

gugarosa committed
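Guarding the flash-attn import with try/except lets the model load and fall back to the standard attention path when the package is not installed. A minimal sketch; the exact symbols imported by this repo may differ:

    # Optional dependency: fall back gracefully if flash-attn is missing
    try:
        from flash_attn import flash_attn_func  # assumed entry point
        FLASH_ATTN_AVAILABLE = True
    except ImportError:
        flash_attn_func = None
        FLASH_ATTN_AVAILABLE = False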

Adds support for flash-attn rotary embedding and fused dense layers.
90c38d9

gugarosa committed

Adds support for MQA/GQA and attention mask during training/fine-tuning.
371fd51

gugarosa committed
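With MQA/GQA, a small number of key/value heads is shared across many query heads, so the KV tensors are typically expanded to the query-head count before the attention product. A minimal sketch of that expansion; the name and tensor layout are assumptions rather than this repo's code:

    import torch

    def repeat_kv(kv: torch.Tensor, n_rep: int) -> torch.Tensor:
        # (batch, seqlen, n_kv_heads, head_dim) -> (batch, seqlen, n_kv_heads * n_rep, head_dim)
        if n_rep == 1:
            return kv
        b, s, h, d = kv.shape
        return kv[:, :, :, None, :].expand(b, s, h, n_rep, d).reshape(b, s, h * n_rep, d)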

Upload modeling_mixformer_sequential.py
633bca1

gugarosa committed

Upload README.md
769684a

gugarosa committed

fix(phi-1): Checks length of `attention_mask` if it is passed as a direct tensor.
1f890f7

gugarosa committed

Support for `attention_mask` in forward pass.
d22f35e

gugarosa committed

Upload MixFormerSequentialForCausalLM
44cca9f

suriyagunasekar committed

Upload MixFormerSequentialForCausalLM
e96b200

suriyagunasekar committed

Upload MixFormerSequentialForCausalLM
0f4ae0e

suriyagunasekar committed