Does this implement flash memory?

#3
by donko - opened

As the title says, I was wondering whether this implements the same gradient checkpointing and flash memory system that Vicuna uses to get 4x the context length?

You mean FlashAttention?
