Modify model architecture to export ONNX 11

#7
by Jackmin108 - opened
No description provided.

Can we get this merged? When exporting to ONNX, I am running into:

[libprotobuf ERROR ../third_party/protobuf/src/google/protobuf/message_lite.cc:457] onnx_torch.ModelProto exceeded maximum protobuf size of 2GB: 3768728061
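For reference, a minimal export along these lines is enough to trigger that error once the serialized weights exceed 2 GB. This is only a sketch, not the exact script used here; the model id, input names, and output names are placeholders:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "bert-base-uncased"  # placeholder; not the model from this repository
model = AutoModel.from_pretrained(model_id)
model.eval()
model.config.return_dict = False  # export tuple outputs rather than a ModelOutput dict

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("hello world", return_tensors="pt")

# The serialized ModelProto embeds all weights; once it crosses protobuf's
# 2 GB hard limit, serialization fails with the error quoted above.
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    opset_version=11,
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
)
```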

We have decided not to merge this into the main branch because it:

  1. Disables flash attention by default -- this would cause users running PyTorch to see lower throughput and a higher memory footprint with the default settings.
  2. Dynamically allocates the alibi tensor -- this could be an issue for long-running server deployments, since the alibi tensor is quite large at long sequence lengths (about 4 GB at 8k sequence length; see the rough estimate after this list). A dynamic allocation of that size could cause an OOM due to memory fragmentation.
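A back-of-the-envelope estimate of the dense alibi bias size is below. The head count and fp16 dtype are illustrative assumptions, not values taken from this repository's config:

```python
# Dense alibi bias is one (seq_len x seq_len) matrix per attention head.
num_heads = 32        # assumed for illustration
seq_len = 8192
bytes_per_elem = 2    # fp16

alibi_bytes = num_heads * seq_len * seq_len * bytes_per_elem
print(f"{alibi_bytes / 1024**3:.1f} GiB")  # ~4.0 GiB at 8k sequence length
```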

You can still apply the changes from this PR to your local copy of the model in order to do the export.

However, our recommendation is to just use the models we have already exported:

Jackmin108 changed pull request status to closed

@Jackmin108 I'm currently using the published ONNX models that you linked to. I downloaded them and applied O4 optimization via the optimum SDK. Compared with another O4-optimized ONNX model I built about two months ago, this one OOMs during model creation roughly a quarter of the time. Do you have an idea why?
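For context, the O4 optimization was applied roughly like the sketch below, using optimum's ONNX Runtime integration. The model id, task class, and save path are placeholders, not the exact ones used:

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction, ORTOptimizer, AutoOptimizationConfig

# Load the already-exported ONNX model (placeholder repo id).
model = ORTModelForFeatureExtraction.from_pretrained("org/exported-onnx-model")
optimizer = ORTOptimizer.from_pretrained(model)

# O4 applies the most aggressive graph fusions plus fp16 conversion (GPU only).
optimization_config = AutoOptimizationConfig.O4()
optimizer.optimize(save_dir="onnx-o4", optimization_config=optimization_config)
```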
