ONNX
What's ONNX?
ONNX (Open Neural Network Exchange) is an open-source format for machine learning models, created in 2017 by Microsoft and Facebook. It allows interoperability between different frameworks (for example, training a model in one framework and serving it with a different runtime) and enables optimized inference on hardware accelerators. It's used when deploying models to production and when running the same model on different hardware. An exported model is a static graph, which makes dynamic computation (control flow that depends on the input data) harder to express.
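To make "static graph" concrete, here is a toy sketch in plain Python and NumPy (not the onnx package or any runtime API). The model is a fixed list of nodes, each with an operator type, input names and output names, loosely mirroring how an ONNX graph lists its nodes. The names GRAPH, OPS and run_graph are hypothetical.

```python
import numpy as np

# Hypothetical toy graph: each node is (op_type, input_names, output_names),
# loosely mirroring the fields of an ONNX node. It is fixed before any input arrives.
GRAPH = [
    ("MatMul", ["x", "W"], ["h"]),
    ("Relu", ["h"], ["y"]),
]

# Operator implementations the toy interpreter knows about.
OPS = {
    "MatMul": lambda a, b: a @ b,
    "Relu": lambda a: np.maximum(a, 0.0),
}

def run_graph(graph, feeds):
    """Run every node in order; intermediate values are stored by name in env."""
    env = dict(feeds)
    for op_type, inputs, outputs in graph:
        result = OPS[op_type](*(env[name] for name in inputs))
        env[outputs[0]] = result
    return env

W = np.array([[1.0, -1.0], [2.0, 0.5]])
x = np.array([[1.0, 1.0]])
env = run_graph(GRAPH, {"x": x, "W": W})
print(env["y"])
```

Because the node list never changes, a runtime can inspect and optimize it ahead of time (for example by fusing MatMul and Relu into one kernel). The flip side is that Python-level if/for logic that depends on input values has no place in this fixed list.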
How does ONNX optimize computation?
Why do some ML frameworks (such as PyTorch in eager mode) build the graph dynamically at each forward pass?
When not to use ONNX?
What to do when a model has dynamic input shapes (e.g. variable batch size or image size)?
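import torch

# Assumes `model` is a torch.nn.Module in eval mode and `dummy_input` is an
# example tensor in NCHW layout, e.g. dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "dynamic_model.onnx",
    input_names=["input"],
    output_names=["output"],
    # Dimensions listed here may change at inference time; all others are fixed
    # to the sizes of dummy_input. The "output" entry assumes a 4D output
    # (e.g. a segmentation map), not a (batch, num_classes) classifier output.
    dynamic_axes={
        "input": {0: "batch_size", 2: "height", 3: "width"},
        "output": {0: "batch_size", 2: "height", 3: "width"},
    },
)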
In PyTorch, image tensors use the NCHW layout by default:
(batch_size, channels, height, width)
0 -> Batch size
1 -> Channels
2 -> Height
3 -> Width
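Image libraries often load pixels as height × width × channels (HWC). A minimal NumPy sketch, assuming a random array stands in for a decoded RGB image, of reordering it to CHW and adding the batch dimension so the shape matches the NCHW layout above:

```python
import numpy as np

# Stand-in for a decoded RGB image in HWC layout (height=224, width=300, channels=3).
image_hwc = np.random.rand(224, 300, 3).astype(np.float32)

# Move channels to the front: (H, W, C) -> (C, H, W).
image_chw = np.transpose(image_hwc, (2, 0, 1))

# Add a batch dimension at index 0: (C, H, W) -> (N=1, C, H, W).
batch_nchw = image_chw[np.newaxis, ...]

print(batch_nchw.shape)  # (1, 3, 224, 300)
```

After this, index 0 is the batch size, 1 the channels, 2 the height and 3 the width, which are the indices used in dynamic_axes.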
dynamic_axes={
    "input": {0: "batch_size", 2: "height", 3: "width"},
    "output": {0: "batch_size", 2: "height", 3: "width"}
}
"input": {0: "batch_size"} → Batch size can change.
"input": {2: "height", 3: "width"} → Height and width can also vary.
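So one exported model accepts inputs such as (1, 3, 224, 224), (4, 3, 512, 512), or (8, 3, 300, 400). The channel dimension (index 1) is not listed, so it stays fixed to the dummy input's value (3 here).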
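One way to picture what dynamic_axes does: the exported input shape starts as the dummy input's shape, and each listed dimension is replaced by a named symbol that matches any size. The helpers below (symbolic_shape, accepts) are hypothetical and only mimic that rule in plain Python; they are not part of PyTorch or ONNX, and a real runtime may apply further checks.

```python
def symbolic_shape(dummy_shape, axes):
    """Replace each dynamic dimension index with its symbolic name."""
    return [axes.get(i, size) for i, size in enumerate(dummy_shape)]

def accepts(sym_shape, shape):
    """A concrete shape fits if the rank matches and every fixed dim is equal."""
    if len(shape) != len(sym_shape):
        return False
    return all(isinstance(s, str) or s == d for s, d in zip(sym_shape, shape))

# Dummy input was (1, 3, 224, 224); dims 0, 2 and 3 were marked dynamic.
input_shape = symbolic_shape((1, 3, 224, 224), {0: "batch_size", 2: "height", 3: "width"})
print(input_shape)  # ['batch_size', 3, 'height', 'width']

for shape in [(1, 3, 224, 224), (4, 3, 512, 512), (8, 3, 300, 400), (1, 1, 224, 224)]:
    print(shape, accepts(input_shape, shape))
```

The last shape is rejected because index 1 (channels) was not listed as dynamic and must equal 3.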
What if the model is an NLP model?
dynamic_axes={
    "input": {0: "batch_size", 1: "seq_length"},
    "output": {0: "batch_size", 1: "seq_length"}
}
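Inputs are usually token IDs of shape (batch_size, sequence_length); after the embedding layer the activations have shape (batch_size, sequence_length, embedding_dim). Dims 0 and 1 change from batch to batch, while embedding_dim is fixed by the model's weights.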
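A minimal NumPy sketch of the two NLP shapes, assuming a toy vocabulary of 10 token IDs and a random embedding table (both made up for illustration): padded token IDs form a (batch_size, seq_length) array, and an embedding lookup turns them into (batch_size, seq_length, embedding_dim).

```python
import numpy as np

vocab_size, embedding_dim = 10, 4
# Hypothetical embedding table: one row of size embedding_dim per token ID.
embedding_table = np.random.rand(vocab_size, embedding_dim)

# Two "sentences" of different lengths, padded with token ID 0 to the same length.
sentences = [[5, 2, 7], [3, 9, 1, 4, 6]]
seq_length = max(len(s) for s in sentences)
token_ids = np.array([s + [0] * (seq_length - len(s)) for s in sentences])
print(token_ids.shape)  # (2, 5)

# An embedding lookup is row indexing into the table.
embeddings = embedding_table[token_ids]
print(embeddings.shape)  # (2, 5, 4)
```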
What's batch size?
The number of input samples processed together in one forward pass of a model. For an NLP model, an input of shape (16, sequence_length, embedding_dim) means 16 sentences are processed at a time.
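A small plain-Python sketch of splitting a dataset into batches, assuming 100 samples and a batch size of 16 (both arbitrary): the final batch comes out smaller, which is one practical reason to keep the batch dimension dynamic in an exported model.

```python
samples = list(range(100))  # stand-in for 100 input samples
batch_size = 16

# Slice the dataset into consecutive chunks of at most batch_size samples.
batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

print(len(batches))               # 7 forward passes
print([len(b) for b in batches])  # [16, 16, 16, 16, 16, 16, 4]
```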
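Why do we pass a dummy input to the ONNX export?
PyTorch's classic ONNX exporter (torch.onnx.export) builds the computational graph by tracing: it runs the model once on the dummy input and records the operations that execute. The dummy input therefore provides the reference shape and dtype. Dimensions not listed in dynamic_axes are fixed to the dummy input's sizes, and only the control-flow path taken for that input is captured.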
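A toy illustration of tracing in plain Python (not PyTorch's tracer; the Traced class and the model, trace and replay functions are hypothetical): the function is run once on a dummy value, each multiplication is recorded, and the recording is then replayed. Only the branch taken for the dummy input ends up in the recorded graph, which is why data-dependent control flow does not survive a traced export.

```python
class Traced:
    """Wraps a number and records every multiplication applied to it."""
    def __init__(self, value, log):
        self.value, self.log = value, log

    def __mul__(self, k):
        self.log.append(("mul", k))
        return Traced(self.value * k, self.log)

    def __gt__(self, other):
        # The comparison is evaluated eagerly; only its result is used, not recorded.
        return self.value > other

def model(x):
    # Data-dependent branch: double positive inputs, negate the others.
    if x > 0:
        return x * 2
    return x * -1

def trace(fn, dummy):
    log = []
    fn(Traced(dummy, log))
    return log

def replay(graph, x):
    for op, k in graph:
        if op == "mul":
            x = x * k
    return x

graph = trace(model, 3.0)  # the dummy input takes the "positive" branch
print(graph)               # [('mul', 2)]
print(model(-5.0), replay(graph, -5.0))  # 5.0 -10.0
```

Replaying the recorded graph on -5.0 gives -10.0, while the original function returns 5.0.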