harmdevries committed
Commit: 6c2da96
1 Parent(s): c88286f
Update app.py
app.py CHANGED
@@ -185,10 +185,10 @@ st.markdown("where BW_math is the number of floating point operations per second
 st.markdown("If we assume we can *perfectly* overlap memory access with math operations, then the estimated execution time for the operation is:")
 st.latex("max(T_{math}, T_{mem})")
 
-st.markdown("
+st.markdown("Note that there is a minimum time to execute the operation due to [kernel launch overhead](https://forums.developer.nvidia.com/t/any-way-to-measure-the-latency-of-a-kernel-launch/221413/2)")
 
 st.subheader("Inference time for Transformer operations")
-st.
+st.markdown("We can now estimate the execution for each of the operations in the transformer model. I suggest you inspect the code for details on the calculations. ")
 
 st.subheader('Attention layer')
 
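
The text added in this commit combines two ideas: the estimate max(T_{math}, T_{mem}) under perfect overlap, and a minimum execution time set by kernel launch overhead. A minimal Python sketch of how the two might fit together is shown below. It is not taken from app.py: the function name `estimate_op_time`, its parameters, and the definitions T_math = FLOPs / BW_math and T_mem = bytes / BW_mem are assumptions based on the surrounding strings, and the hardware numbers in the usage example are hypothetical.

```python
def estimate_op_time(flops, bytes_moved, bw_math, bw_mem, launch_overhead=0.0):
    """Estimate the execution time of one operation, in seconds.

    Hypothetical helper, not the code in app.py.
    flops:           floating point operations performed by the operation
    bytes_moved:     bytes read from and written to memory
    bw_math:         math throughput in FLOP/s
    bw_mem:          memory bandwidth in bytes/s
    launch_overhead: assumed minimum time per kernel launch, in seconds
    """
    t_math = flops / bw_math       # time if limited by compute only
    t_mem = bytes_moved / bw_mem   # time if limited by memory traffic only
    # Perfect overlap means the slower of the two dominates; the launch
    # overhead acts as a floor on the result.
    return max(t_math, t_mem, launch_overhead)


if __name__ == "__main__":
    # Hypothetical device: 100 TFLOP/s math, 1 TB/s memory, 5 us launch overhead.
    t = estimate_op_time(flops=2e9, bytes_moved=4e6,
                         bw_math=1e14, bw_mem=1e12, launch_overhead=5e-6)
    print(f"estimated time: {t * 1e6:.1f} us")
```

Treating the overhead as a floor inside `max` is one reading of "minimum time to execute the operation"; adding it to the estimate would be another reasonable model.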