harmdevries committed on
Commit
6c2da96
1 Parent(s): c88286f

Update app.py

Files changed (1)
  1. app.py +2 -2
app.py CHANGED
@@ -185,10 +185,10 @@ st.markdown("where BW_math is the number of floating point operations per second
  st.markdown("If we assume we can *perfectly* overlap memory access with math operations, then the estimated execution time for the operation is:")
  st.latex("max(T_{math}, T_{mem})")

- st.markdown("We also a minimum time for executing the operation due to [kernel launch overhead](https://forums.developer.nvidia.com/t/any-way-to-measure-the-latency-of-a-kernel-launch/221413/2)")
+ st.markdown("Note that there is a minimum time to execute the operation due to [kernel launch overhead](https://forums.developer.nvidia.com/t/any-way-to-measure-the-latency-of-a-kernel-launch/221413/2)")

  st.subheader("Inference time for Transformer operations")
- st.text("We can now estimate the execution for each of the operations in the transformer model. I suggest you inspect the code for details on the calculations. ")
+ st.markdown("We can now estimate the execution for each of the operations in the transformer model. I suggest you inspect the code for details on the calculations. ")

  st.subheader('Attention layer')
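
The timing model described in the strings of this hunk can be illustrated with a minimal Python sketch. This is not the code in app.py. The function name `estimate_op_time`, its parameters, and the hardware numbers are hypothetical. It assumes T_math is FLOPs divided by BW_math and T_mem is bytes moved divided by memory bandwidth. Treating kernel launch overhead as a floor inside the max is one possible reading of the "minimum time" note.

```python
def estimate_op_time(flops, bytes_moved, bw_math, bw_mem, launch_overhead=0.0):
    """Estimate the execution time in seconds of one operation.

    Assumes memory access overlaps perfectly with math, so the time is
    max(T_math, T_mem). The launch_overhead floor is an assumption made
    for this sketch, not something taken from app.py.
    """
    if bw_math <= 0 or bw_mem <= 0:
        raise ValueError("bandwidths must be positive")
    t_math = flops / bw_math        # time spent on floating point math
    t_mem = bytes_moved / bw_mem    # time spent moving data to and from memory
    return max(t_math, t_mem, launch_overhead)


# Illustrative numbers only (hypothetical accelerator):
BW_MATH = 312e12   # floating point operations per second
BW_MEM = 1.5e12    # bytes per second
OVERHEAD = 5e-6    # assumed kernel launch overhead in seconds

# Matrix-vector product with a 4096 x 4096 fp16 weight matrix:
# about 2 * n * n FLOPs and about 2 * n * n bytes of weights read.
n = 4096
t = estimate_op_time(2 * n * n, 2 * n * n, BW_MATH, BW_MEM, OVERHEAD)
print(f"estimated time: {t * 1e6:.2f} microseconds")
```

With these made-up numbers the operation is memory-bound, since T_mem is far larger than both T_math and the assumed launch overhead. For a very small operation the overhead term would dominate instead.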