|
# Microkernel naming conventions |
|
|
|
This documents deciphers XNNPACK's microkernels naming convention. |
|
|
|
## General conventions |
|
|
|
Microkernel function names follow this convention: |
|
|
|
`xnn_<datatype>_<microkernel><activation?>_ukernel_<parameters>__<arch>` |
|
|
|
Where `<datatype>` can be: |
|
|
|
- `cs16` |
|
- `f16` - 16-bit half precision float |
|
- `f32` - 32-bit single precision float |
|
- `qc8` |
|
- `qs8` - quantized signed 8 bit |
|
- `qu8` - quantized unsigned 8 bit |
|
- `s16` |
|
- `u32` |
|
- `x8` |
|
- `x16` |
|
- `x24` |
|
- `x32` |
|
- `xx` |
|
|
|
`<microkernel>` is the type of microkernel, such as: |
|
|
|
- `gemm` |
|
- `igemm` |
|
- `avgpool` |
|
|
|
`<activation>` if supported for the microkernel is activation that is fused into |
|
the microkernel: |
|
|
|
- `linear` |
|
- `minmax` |
|
- `relu` |
|
|
|
`<parameters>` are microkernel specific, and can mean different things depending |
|
on the microkernel (see below for details). |
|
|
|
`<arch>` is the architecture the microkernel is optimized for, and can contain |
|
further subdivisions for additional instruction sets supported on the specified |
|
architecture, or processor information: |
|
|
|
- `scalar` |
|
- `aarch32_neon_cortex_a55` |
|
- `neonv8_mlal` |
|
- `wasm` |
|
- `avx512` |
|
- `avx512skx` |
|
|
|
## GEMM and IGEMM microkernels |
|
|
|
The `<parameters>` for GEMM and IGEMM microkernels represent the `mr` and `nr` |
|
of the microkernel. You can think of it as the number of rows and columns of the |
|
output calculated by the microkernel. |
|
|
|
E.g. `xnn_f32_gemm_minmax_ukernel_4x8__aarch32_neon_cortex_a7` processes 32 |
|
elements of the output matrix. |
|
|
|
## DWCONV microkernels |
|
|
|
These microkernels come in 2 varieties, uni-pass and multi-pass. |
|
|
|
Uni-pass have `XpYc` in their name, where `X` is the kernel tile, and `Y` is the |
|
channel tile. `p` stands for primary, `c` for channel. |
|
|
|
Multi-pass have `UfVmWlXcYsZr` in their name, where `U` is the first pass tile, |
|
`V` is the middle pass tile, `W` is the last pass tile, `X` is the channel tile, |
|
`Y` is the channel subtile, and `Z` is the channel round. `f` stands for first, |
|
`m` for middle, `l` for last, `c` for channel, `s` for subtile, `r` for round. |
|
The kernel size must be at least `W+1`, the middle pass runs for as many |
|
iterations as possible, and the last pass handles the remainder (at least 1). |
|
`c`, `s`, `r`, affects the tiling of channels. We run as many tiles of `c` as |
|
possible, followed by rounds of `s`. We determine how many tiles of `c` to run |
|
based on rounding the number of channels up to `r`. `r` is determined based on |
|
the natural tiling size of the microarchitecture (e.g. SSE/AVX) and the number |
|
of elements we can read OOB (`XNN_EXTRA_BYTES`). |
|
|
|
## Average Pooling and Global Average Pooling |
|
|
|
These microkernels come in 2 varieties, uni-pass and multi-pass. |
|
|
|
Uni-pass have `Cx` in their name, where `C` is a number. This microkernel |
|
processes up to and including `C` elements. |
|
|
|
Multi-pass have `CpDx` in their name, where `C` and `D` are numbers. This |
|
microkernel processes `D` elements in the first pass, and middle pass (which can |
|
run multiple times), and up to `C` elements in the last pass. |
|
|
|
E.g. `xnn_f32_avgpool_minmax_ukernel_9x__neon_c4` can process up to 9 elements. |
|
|