This is a QCQA version of the original model facebook/opt-125m. In this version, the original MHA architecture is preserved but instead of having a single K/V head, different K/V heads corresponding to the same group have the same mean-pooled K or V values. It has 16 groups of KV heads per layer instead of original 32 KV heads in the MHA implementation.

Downloads last month
59
Inference Providers NEW
This model is not currently available via any of the supported third-party Inference Providers, and the model is not deployed on the HF Inference API.