QTIP Quantized Models
See https://github.com/Cornell-RelaxML/qtip
This model is compatible with tensor parallelism: the random Hadamard transform (RHT) runs independently on each GPU rather than across GPUs. The q, k, v, up, and gate projections are split along the output channel, and the o and down projections are split along the input channel. Quality is slightly worse than that of the non-"TP8" model.
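The split described above can be illustrated with a minimal sketch in plain Python. This is not the QTIP implementation: the helper names, the `[out_features][in_features]` weight layout, the two-shard setup, and the toy integer weights are all assumptions made for illustration, and the activation function and the RHT itself are omitted. The sketch only shows why splitting up and gate along the output channel and down along the input channel lets each shard work independently, with a single sum (standing in for an all-reduce) at the end.

```python
# Hypothetical helpers; weights are nested lists shaped [out_features][in_features].

def matvec(w, x):
    """Multiply matrix w by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def split_output(w, n):
    """Split along the output channel: each shard keeps a slice of the rows."""
    step = len(w) // n
    return [w[i * step:(i + 1) * step] for i in range(n)]

def split_input(w, n):
    """Split along the input channel: each shard keeps a slice of the columns."""
    step = len(w[0]) // n
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(n)]

def mlp_reference(up, gate, down, x):
    """Unsharded gated MLP (activation omitted): down @ (gate(x) * up(x))."""
    h = [g * u for g, u in zip(matvec(gate, x), matvec(up, x))]
    return matvec(down, h)

def mlp_sharded(up, gate, down, x, n):
    """Same computation with up/gate split by output and down split by input."""
    partials = []
    for u, g, d in zip(split_output(up, n), split_output(gate, n), split_input(down, n)):
        # Each shard sees the full input x but only its slice of hidden channels.
        h = [gi * ui for gi, ui in zip(matvec(g, x), matvec(u, x))]
        partials.append(matvec(d, h))
    # Summing the partial outputs stands in for an all-reduce across GPUs.
    return [sum(vals) for vals in zip(*partials)]

# Toy integer weights (illustrative only) so the comparison is exact.
up = [[1, 0], [0, 1], [1, 1], [2, -1]]
gate = [[1, 1], [1, 0], [0, 1], [1, 1]]
down = [[1, 2, 3, 4], [0, 1, 0, 1]]
x = [1, 2]

print(mlp_reference(up, gate, down, x))   # [25, 2]
print(mlp_sharded(up, gate, down, x, 2))  # [25, 2]
```

Because the hidden channels are partitioned, any elementwise activation applied to the gate output would stay local to each shard as well. The same pattern applies to attention, where q, k, and v play the role of up and gate, and o plays the role of down.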