This model works normally with oobabooga

#5
by SongXiaoMao - opened

With 8-bit quantization, the VRAM used when loading the model is similar to that of the 7B model in fp16. During conversation, usage is around 17 GB, but the speed is not as slow as the 7B's. It seems we need to consider replacing our hardware. Thank you for providing the 8-bit model.
