Replacing models to reduce VRAM

#18
by Deniaud

Greetings! So far, we know that VRAM consumption can be reduced by using an fp8 or GGUF-quantized version of Flux (a rough sketch of the GGUF route follows below).
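
For concreteness, here is a minimal sketch of that GGUF route, assuming the diffusers GGUF loader and one of city96's community quants (the checkpoint URL and quant level are just examples; substitute whichever quant you actually use):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Example community GGUF quant; swap in your preferred quant level.
ckpt = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf"

# Load only the transformer from the GGUF file, dequantizing on the fly.
transformer = FluxTransformer2DModel.from_single_file(
    ckpt,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle submodules to CPU for extra VRAM savings
```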
But I also wanted to ask about the possibility of replacing Qwen 7B with Qwen 2B, and especially with its quantized versions (see the sketch below for what I have in mind). Is this possible, and would it make sense here, or would quality drop to roughly T5 level?
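
Purely for illustration, here is what loading a smaller quantized Qwen as a stand-in encoder might look like. This assumes the encoder in question is from the Qwen2-VL family (adjust to whichever Qwen checkpoint the pipeline actually uses) and a bitsandbytes 4-bit load; note that the 2B model's hidden size (1536) differs from the 7B's (3584), so any projection layers trained against the 7B would presumably need adapting:

```python
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2VLForConditionalGeneration

# Hypothetical stand-in: Qwen2-VL-2B in 4-bit via bitsandbytes (requires CUDA + bitsandbytes).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    quantization_config=bnb,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Text-only forward pass to pull candidate conditioning features.
inputs = processor(text=["a photo of a cat"], return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
features = out.hidden_states[-1]
print(features.shape)  # last dim is 1536 for 2B vs. 3584 for 7B, so not a drop-in swap
```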
