Replacing models to reduce VRAM
#18
by Deniaud - opened
Greetings! We already know that VRAM consumption can be reduced by using an fp8 or GGUF-quantized version of Flux.
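For reference, here is a minimal sketch of the GGUF route in diffusers (the checkpoint URL and quant level are just examples; any Flux GGUF quant from the Hub should load the same way):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Example GGUF quant of the Flux transformer (exact filename is illustrative).
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf"

# Load only the transformer from the quantized single file,
# dequantizing on the fly to bf16 for compute.
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle modules to save further VRAM

image = pipe("a photo of a cat", num_inference_steps=28).images[0]
```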
But I also wanted to ask about the possibility of replacing Qwen 7B with Qwen 2B, especially with its quantized versions. Is this possible, and would it make sense in this case, or would the quality drop to roughly T5 level? A rough sketch of what I mean is below.
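To make the question concrete, this is roughly the swap I have in mind, assuming a diffusers-style pipeline that accepts a `text_encoder` component, and assuming the Qwen2-VL family (which ships 2B and 7B variants) is the encoder in question:

```python
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

# Hypothetical drop-in: load the 2B variant instead of the 7B one.
text_encoder = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Caveat: the 2B and 7B variants have different hidden sizes, so a
# pipeline trained against the 7B embeddings would likely need a
# retrained projection layer rather than a pure drop-in swap.
```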