I've shared Hugging Face Spaces for CPU-based RAG and for T5/Flan-T5 models. The smolagents-based RAG space sometimes produces high-quality answers, but it can be slow. Qwen2.5-0.5B is fast for a CPU implementation and generates answers of acceptable quality. I've found that Gemma3-4B produces significantly more stable answers than the 1B version.
RAG
Akjava/Gemma3-4B-llamacpp-cpu-rag-smolagents
Akjava/Qwen2.5-0.5B-Rag-Thinking-Flan-T5
T5/Flan-T5
Akjava/llamacpp-flan-t5-large-grammar-synthesis
Akjava/llamacpp-madlad400-3b-mt-2jp
Hugging Face Free CPU Limitations
When duplicating a Space, the build process (llama-cpp-python) can occasionally get stuck and needs a manual restart to finish.
Spaces may unexpectedly stop working or even be deleted, in which case they have to be rebuilt. Refer to the issue for more information.