Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 64
Fast-RAG Inference Endpoints Collection An extremely easy to deploy RAG Pipeline using Inference Endpoints • 3 items • Updated Jun 3 • 1
Article Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval Mar 22 • 59
Neural Network Compression & Quantization Collection Tracks papers and links about neural network compression and quantization techniques • 4 items • Updated Sep 22, 2023 • 1
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 96
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models Paper • 2306.15626 • Published Jun 27, 2023 • 17