Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • 14 days ago • 22
Article Unbelievable! Run 70B LLM Inference on a Single 4GB GPU with This NEW Technique By lyogavin • Nov 30, 2023 • 8
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 92
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time Paper • 2404.10667 • Published Apr 16 • 12
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation Paper • 2403.16990 • Published Mar 25 • 24
Adapting Large Language Models via Reading Comprehension Paper • 2309.09530 • Published Sep 18, 2023 • 69
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 566
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 87
Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures Paper • 2402.05424 • Published Feb 8 • 17
SDXL-Lightning: Progressive Adversarial Diffusion Distillation Paper • 2402.13929 • Published Feb 21 • 24
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering Paper • 2402.10128 • Published Feb 15 • 14
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 33
GraphCast: Learning skillful medium-range global weather forecasting Paper • 2212.12794 • Published Dec 24, 2022 • 1
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Paper • 2402.07033 • Published Feb 10 • 16
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning Paper • 2402.04833 • Published Feb 7 • 6
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices Paper • 2311.16567 • Published Nov 28, 2023 • 21
Qwen1.5 GGUF Collection GGUF quants for the new Qwen1.5 model (https://qwenlm.github.io/blog/qwen1.5/) • 5 items • Updated Feb 5 • 10
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 82
I am a Strange Dataset: Metalinguistic Tests for Language Models Paper • 2401.05300 • Published Jan 10 • 4
Best for RP on mobile dGPU Collection Models without twee romantic language, absurd bad erotica cliches or low coherence. These models are top of their weight class. • 4 items • Updated Jan 17 • 2
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia Paper • 2305.14292 • Published May 23, 2023 • 1
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning Paper • 2401.01325 • Published Jan 2 • 24
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition Paper • 2305.05084 • Published May 8, 2023 • 1
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism Paper • 2401.02954 • Published Jan 5 • 38
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 173
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 Paper • 2312.16171 • Published Dec 26, 2023 • 30
LM-Cocktail: Resilient Tuning of Language Models via Model Merging Paper • 2311.13534 • Published Nov 22, 2023 • 3
Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code Paper • 2308.03109 • Published Aug 6, 2023 • 1
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023 • 44
Osprey: Pixel Understanding with Visual Instruction Tuning Paper • 2312.10032 • Published Dec 15, 2023 • 4
Silkie: Preference Distillation for Large Visual Language Models Paper • 2312.10665 • Published Dec 17, 2023 • 10
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Paper • 2101.03961 • Published Jan 11, 2021 • 13
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models Paper • 2312.04724 • Published Dec 7, 2023 • 18
Hyena Hierarchy: Towards Larger Convolutional Language Models Paper • 2302.10866 • Published Feb 21, 2023 • 6
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically Paper • 2312.02119 • Published Dec 4, 2023 • 1
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models Paper • 2311.16079 • Published Nov 27, 2023 • 18
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback Paper • 2309.00267 • Published Sep 1, 2023 • 45
Scalable Extraction of Training Data from (Production) Language Models Paper • 2311.17035 • Published Nov 28, 2023 • 4
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs Paper • 2311.13600 • Published Nov 22, 2023 • 41
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection Paper • 2311.10122 • Published Nov 16, 2023 • 25
Orca 2: Teaching Small Language Models How to Reason Paper • 2311.11045 • Published Nov 18, 2023 • 68
Responsible AI resources Collection These are the resources I use and mention in my talks & workshops, for more check hf.co/ethics • 13 items • Updated 17 minutes ago • 3
ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge Paper • 2303.14070 • Published Mar 24, 2023 • 8
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution Paper • 2306.15794 • Published Jun 27, 2023 • 16
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Paper • 2307.01952 • Published Jul 4, 2023 • 74
Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation Paper • 1811.09393 • Published Nov 23, 2018 • 1