Tanvir1337 (Tanvir)

upvoted an article 7 days ago

Article

makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch

14 days ago • 22

upvoted an article 10 days ago

Article

Unbelievable! Run 70B LLM Inference on a Single 4GB GPU with This NEW Technique

Nov 30, 2023 • 8

upvoted a paper 12 days ago

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 92

upvoted 4 papers about 1 month ago

upvoted a paper about 2 months ago

ReALM: Reference Resolution As Language Modeling

Paper • 2403.20329 • Published Mar 29 • 20

upvoted 11 papers 3 months ago

Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 69

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 566

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 87

Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures

Paper • 2402.05424 • Published Feb 8 • 17

SDXL-Lightning: Progressive Adversarial Diffusion Distillation

Paper • 2402.13929 • Published Feb 21 • 24

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

Paper • 2402.10128 • Published Feb 15 • 14

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

Paper • 2402.10176 • Published Feb 15 • 33

GraphCast: Learning skillful medium-range global weather forecasting

Paper • 2212.12794 • Published Dec 24, 2022 • 1

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Paper • 2402.07033 • Published Feb 10 • 16

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

Paper • 2402.04833 • Published Feb 7 • 6

MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices

Paper • 2311.16567 • Published Nov 28, 2023 • 21

upvoted a collection 4 months ago

Qwen1.5 GGUF

Collection

GGUF quants for the new Qwen1.5 model (https://qwenlm.github.io/blog/qwen1.5/) • 5 items • Updated Feb 5 • 10

upvoted 3 papers 4 months ago

Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 82

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 135

I am a Strange Dataset: Metalinguistic Tests for Language Models

Paper • 2401.05300 • Published Jan 10 • 4

upvoted a collection 4 months ago

Best for RP on mobile dGPU

Collection

Models without twee romantic language, absurd bad erotica cliches or low coherence. These models are top of their weight class. • 4 items • Updated Jan 17 • 2

upvoted 6 papers 4 months ago

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 152

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

Paper • 2305.14292 • Published May 23, 2023 • 1

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Paper • 2401.01325 • Published Jan 2 • 24

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

Paper • 2305.05084 • Published May 8, 2023 • 1

LLaMA Pro: Progressive LLaMA with Block Expansion

Paper • 2401.02415 • Published Jan 4 • 50

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5 • 38

upvoted 16 papers 5 months ago

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 173

Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4

Paper • 2312.16171 • Published Dec 26, 2023 • 30

LM-Cocktail: Resilient Tuning of Language Models via Model Merging

Paper • 2311.13534 • Published Nov 22, 2023 • 3

StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 26

Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code

Paper • 2308.03109 • Published Aug 6, 2023 • 1

Holistic Evaluation of Language Models

Paper • 2211.09110 • Published Nov 16, 2022 • 1

Gemini: A Family of Highly Capable Multimodal Models

Paper • 2312.11805 • Published Dec 19, 2023 • 44

Osprey: Pixel Understanding with Visual Instruction Tuning

Paper • 2312.10032 • Published Dec 15, 2023 • 4

Silkie: Preference Distillation for Large Visual Language Models

Paper • 2312.10665 • Published Dec 17, 2023 • 10

VecFusion: Vector Font Generation with Diffusion

Paper • 2312.10540 • Published Dec 16, 2023 • 20

LLM360: Towards Fully Transparent Open-Source LLMs

Paper • 2312.06550 • Published Dec 11, 2023 • 52

Pretraining on the Test Set Is All You Need

Paper • 2309.08632 • Published Sep 13, 2023 • 3

VILA: On Pre-training for Visual Language Models

Paper • 2312.07533 • Published Dec 12, 2023 • 18

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Paper • 2101.03961 • Published Jan 11, 2021 • 13

Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models

Paper • 2312.04724 • Published Dec 7, 2023 • 18

Hyena Hierarchy: Towards Larger Convolutional Language Models

Paper • 2302.10866 • Published Feb 21, 2023 • 6

upvoted 9 papers 6 months ago

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically

Paper • 2312.02119 • Published Dec 4, 2023 • 1

Magicoder: Source Code Is All You Need

Paper • 2312.02120 • Published Dec 4, 2023 • 78

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Paper • 2311.16079 • Published Nov 27, 2023 • 18

RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 45

Scalable Extraction of Training Data from (Production) Language Models

Paper • 2311.17035 • Published Nov 28, 2023 • 4

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Paper • 2311.13600 • Published Nov 22, 2023 • 41

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 171

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Paper • 2311.10122 • Published Nov 16, 2023 • 25

Orca 2: Teaching Small Language Models How to Reason

Paper • 2311.11045 • Published Nov 18, 2023 • 68

upvoted a collection 6 months ago

Responsible AI resources

Collection

These are the resources I use and mention in my talks & workshops, for more check hf.co/ethics • 13 items • Updated 17 minutes ago • 3

upvoted 4 papers 6 months ago

ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge

Paper • 2303.14070 • Published Mar 24, 2023 • 8

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution

Paper • 2306.15794 • Published Jun 27, 2023 • 16

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Paper • 2307.01952 • Published Jul 4, 2023 • 74

Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation

Paper • 1811.09393 • Published Nov 23, 2018 • 1
