
Collections


Collections including paper arxiv:2311.10768

LLM-FP4: 4-Bit Floating-Point Quantized Transformers

Paper • 2310.16836 • Published Oct 25, 2023 • 13
FinGPT: Large Generative Models for a Small Language

Paper • 2311.05640 • Published Nov 3, 2023 • 27
Memory Augmented Language Models through Mixture of Word Experts

Paper • 2311.10768 • Published Nov 15, 2023 • 16

Papers to read - General

Papers I want to read, at some point.

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

Paper • 2108.12409 • Published Aug 27, 2021 • 5
YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 65
MIMIC-IT: Multi-Modal In-Context Instruction Tuning

Paper • 2306.05425 • Published Jun 8, 2023 • 11
Music ControlNet: Multiple Time-varying Controls for Music Generation

Paper • 2311.07069 • Published Nov 13, 2023 • 43

Reasoning | Planning

Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

Paper • 2310.18628 • Published Oct 28, 2023 • 7
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise

Paper • 2310.19019 • Published Oct 29, 2023 • 9
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

Paper • 2311.02262 • Published Nov 3, 2023 • 10
Thread of Thought Unraveling Chaotic Contexts

Paper • 2311.08734 • Published Nov 15, 2023 • 6

Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering

Paper • 2204.04581 • Published Apr 10, 2022 • 1
Retrieval-Augmented Multimodal Language Modeling

Paper • 2211.12561 • Published Nov 22, 2022 • 1
When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

Paper • 2212.10511 • Published Dec 20, 2022 • 1
Memorizing Transformers

Paper • 2203.08913 • Published Mar 16, 2022 • 2

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

Paper • 2310.16795 • Published Oct 25, 2023 • 26
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Paper • 2308.12066 • Published Aug 23, 2023 • 4
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference

Paper • 2303.06182 • Published Mar 10, 2023 • 1
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate

Paper • 2112.14397 • Published Dec 29, 2021 • 1

Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model

Paper • 2309.03550 • Published Sep 7, 2023 • 11
Memory Augmented Language Models through Mixture of Word Experts

Paper • 2311.10768 • Published Nov 15, 2023 • 16
GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 183
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

Paper • 2311.12631 • Published Nov 21, 2023 • 13

