- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 37
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 90
- MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
  Paper • 2403.14624 • Published • 50
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Paper • 2402.12875 • Published • 2
Collections
Collections including paper arxiv:2402.10200
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 90
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 102
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 57
- Do language models plan ahead for future tokens?
  Paper • 2404.00859 • Published • 2
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 98
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 31
- ViTAR: Vision Transformer with Any Resolution
  Paper • 2403.18361 • Published • 48
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
  Paper • 2403.18814 • Published • 37
- Linearity of Relation Decoding in Transformer Language Models
  Paper • 2308.09124 • Published • 2
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 90
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 99
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Paper • 1701.06538 • Published • 4
- Attention Is All You Need
  Paper • 1706.03762 • Published • 34
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  Paper • 2005.11401 • Published • 11
- Language Model Evaluation Beyond Perplexity
  Paper • 2106.00085 • Published
- JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Paper • 2310.00535 • Published • 2
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  Paper • 2211.00593 • Published • 2
- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 18
- Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
  Paper • 2307.09458 • Published • 9
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 15
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 90
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
  Paper • 2403.15042 • Published • 24
- LIMA: Less Is More for Alignment
  Paper • 2305.11206 • Published • 17
- Attention Is All You Need
  Paper • 1706.03762 • Published • 34
- Self-Attention with Relative Position Representations
  Paper • 1803.02155 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 11
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
  Paper • 2401.12954 • Published • 28
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 565
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 58
- Simple and Scalable Strategies to Continually Pre-train Large Language Models
  Paper • 2403.08763 • Published • 48
- Stealing Part of a Production Language Model
  Paper • 2403.06634 • Published • 85