Collection:
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
- Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
  Paper • 2311.10642 • Published • 23
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 70
Collections
Collections including paper arxiv:2311.11829
Collection:
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
- TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
  Paper • 2311.11315 • Published • 6
- Alignment for Honesty
  Paper • 2312.07000 • Published • 11
- Steering Llama 2 via Contrastive Activation Addition
  Paper • 2312.06681 • Published • 11
Collection:
- Contrastive Chain-of-Thought Prompting
  Paper • 2311.09277 • Published • 34
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 9
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 70
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
Collection:
- Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
  Paper • 2311.08692 • Published • 12
- DiLoCo: Distributed Low-Communication Training of Language Models
  Paper • 2311.08105 • Published • 14
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
  Paper • 2312.06134 • Published • 2
Collection:
- EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
  Paper • 2311.02077 • Published • 14
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
- Large Language Models for Mathematicians
  Paper • 2312.04556 • Published • 11
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 44
Collection:
- Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
  Paper • 2310.18628 • Published • 7
- TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
  Paper • 2310.19019 • Published • 9
- Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
  Paper • 2311.02262 • Published • 10
- Thread of Thought Unraveling Chaotic Contexts
  Paper • 2311.08734 • Published • 6
Collection:
- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 25
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
  Paper • 2308.16137 • Published • 39
- Scaling Transformer to 1M tokens and beyond with RMT
  Paper • 2304.11062 • Published • 2
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17