Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2310.06825

Must-read Papers

Some of the most important and insightful (in my opinion) AI papers of today, with a focus on NLP and LLMs.

ReAct: Synergizing Reasoning and Acting in Language Models

Paper • 2210.03629 • Published Oct 6, 2022 • 13
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 40
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 14
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 100

about 24 hours ago

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 40
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 14
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 12
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 10

Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 45

Papers of intereset

Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 45
Instruction Tuning with Human Curriculum

Paper • 2310.09518 • Published Oct 14, 2023 • 3
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15 • 65
Instruction-tuned Language Models are Better Knowledge Learners

Paper • 2402.12847 • Published Feb 20 • 24

Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 45

test collection

Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 45

LLM_architectures

Nemotron-4 15B Technical Report

Paper • 2402.16819 • Published Feb 26 • 42
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29 • 50
RWKV: Reinventing RNNs for the Transformer Era

Paper • 2305.13048 • Published May 22, 2023 • 11
Reformer: The Efficient Transformer

Paper • 2001.04451 • Published Jan 13, 2020

LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 26
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 40
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 40
Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 34

language-models

Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 45
BloombergGPT: A Large Language Model for Finance

Paper • 2303.17564 • Published Mar 30, 2023 • 18
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 14
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 12

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5 • 39
Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 32
GPT-4 Technical Report

Paper • 2303.08774 • Published Mar 15, 2023 • 3
Gemini: A Family of Highly Capable Multimodal Models

Paper • 2312.11805 • Published Dec 19, 2023 • 45

Previous
1
2
3
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs