- Attention Is All You Need
  Paper • 1706.03762 • Published • 36
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 6
- Mixtral of Experts
  Paper • 2401.04088 • Published • 153
- Mistral 7B
  Paper • 2310.06825 • Published • 43
Collections
Collections including paper arxiv:2310.06825
- Llemma: An Open Language Model For Mathematics
  Paper • 2310.10631 • Published • 45
- Mistral 7B
  Paper • 2310.06825 • Published • 43
- Qwen Technical Report
  Paper • 2309.16609 • Published • 30
- BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
  Paper • 2309.11568 • Published • 9
- Attention Is All You Need
  Paper • 1706.03762 • Published • 36
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  Paper • 2005.11401 • Published • 11
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 24
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 8
- SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving
  Paper • 2402.02519 • Published
- Mixtral of Experts
  Paper • 2401.04088 • Published • 153
- Optimal Transport Aggregation for Visual Place Recognition
  Paper • 2311.15937 • Published
- GOAT: GO to Any Thing
  Paper • 2311.06430 • Published • 14
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 567
- Mixtral of Experts
  Paper • 2401.04088 • Published • 153
- Mistral 7B
  Paper • 2310.06825 • Published • 43
- Don't Make Your LLM an Evaluation Benchmark Cheater
  Paper • 2311.01964 • Published • 1
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
  Paper • 2309.12307 • Published • 82
- Small-scale proxies for large-scale Transformer training instabilities
  Paper • 2309.14322 • Published • 17
- AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
  Paper • 2309.16058 • Published • 53
- Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond
  Paper • 2310.06147 • Published • 1
- FreeU: Free Lunch in Diffusion U-Net
  Paper • 2309.11497 • Published • 63
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
  Paper • 2309.08532 • Published • 50
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
  Paper • 2309.12307 • Published • 82
- Mistral 7B
  Paper • 2310.06825 • Published • 43
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
  Paper • 2306.01116 • Published • 28
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 8
- RoFormer: Enhanced Transformer with Rotary Position Embedding
  Paper • 2104.09864 • Published • 7
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 10