Collections

Collections including paper arxiv:2401.04088

Mixture of Experts

Memory Augmented Language Models through Mixture of Word Experts

Paper • 2311.10768 • Published Nov 15, 2023 • 16
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 154
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Paper • 2401.04081 • Published Jan 8 • 68
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11 • 36

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 25
CsFEVER and CTKFacts: Acquiring Czech data for fact verification

Paper • 2201.11115 • Published Jan 26, 2022
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 12
FinGPT: Large Generative Models for a Small Language

Paper • 2311.05640 • Published Nov 3, 2023 • 26

Training & Architectures

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 39
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 6
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 154
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 45

Llemma: An Open Language Model For Mathematics

Paper • 2310.10631 • Published Oct 16, 2023 • 46
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 45
Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 30
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

Paper • 2309.11568 • Published Sep 20, 2023 • 9

Papers: MoE/Ensemble

Papers related to Mixture of Experts topics.

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

Paper • 2310.16795 • Published Oct 25, 2023 • 26
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs

Paper • 2310.13961 • Published Oct 21, 2023 • 4
The Consensus Game: Language Model Generation via Equilibrium Search

Paper • 2310.09139 • Published Oct 13, 2023 • 12
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning

Paper • 2310.03094 • Published Oct 4, 2023 • 12

FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

Paper • 2310.03214 • Published Oct 5, 2023 • 14
HeaP: Hierarchical Policies for Web Actions using LLMs

Paper • 2310.03720 • Published Oct 5, 2023 • 5
Large Language Models Cannot Self-Correct Reasoning Yet

Paper • 2310.01798 • Published Oct 3, 2023 • 31
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 154

Stuff I (TheProjectsGuy) have summarized to pass the time, mostly papers. I do not guarantee that the summaries are fully correct, as I am no expert.

SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving

Paper • 2402.02519 • Published Feb 4
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 154
Optimal Transport Aggregation for Visual Place Recognition

Paper • 2311.15937 • Published Nov 27, 2023
GOAT: GO to Any Thing

Paper • 2311.06430 • Published Nov 10, 2023 • 14

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

Paper • 2310.09478 • Published Oct 14, 2023 • 17
Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

Paper • 2310.08678 • Published Oct 12, 2023 • 11
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 237
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 11

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 575
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 154
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 45
Don't Make Your LLM an Evaluation Benchmark Cheater

Paper • 2311.01964 • Published Nov 3, 2023 • 1

interesting stuff

Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 37
Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 73
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 77
Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 81
