Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2311.12983

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 185

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Paper • 2401.03065 • Published Jan 5 • 11
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation

Paper • 2305.01210 • Published May 2, 2023 • 4
AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models

Paper • 2309.06495 • Published Sep 5, 2023 • 1
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Paper • 2311.16502 • Published Nov 27, 2023 • 35

Paper reading list

Improving Text Embeddings with Large Language Models

Paper • 2401.00368 • Published Dec 31, 2023 • 79
LLaMA Beyond English: An Empirical Study on Language Capability Transfer

Paper • 2401.01055 • Published Jan 2 • 54
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 181
LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 257

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 185

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 185

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Paper • 2312.03818 • Published Dec 6, 2023 • 32
Scaling Laws of Synthetic Images for Model Training ... for Now

Paper • 2312.04567 • Published Dec 7, 2023 • 7
Large Language Models for Mathematicians

Paper • 2312.04556 • Published Dec 7, 2023 • 11
LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Paper • 2312.03079 • Published Dec 5, 2023 • 12

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 185

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 185

Instruction-Following Evaluation for Large Language Models

Paper • 2311.07911 • Published Nov 14, 2023 • 19
HuggingFaceH4/mt_bench_prompts

Viewer • Updated Jul 3, 2023 • 80 • 476 • 16
vectara/hallucination_evaluation_model

Text Classification • Updated Oct 30 • 278k • 235
GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 185

Levels of AGI: Operationalizing Progress on the Path to AGI

Paper • 2311.02462 • Published Nov 4, 2023 • 34
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Paper • 2206.04615 • Published Jun 9, 2022 • 5
A Survey on Evaluation of Large Language Models

Paper • 2307.03109 • Published Jul 6, 2023 • 42
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

Paper • 2306.13651 • Published Jun 23, 2023 • 15

Previous
1
2
3
4
5
...
8
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs