Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2401.00908

Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure

Paper • 2311.07590 • Published Nov 9, 2023 • 16
Exponentially Faster Language Modelling

Paper • 2311.10770 • Published Nov 15, 2023 • 117
AppAgent: Multimodal Agents as Smartphone Users

Paper • 2312.13771 • Published Dec 21, 2023 • 51
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 177

interesting paper

FMViT: A multiple-frequency mixing Vision Transformer

Paper • 2311.05707 • Published Nov 9, 2023 • 5
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 177
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 118
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 79

meta-llama/Llama-2-7b-hf

Text Generation • Updated Apr 17 • 1.13M • 1.64k
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 177
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

Paper • 2401.04398 • Published Jan 9 • 19
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models

Paper • 2402.01118 • Published Feb 2 • 28

Training & Architectures

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 7
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 157
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 47

UI Layout Generation with LLMs Guided by UI Grammar

Paper • 2310.15455 • Published Oct 24, 2023 • 2
You Only Look at Screens: Multimodal Chain-of-Action Agents

Paper • 2309.11436 • Published Sep 20, 2023 • 1
Never-ending Learning of User Interfaces

Paper • 2308.08726 • Published Aug 17, 2023 • 1
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 63

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 14
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 25
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 6
Conditional Diffusion Distillation

Paper • 2310.01407 • Published Oct 2, 2023 • 20

Visual Document Understanding / OCR

LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 63
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

Paper • 2309.01131 • Published Sep 3, 2023 • 1
On the Hidden Mystery of OCR in Large Multimodal Models

Paper • 2305.07895 • Published May 13, 2023
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 177

Running

1

🏢

DocumentClassification
Runtime error

🏢

Document_classifier
Runtime error

25

🔥

Donut Rvlcdip
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 177

LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 63
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 177
mPLUG/DocOwl1.5-Omni

Updated Apr 10 • 646 • 15

Running

211

😻

Mistral Super Fast
Runtime error

78

📚

Qwen 14b Chat Demo
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 177
Running

381

🚀

Qwen1.5 72B Chat

Previous
1
...
8
9
10
11
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs