- Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
  Paper • 2405.07990 • Published • 16
- Large Language Models as Planning Domain Generators
  Paper • 2405.06650 • Published • 9
- AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
  Paper • 2404.12753 • Published • 41
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
  Paper • 2404.07972 • Published • 44
Collections
Collections including paper arxiv:2401.16467
- ReGAL: Refactoring Programs to Discover Generalizable Abstractions
  Paper • 2401.16467 • Published • 8
- Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
  Paper • 2404.02575 • Published • 47
- How Far Can We Go with Practical Function-Level Program Repair?
  Paper • 2404.12833 • Published • 6

- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 18
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
  Paper • 2402.16671 • Published • 26
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
  Paper • 2402.16837 • Published • 24
- Divide-or-Conquer? Which Part Should You Distill Your LLM?
  Paper • 2402.15000 • Published • 22

- CodeBERT: A Pre-Trained Model for Programming and Natural Languages
  Paper • 2002.08155 • Published • 2
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
  Paper • 2402.14658 • Published • 82
- CodeFusion: A Pre-trained Diffusion Model for Code Generation
  Paper • 2310.17680 • Published • 69
- CodePlan: Repository-level Coding using LLMs and Planning
  Paper • 2309.12499 • Published • 73

- ReGAL: Refactoring Programs to Discover Generalizable Abstractions
  Paper • 2401.16467 • Published • 8
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 134
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
  Paper • 2402.14658 • Published • 82
- Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
  Paper • 2402.14261 • Published • 10

- Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
  Paper • 2311.06720 • Published • 7
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
- TinyGSM: achieving >80% on GSM8k with small language models
  Paper • 2312.09241 • Published • 37
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 27

- CodeFusion: A Pre-trained Diffusion Model for Code Generation
  Paper • 2310.17680 • Published • 69
- MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
  Paper • 2311.02303 • Published • 4
- A Survey on Language Models for Code
  Paper • 2311.07989 • Published • 21
- Magicoder: Source Code Is All You Need
  Paper • 2312.02120 • Published • 79