Collections

A Survey on Language Models for Code

Paper • 2311.07989 • Published Nov 14, 2023
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Paper • 2310.06770 • Published Oct 10, 2023
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Paper • 2401.03065 • Published Jan 5, 2024
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming

Paper • 2402.14261 • Published Feb 22, 2024

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024
StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024
Simple linear attention language models balance the recall-throughput tradeoff

Paper • 2402.18668 • Published Feb 28, 2024

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Video as the New Language for Real-World Decision Making

Design2Code: How Far Are We From Automating Front-End Engineering?

Ultralytics/YOLOv8

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

SpeechAlign: a Framework for Speech Translation Alignment Evaluation

FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models

bigcode/starcoder2-15b

Zephyr: Direct Distillation of LM Alignment

mixedbread-ai/mxbai-rerank-large-v1

LoRA+: Efficient Low Rank Adaptation of Large Models

The FinBen: An Holistic Financial Benchmark for Large Language Models

TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

TrustLLM: Trustworthiness in Large Language Models

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Linear Transformers are Versatile In-Context Learners

Training-Free Long-Context Scaling of Large Language Models

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

LLM Agent Operating System

Self-Rewarding Language Models

ReFT: Reasoning with Reinforced Fine-Tuning

Tuning Language Models by Proxy

DocLLM: A layout-aware generative language model for multimodal document understanding

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Weaver: Foundation Models for Creative Writing

Efficient Tool Use with Chain-of-Abstraction Reasoning