llms for me - a maxinwalk Collection

maxinwalk 's Collections

robot

llms for me

updated Nov 27

Evaluating Very Long-Term Conversational Memory of LLM Agents

Paper • 2402.17753 • Published Feb 27 • 18
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

Paper • 2402.16671 • Published Feb 26 • 26
Do Large Language Models Latently Perform Multi-Hop Reasoning?

Paper • 2402.16837 • Published Feb 26 • 24
Divide-or-Conquer? Which Part Should You Distill Your LLM?

Paper • 2402.15000 • Published Feb 22 • 22
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Paper • 2402.14848 • Published Feb 19 • 18
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

Paper • 2402.15506 • Published Feb 23 • 14
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

Paper • 2402.14658 • Published Feb 22 • 82
AgentScope: A Flexible yet Robust Multi-Agent Platform

Paper • 2402.14034 • Published Feb 21 • 12
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming

Paper • 2402.14261 • Published Feb 22 • 10
User-LLM: Efficient LLM Contextualization with User Embeddings

Paper • 2402.13598 • Published Feb 21 • 19
Coercing LLMs to do and reveal (almost) anything

Paper • 2402.14020 • Published Feb 21 • 12
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Paper • 2402.13249 • Published Feb 20 • 11
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

Paper • 2402.10963 • Published Feb 13 • 10
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16 • 30
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling

Paper • 2402.10466 • Published Feb 16 • 17
RLVF: Learning from Verbal Feedback without Overgeneralization

Paper • 2402.10893 • Published Feb 16 • 10
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 104
ReGAL: Refactoring Programs to Discover Generalizable Abstractions

Paper • 2401.16467 • Published Jan 29 • 9

Note 增加代码模板可以提高代码生成的准确性和稳定性
Capture the Flag: Uncovering Data Insights with Large Language Models

Paper • 2312.13876 • Published Dec 21, 2023 • 1
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25 • 65
Recourse for reclamation: Chatting with generative language models

Paper • 2403.14467 • Published Mar 21 • 6
AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models

Paper • 2403.15157 • Published Mar 22 • 7
Can large language models explore in-context?

Paper • 2403.15371 • Published Mar 22 • 32
Advancing LLM Reasoning Generalists with Preference Trees

Paper • 2404.02078 • Published Apr 2 • 44
Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2 • 35
Octopus v2: On-device language model for super agent

Paper • 2404.01744 • Published Apr 2 • 56
Long-form factuality in large language models

Paper • 2403.18802 • Published Mar 27 • 24
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18 • 54
Scaling Instructable Agents Across Many Simulated Worlds

Paper • 2404.10179 • Published Mar 13 • 27
How Far Can We Go with Practical Function-Level Program Repair?

Paper • 2404.12833 • Published Apr 19 • 6
INDUS: Effective and Efficient Language Models for Scientific Applications

Paper • 2405.10725 • Published May 17 • 32
TextGrad: Automatic "Differentiation" via Text

Paper • 2406.07496 • Published Jun 11 • 27
LLMs Do Not Think Step-by-step In Implicit Reasoning

Paper • 2411.15862 • Published Nov 24 • 8