Lin Tan
lin-tan
AI & ML interests
AI-Software Synergy. LLM4Code (binary and source code).
Mary J. Elmore New Frontiers Professor
Purdue University
Recent Activity
Replied to their post 22 days ago:
Can language models replace developers? #RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.
- Leaderboard: https://lt-asset.github.io/REPOCOD/
- Dataset: https://huggingface.co/datasets/lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
Compared to #SWEBench, RepoCod tasks:
- Are general code generation tasks, while SWE-Bench tasks resolve pull requests from GitHub issues.
- Have about 2.6x as many tests per task (313.5 compared to SWE-Bench’s 120.8).
Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)
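For readers unfamiliar with the cyclomatic complexity figure above, here is a rough sketch of how such a number can be estimated for a Python function: one plus the number of branch points. The post does not say which tool produced the 9.00 average, so the function name `cyclomatic_complexity` and the set of node types counted here are assumptions for illustration, and real tools may count slightly differently.

```python
import ast

# Assumption: these node types are treated as decision points.
# Dedicated tools may include or exclude other constructs.
_DECISION_NODES = (
    ast.If,
    ast.For,
    ast.AsyncFor,
    ast.While,
    ast.ExceptHandler,
    ast.IfExp,
    ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of branch points."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths, one per operator.
            complexity += len(node.values) - 1
    return complexity


example = (
    "def f(x):\n"
    "    if x > 0 and x < 10:\n"
    "        return 1\n"
    "    for i in range(x):\n"
    "        pass\n"
    "    return 0\n"
)
print(cyclomatic_complexity(example))
```

A straight-line function scores 1 under this estimate, so an average of 9.00 suggests canonical solutions with many branches and loops.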
Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have <10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: https://huggingface.co/datasets/lt-asset/REPOCOD_Lite
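The pass@1 figures quoted in this post can be read as the fraction of tasks for which a generated solution passes all of the task's tests. The post does not state how the metric was computed; the sketch below assumes the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per task and c of them pass. The helper names `pass_at_k` and `mean_pass_at_k` are illustrative, not part of the RepoCod tooling.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n samples, c of which passed."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n, c) pairs per task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: four tasks, one sample each, one task solved.
print(mean_pass_at_k([(1, 1), (1, 0), (1, 0), (1, 0)], k=1))
```

With k = 1 the estimator reduces to c / n, so a benchmark-level pass@1 below 10% means fewer than one in ten sampled solutions passes its task's full test suite.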
#LLM4code #LLM #CodeGeneration #Security
Reacted to their post with 👍 26 days ago:
🚀 Excited to share that our paper, "SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models", has been accepted to #ICRA2025! 🔗 Preprint: https://arxiv.org/pdf/2409.19471
We introduce SELP (Safe Efficient LLM Planner), a novel approach for generating plans that adhere to user-specified constraints while optimizing for time-efficient execution. By leveraging linear temporal logic (LTL) to interpret natural language commands, SELP effectively handles complex commands and long-horizon tasks. 🤖
💡SELP presents three key insights:
1️⃣ Equivalence Voting: Ensures robust translations from natural language instructions into LTL specifications.
2️⃣ Constrained Decoding: Uses the generated LTL formula to guide the autoregressive inference of plans, ensuring the generated plans conform to the LTL.
3️⃣ Domain-Specific Fine-Tuning: Customizes LLMs for specific robotic tasks, boosting both safety and efficiency.
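The voting idea in the first insight can be sketched as follows: sample several candidate LTL translations, group the ones that are equivalent, and keep a representative of the largest group. This is only an illustration under stated assumptions, not SELP's implementation. The post does not describe how equivalence is checked, so `normalize` here is a hypothetical stand-in that merely strips whitespace, where a real system would need a semantic LTL equivalence check.

```python
from typing import Callable, Sequence


def normalize(formula: str) -> str:
    """Hypothetical stand-in for an LTL equivalence check: strips whitespace."""
    return "".join(formula.split())


def equivalence_vote(
    candidates: Sequence[str],
    key: Callable[[str], str] = normalize,
) -> str:
    """Return a representative of the largest group of equivalent candidates."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    groups: dict[str, list[str]] = {}
    for formula in candidates:
        groups.setdefault(key(formula), []).append(formula)
    # max() returns the first largest group, so ties go to the group seen first.
    winner = max(groups.values(), key=len)
    return winner[0]


# Hypothetical candidate translations of one instruction.
samples = ["G (a -> F b)", "G(a->F b)", "F b"]
print(equivalence_vote(samples))
```

Passing a different `key` function is where a stronger equivalence check would plug in.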
📊 Experiment: Our experiments demonstrate SELP’s effectiveness and generalizability across diverse tasks. In drone navigation, SELP outperforms state-of-the-art LLM planners by 10.8% in safety rate and by 19.8% in plan efficiency. For robot manipulation, SELP achieves a 20.4% improvement in safety rate.
@yiwu @jiang719
#ICRA2025 #LLM #Robotics #Agent #LLMPlanner