Tong Zhu

Spico

https://Spico197.github.io

AI & ML interests

Information Extraction, Mixture-of-Experts, LLM

Recent Activity

upvoted a paper 24 days ago

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

liked a dataset 26 days ago

kkChimmy/REALM

liked a dataset 30 days ago

RadiCat/SimpleToolQuestions

Spico's activity

upvoted a paper 24 days ago

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

Paper • 2503.21614 • Published 28 days ago • 39

upvoted a paper about 2 months ago

Iterative Value Function Optimization for Guided Decoding

Paper • 2503.02368 • Published Mar 4 • 15

upvoted a paper 2 months ago

MoM: Linear Sequence Modeling with Mixture-of-Memories

Paper • 2502.13685 • Published Feb 19 • 35

upvoted an article 2 months ago

Finally, a Replacement for BERT: Introducing ModernBERT

Article • Dec 19, 2024 • 610

upvoted a paper 2 months ago

LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid

Paper • 2502.07563 • Published Feb 11 • 24

upvoted 2 papers 3 months ago

UltraIF: Advancing Instruction Following from the Wild

Paper • 2502.04153 • Published Feb 6 • 22

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Paper • 2501.12895 • Published Jan 22 • 61

upvoted a paper 6 months ago

NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models

Paper • 2410.11805 • Published Oct 15, 2024 • 14

upvoted a collection 6 months ago

UI Agent

Collection

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics • 358 items • Updated 1 day ago • 52

upvoted a paper 7 months ago

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling

Paper • 2409.19291 • Published Sep 28, 2024 • 19

upvoted 2 papers 9 months ago

Learning to Refuse: Towards Mitigating Privacy Risks in LLMs

Paper • 2407.10058 • Published Jul 14, 2024 • 32

GRUtopia: Dream General Robots in a City at Scale

Paper • 2407.10943 • Published Jul 15, 2024 • 26

upvoted a paper 10 months ago

LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

Paper • 2406.16554 • Published Jun 24, 2024 • 1

upvoted a paper about 1 year ago

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Paper • 2404.06395 • Published Apr 9, 2024 • 23

upvoted 2 papers over 1 year ago

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 101

OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch

Paper • 2309.10706 • Published Sep 19, 2023 • 17