Huiqiang Jiang PRO

iofu728

https://www.microsoft.com/en-us/research/people/hjiang/

Recent Activity

authored a paper 1 day ago

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

updated a dataset 1 day ago

microsoft/SCBench

upvoted a paper 2 days ago

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

Articles

How to Optimize TTFT of 8B LLMs with 1M Tokens to 20s

Jul 21 • 2

iofu728's activity

upvoted a paper 2 days ago

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

Paper • 2412.10319 • Published 4 days ago • 8

upvoted a paper 5 days ago

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published 6 days ago • 36

upvoted a paper 2 months ago

Differential Transformer

Paper • 2410.05258 • Published Oct 7 • 167

upvoted an article 3 months ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Article • Sep 18 • 207

upvoted a paper 3 months ago

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Paper • 2409.10516 • Published Sep 16 • 39

upvoted a paper 4 months ago

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

Paper • 2408.11049 • Published Aug 20 • 12

upvoted an article 4 months ago

A failed experiment: Infini-Attention, and why we should keep trying?

Article • Aug 14 • 52

upvoted 2 articles 5 months ago

RegMix: Data Mixture as Regression for Language Model Pre-training

Article • Jul 11 • 10

MInference 1.0: 10x Faster Million Context Inference with a Single GPU

Article • Jul 11 • 12

upvoted a paper 6 months ago

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Paper • 2407.02490 • Published Jul 2 • 23

upvoted a paper 9 months ago

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Paper • 2403.12968 • Published Mar 19 • 24

upvoted 2 papers 10 months ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 603

Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

Paper • 2402.06619 • Published Feb 9 • 54

upvoted 3 papers about 1 year ago

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

Paper • 2310.17157 • Published Oct 26, 2023 • 12

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

Paper • 2310.05736 • Published Oct 9, 2023 • 4

LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

Paper • 2310.06839 • Published Oct 10, 2023 • 3

upvoted a paper over 1 year ago

Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration

Paper • 2307.05300 • Published Jul 11, 2023 • 18