Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads Paper • 2501.15113 • Published Jan 25, 2025 • 1
InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference Paper • 2409.04992 • Published Sep 8, 2024 • 2
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published Sep 16, 2024 • 42
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 612
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 258