- The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
  Paper • 2408.01050 • Published • 8
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache
  Paper • 2407.18121 • Published • 16
- LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
  Paper • 2407.14057 • Published • 44
- Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
  Paper • 2407.10969 • Published • 20
Ingyu Seong (ingyu)