AI & ML interests
Full LLM lifecycle: fine-tuning (QLoRA), compression (W8A8/W4A16 quantization), and serving (vLLM, disaggregated prefill/decode). Interested in inference economics, GPU memory architecture, and deploying efficient models in resource-constrained settings.