Djuunaa (djuna)
AI & ML interests: None yet
Recent Activity
- upvoted a paper about 14 hours ago: Orthogonal Finetuning Made Scalable
- liked a model 3 days ago: zz1358m/SofT-GRPO-master
- reacted to Locutusque's post with 🔥 5 days ago
🚀 AutoXLA - Accelerating Large Models on TPU
AutoXLA is an experimental library that automates the distribution, optimization, and quantization of large language models for TPUs using PyTorch/XLA. It extends the Hugging Face Transformers interface with TPU-aware features such as automatic sharding, custom attention kernels, and quantization-aware loading, making large-scale deployment and training both simpler and faster.
With quantization and Splash Attention kernels, AutoXLA achieves up to 4× speedups over standard Flash Attention implementations, significantly improving throughput for both inference and training workloads.
Whether you're experimenting with distributed setups (FSDP, 2D, or 3D sharding) or optimizing memory via LanguageModelQuantizer, AutoXLA is built to make scaling LLMs on TPU seamless.
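AutoXLA's quantization internals aren't shown in this post, but the core idea behind quantization-aware weight loading can be sketched with a toy symmetric int8 round-trip. This is a minimal illustration only; the function names here (`quantize_int8`, `dequantize`) are hypothetical and are not AutoXLA's actual API:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Storing int8 weights plus one float scale uses ~4x less memory than float32;
# the reconstruction error is bounded by half a quantization step.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

The memory saving (int8 storage plus a single scale per tensor) is what makes larger models fit on TPU HBM; a production quantizer would additionally handle per-channel scales and zero tensors.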
⚠️ Note: This is an experimental repository. Expect rough edges! Please report bugs or unexpected behavior through GitHub issues.
🔗 GitHub Repository: https://github.com/Locutusque/AutoXLA