Na0s's Collections

Differential transformers

Fine-tuning the Llama-3.2-3B-Instruct foundation model on medical Q&A using differential attention (in progress). Paper: https://arxiv.org/pdf/2410.05258
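For readers unfamiliar with the term, differential attention as described in the linked paper computes two softmax attention maps and subtracts one from the other, scaled by a factor λ, before applying the result to the values. The snippet below is a minimal single-head sketch of that formula in plain Python, not the fine-tuning code behind this collection. The function names are hypothetical, `lam` is passed in as a fixed scalar (the paper learns it), and the multi-head wiring, per-head normalization, and causal masking are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(q, k):
    """Row-wise softmax(q @ k^T / sqrt(d)) for list-of-lists q and k."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    return [
        softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        for qi in q
    ]


def diff_attention(q1, k1, q2, k2, v, lam):
    """Sketch of (softmax(q1 k1^T / sqrt(d)) - lam * softmax(q2 k2^T / sqrt(d))) @ v.

    Assumption: lam is a fixed scalar here; in the paper it is learnable.
    """
    a1 = attention_map(q1, k1)
    a2 = attention_map(q2, k2)
    out = []
    for r1, r2 in zip(a1, a2):
        # Differential weights: the second map cancels attention shared by both.
        weights = [x - lam * y for x, y in zip(r1, r2)]
        out.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out
```

With `lam = 0` this reduces to ordinary scaled dot-product attention. With identical query/key pairs and `lam = 1` the two maps cancel and the output is all zeros, which makes for a quick sanity check.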