Architectures - a hllj Collection

updated May 1, 2024

Larimar: Large Language Models with Episodic Memory Control
Paper • 2403.11901 • Published Mar 18, 2024

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Paper • 2212.05055 • Published Dec 9, 2022

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • 2404.02258 • Published Apr 2, 2024

Multi-Head Mixture-of-Experts
Paper • 2404.15045 • Published Apr 23, 2024

KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published Apr 30, 2024