Must Reads on Language Models
Dive into the world of generative AI with some prominent language model papers, unlocking the secrets of natural language processing.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 16
Note BERT is one of the pioneering language models, developed by Google.
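If you want to poke at BERT directly, here is a minimal sketch using the Hugging Face transformers library, assuming the public bert-base-uncased checkpoint; it only shows how to obtain contextual token embeddings, not any fine-tuning setup.

```python
# Minimal sketch: encode a sentence with a pretrained BERT encoder.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# The tokenizer adds the [CLS] and [SEP] special tokens automatically.
inputs = tokenizer("Language models are must reads.", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per token: (batch, seq_len, hidden_size)
print(outputs.last_hidden_state.shape)
```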
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 7
Note The model was developed by Meta AI (formerly known as Facebook AI).
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 11
Note The GPT-3 model was developed by OpenAI.
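The core idea of this paper is in-context few-shot learning: the model is conditioned on a few input/output demonstrations in the prompt instead of being fine-tuned. GPT-3 itself is not openly available, so the sketch below runs the same prompt format through an open checkpoint (gpt2) via the transformers pipeline; it only illustrates the prompt structure, and a small model will not reproduce GPT-3-quality completions.

```python
# Illustrative few-shot prompt: demonstrations followed by a new query.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)

# Greedy decoding of a short continuation after the final "=>".
result = generator(prompt, max_new_tokens=5, do_sample=False)
print(result[0]["generated_text"])
```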
OPT: Open Pre-trained Transformer Language Models
Paper • 2205.01068 • Published • 2
Note The model was developed by Meta AI (formerly known as Facebook AI).
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
Paper • 2203.13474 • Published • 1
Note The model code was released publicly by Salesforce.
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Paper • 2204.06745 • Published • 1
Note The model code was released publicly by EleutherAI.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Paper • 2211.05100 • Published • 27
Note The model was developed by the BigScience workshop.
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 243
Note The model was developed by Meta AI. The Llama 1 paper is "LLaMA: Open and Efficient Foundation Language Models".
Mixtral of Experts
Paper • 2401.04088 • Published • 158
Note Mixture of Experts (MoE) is the core architecture of Mixtral 8x7B, a model developed by Mistral AI. Its predecessor is Mistral 7B (whose paper shares the model's name). A toy routing sketch follows below.
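To make the sparse MoE idea concrete, here is a toy PyTorch sketch of top-2 expert routing: a router scores every expert per token and only the two highest-scoring experts are evaluated. The layer sizes, expert count, and router here are illustrative assumptions, not Mixtral's actual implementation.

```python
# Toy sparse Mixture-of-Experts layer with top-2 routing (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, hidden_size=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.SiLU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, hidden_size)
        logits = self.router(x)                  # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = SparseMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

Only the selected experts run for each token, which is how such models keep per-token compute far below what their total parameter count suggests.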