dulacp (Pierre Dulac)

upvoted a paper 1 day ago

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 122

upvoted a paper 2 days ago

LLMs achieve adult human performance on higher-order theory of mind tasks

Paper • 2405.18870 • Published 4 days ago • 12

upvoted 2 papers 15 days ago

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published 17 days ago • 95

LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published 18 days ago • 73

upvoted an article 24 days ago

Article

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

Aug 17, 2022

• 23

upvoted an article about 1 month ago

Article

Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM

By

•

Apr 26

• 10

upvoted 7 papers about 1 month ago

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 122

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Paper • 2404.12318 • Published Apr 18 • 14

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 239

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11 • 39

upvoted a paper about 2 months ago

Elucidating the Design Space of Diffusion-Based Generative Models

Paper • 2206.00364 • Published Jun 1, 2022 • 9

upvoted an article about 2 months ago

Article

LoRA training scripts of the world, unite!

Jan 2

• 14

upvoted 2 papers about 2 months ago

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4 • 74

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

Paper • 2401.03462 • Published Jan 7 • 25

upvoted 5 papers 3 months ago

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 116

Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 59

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8 • 50

Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11 • 85

LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models

Paper • 2308.16137 • Published Aug 30, 2023 • 38

upvoted a collection 3 months ago

Long context

Collection

72 items • Updated 16 days ago • 23

upvoted 10 papers 3 months ago

YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 59

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Paper • 2402.10644 • Published Feb 16 • 74

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 567

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Paper • 2402.08609 • Published Feb 13 • 34

Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

Paper • 2402.13064 • Published Feb 20 • 45

In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss

Paper • 2402.10790 • Published Feb 16 • 39

Scaling Transformer to 1M tokens and beyond with RMT

Paper • 2304.11062 • Published Apr 19, 2023 • 2

Recurrent Memory Transformer

Paper • 2207.06881 • Published Jul 14, 2022 • 1

Transformers Can Achieve Length Generalization But Not Robustly

Paper • 2402.09371 • Published Feb 14 • 12

Premise Order Matters in Reasoning with Large Language Models

Paper • 2402.08939 • Published Feb 14 • 23

upvoted 25 papers 4 months ago

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

Paper • 2402.07827 • Published Feb 12 • 43

A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Paper • 2402.09727 • Published Feb 15 • 35

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

Paper • 2402.10176 • Published Feb 15 • 33

Data Engineering for Scaling Language Models to 128K Context

Paper • 2402.10171 • Published Feb 15 • 18

Buffer Overflow in Mixture of Experts

Paper • 2402.05526 • Published Feb 8 • 8

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6 • 21

Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1 • 22

Learning Universal Predictors

Paper • 2401.14953 • Published Jan 26 • 18

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26 • 62

Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

Paper • 2401.04398 • Published Jan 9 • 18

LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 61

LLaMA Beyond English: An Empirical Study on Language Capability Transfer

Paper • 2401.01055 • Published Jan 2 • 50

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Paper • 2401.01335 • Published Jan 2 • 61

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 174

TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4 • 81

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 153

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17 • 51

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 135

Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

Paper • 2401.12070 • Published Jan 22 • 41

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Paper • 2401.13919 • Published Jan 25 • 22

Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

Paper • 2401.14019 • Published Jan 25 • 19

MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24 • 47

Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 82

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Paper • 2401.10774 • Published Jan 19 • 50

VMamba: Visual State Space Model

Paper • 2401.10166 • Published Jan 18 • 36

upvoted 2 papers 6 months ago

Textbooks Are All You Need II: phi-1.5 technical report

Paper • 2309.05463 • Published Sep 11, 2023 • 84

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 131

Pierre Dulac

AI & ML interests

Organizations

dulacp's activity

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM

LoRA training scripts of the world, unite!