piotr-ai (Piotr)

upvoted a paper 6 days ago

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published 9 days ago • 41

upvoted 2 collections 9 days ago

Aya Datasets

Collection

The Aya Collection is a massive multilingual collection for over 100 languages consisting of 513 million instances of prompts and completions. • 4 items • Updated 10 days ago • 9

C4AI Aya 23

Collection

Aya 23 is an open weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 3 items • Updated 10 days ago • 34

upvoted a paper 9 days ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 239

upvoted a collection 11 days ago

Phi-3

Collection

Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 22 items • Updated 3 days ago • 301

upvoted a paper 16 days ago

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

Paper • 2405.09215 • Published 18 days ago • 14

upvoted a paper 24 days ago

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30 • 62

upvoted a paper 27 days ago

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12 • 62

upvoted 2 papers about 1 month ago

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6 • 176

SpaceByte: Towards Deleting Tokenization from Large Language Modeling

Paper • 2404.14408 • Published Apr 22 • 6

upvoted 2 papers about 2 months ago

Pre-training Small Base LMs with Fewer Tokens

Paper • 2404.08634 • Published Apr 12 • 32

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 93

upvoted a paper 3 months ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 567

upvoted 2 papers 4 months ago

Grandmaster-Level Chess Without Search

Paper • 2402.04494 • Published Feb 7 • 62

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 135

upvoted a paper 5 months ago

Masked Audio Generation using a Single Non-Autoregressive Transformer

Paper • 2401.04577 • Published Jan 9 • 38

upvoted 2 papers 7 months ago

Ziya2: Data-centric Learning is All LLMs Need

Paper • 2311.03301 • Published Nov 6, 2023 • 16

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 116

upvoted 6 papers 8 months ago

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Jointly Training Large Autoregressive Multimodal Models

QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

RMT: Retentive Networks Meet Vision Transformers

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset