Piotr

piotr-ai

piotr-ai's activity

upvoted a collection 3 days ago

SmolLM2

State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 8 items • Updated 3 days ago • 117

upvoted a collection 4 days ago

LayerSkip

Models continually pretrained using LayerSkip - https://arxiv.org/abs/2404.16710 • 8 items • Updated 8 days ago • 39

upvoted a collection 12 days ago

Granite 3.0 Language Models

A series of language models trained by IBM and released under the Apache 2.0 license. Both the base pretrained and instruct models are released. • 8 items • Updated 5 days ago • 80

upvoted a paper about 1 month ago

The AdEMAMix Optimizer: Better, Faster, Older

Paper • 2409.03137 • Published Sep 5 • 5

upvoted 2 collections about 1 month ago

Llama 3.2

This collection hosts the transformers and original repos of Llama 3.2 and Llama Guard 3 • 15 items • Updated 10 days ago • 429

Molmo

Artifacts for open multimodal language models. • 5 items • Updated Sep 26 • 268

upvoted 4 collections about 2 months ago

Moshi v0.1 Release

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18 • 213

Qwen2.5

Qwen2.5 language models, with pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Sep 18 • 297

DataGemma Release

A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated Sep 12 • 78

Power-LM

Dense & MoE LLMs trained with the power learning rate scheduler. • 4 items • Updated 17 days ago • 15

upvoted a paper 2 months ago

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

Paper • 2409.01071 • Published Sep 2 • 26

upvoted 2 collections 2 months ago

Yi-Coder

4 items • Updated Sep 4 • 29

CogVLM2

This collection hosts the repos of THUDM's CogVLM2 releases • 8 items • Updated Aug 18 • 18

upvoted a paper 2 months ago

CogVLM2: Visual Language Models for Image and Video Understanding

Paper • 2408.16500 • Published Aug 29 • 56

upvoted a collection 2 months ago

Qwen2-VL

Vision-language model series based on Qwen2 • 15 items • Updated Sep 18 • 147

upvoted 2 papers 2 months ago

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Paper • 2408.15881 • Published Aug 28 • 20

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12 • 35

upvoted 2 papers 3 months ago

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Paper • 2408.08152 • Published Aug 15 • 51

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23 • 34

upvoted a collection 3 months ago

DeepSeek-Prover

DeepSeek-V1-and-V1.5-Series • 7 items • Updated Aug 16 • 17