Mayank Mishra's picture

Mayank Mishra

mayank-mishra

·

https://mayank31398.github.io/

AI & ML interests

Large Language Models, Distributed Training and Inference

Recent Activity

authored a paper about 2 months ago

Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

authored a paper about 2 months ago

Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping

updated a model 2 months ago

ibm-research/moe-7b-1b-active-shared-experts

View all activity

Organizations

mayank-mishra's activity

authored 2 papers about 2 months ago

Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

Paper • 2502.09927 • Published Feb 14

Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping

Paper • 2501.06589 • Published Jan 11

updated a model 2 months ago

ibm-research/moe-7b-1b-active-shared-experts

Updated Feb 11 • 2.42k • 3

published a model 2 months ago

ibm-research/moe-7b-1b-active-shared-experts

Updated Feb 11 • 2.42k • 3

updated a model 3 months ago

ibm-granite/granite-3.2-8b-instruct-preview

Text Generation • Updated Feb 26 • 526 • 69

New activity in ibm-granite/granite-3.1-2b-instruct 4 months ago

RE-ADD float32 please.

#3 opened 4 months ago by

ctranslate2-4you

upvoted a collection 4 months ago

Granite 3.1 Language Models

A series of language models with 128K context length trained by IBM licensed under Apache 2.0 license. • 9 items • Updated 8 days ago • 59

New activity in ibm-granite/granite-3.1-8b-instruct 4 months ago

Exceptional creative writer

#1 opened 4 months ago by

authored a paper 4 months ago

Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models

Paper • 2409.04787 • Published Sep 7, 2024 • 1

upvoted a paper 6 months ago

SelfCodeAlign: Self-Alignment for Code Generation

Paper • 2410.24198 • Published Oct 31, 2024 • 25

upvoted a collection 6 months ago

SmolLM2

State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated Feb 20 • 255

New activity in ibm-granite/granite-3.0-2b-instruct 6 months ago

add base model metadata

#3 opened 6 months ago by

New activity in ibm-granite/granite-3.0-8b-instruct 6 months ago

add base model metadata

#5 opened 6 months ago by

New activity in ibm-granite/granite-3.0-1b-a400m-instruct 6 months ago

Add base model metadata

#2 opened 6 months ago by

upvoted a collection 6 months ago

Granite 3.0 Language Models

A series of language models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 8 days ago • 96

updated 2 collections 6 months ago

Power-LM

Dense & MoE LLMs trained with power learning rate scheduler. • 4 items • Updated Oct 17, 2024 • 15

Granite 3.0 Language Models

A series of language models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 8 days ago • 96

New activity in ibm-research/PowerMoE-3b 7 months ago

torch and llama.cpp integration

#1 opened 7 months ago by

updated a collection 8 months ago

Granite Code Models

A series of code models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 23 items • Updated 8 days ago • 191