Mayank Mishra

mayank-mishra

https://mayank31398.github.io/

mishramish98

mayank31398

AI & ML interests

Large Language Models, Distributed Training and Inference

Articles

Saving Memory Using Padding-Free Transformer Layers during Finetuning

6 days ago

• 6

Aurora-M: The First Open Source Biden-Harris Executive Order Red teamed Multilingual Language Model

Apr 2

• 5

mayank-mishra's activity

upvoted an article 7 days ago

Article

Aligning Large Language Models with BRAIn

6 days ago

• 8

upvoted a collection 12 days ago

Dolomite Engine Sample

Collection

This collection contains a sample dataset and a model trained via dolomite-engine. Repo: https://github.com/ibm-granite/dolomite-engine/ • 2 items • Updated 12 days ago • 1

upvoted a paper 26 days ago

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published 27 days ago • 25

upvoted 2 papers about 1 month ago

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

Paper • 2404.03605 • Published Apr 4 • 1

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Paper • 2405.04324 • Published May 7 • 14

upvoted an article about 2 months ago

Article

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

Apr 22

• 73

upvoted a paper about 2 months ago

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming

Paper • 2404.08676 • Published Apr 6 • 3

upvoted a collection about 2 months ago

Granite Code Models

Collection

A series of code models trained by IBM and licensed under the Apache 2.0 license. We release both the base pretrained and instruct models. • 18 items • Updated 18 days ago • 144

upvoted 3 papers 2 months ago

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 32

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

Paper • 2404.05567 • Published Apr 8 • 10

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

Paper • 2402.02479 • Published Feb 4 • 2

upvoted 2 articles 2 months ago

Article

Saving Memory Using Padding-Free Transformer Layers during Finetuning

6 days ago

• 6

Article

Aurora-M: The First Open Source Biden-Harris Executive Order Red teamed Multilingual Language Model

Apr 2

• 5

upvoted a paper 2 months ago

Octopus v2: On-device language model for super agent

Paper • 2404.01744 • Published Apr 2 • 55

upvoted 2 papers 3 months ago

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Paper • 2404.00399 • Published Mar 30 • 40

Teaching Large Language Models to Reason with Reinforcement Learning

Paper • 2403.04642 • Published Mar 7 • 46

upvoted a collection 3 months ago

Aurora-M models

Collection

Aurora-M models (base, biden-harris redteams and instruct) • 5 items • Updated May 6 • 16

upvoted 4 papers 3 months ago

upvoted 2 papers 4 months ago

Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23 • 67

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29 • 126

upvoted 5 papers 9 months ago

StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 29

SantaCoder: don't reach for the stars!

Paper • 2301.03988 • Published Jan 9, 2023 • 7

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 25

Prompting with Pseudo-Code Instructions

Paper • 2305.11790 • Published May 19, 2023 • 2

ModuleFormer: Learning Modular Large Language Models From Uncurated Data

Paper • 2306.04640 • Published Jun 7, 2023 • 7