arXiv:2402.16819
Nemotron-4 15B Technical Report
Published on Feb 26, 2024 · Featured in Daily Papers on Feb 27, 2024
Abstract
We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly sized open models on 4 of 7 downstream evaluation areas and achieves performance competitive with the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly sized models, even outperforming models over four times larger as well as those explicitly specialized for multilingual tasks.
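As a hedged sketch of how a 15-billion-parameter causal language model like this would typically be loaded and queried with Hugging Face Transformers: the abstract does not state that Nemotron-4 15B weights are published on the Hub, so the repository name below is a hypothetical placeholder, not a confirmed model id.

# Hypothetical sketch: loading a Nemotron-4-15B-style checkpoint with
# Hugging Face Transformers. The repository name is a placeholder; the
# paper does not confirm that weights are available on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/nemotron-4-15b"  # hypothetical repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~30 GB of weights for 15B params in bf16
    device_map="auto",           # shard across available GPUs
)

# The paper highlights multilingual strength, so a translation-style
# prompt is a natural smoke test.
prompt = "Translate to German: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))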