arxiv:2309.03450

XGen-7B Technical Report

Published on Sep 7, 2023 · Featured in Daily Papers on Sep 8, 2023

Abstract

Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens. We have also finetuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results when compared with state-of-the-art open-source LLMs. Our targeted evaluation on long sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.
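
Since the XGen checkpoints are open-sourced, the sketch below shows one way to load and run the base model with the Hugging Face transformers library. The model ID Salesforce/xgen-7b-8k-base and the need for trust_remote_code=True (for the custom tokenizer) are assumptions based on the public release, not details stated in the abstract.

```python
# Minimal sketch: loading an XGen base checkpoint with transformers.
# Assumes the checkpoint is published as "Salesforce/xgen-7b-8k-base"
# and that its custom tokenizer requires trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/xgen-7b-8k-base"  # assumed Hub model ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Generate a short continuation; XGen models support contexts up to 8K tokens.
inputs = tokenizer(
    "The benefits of long-context language models include", return_tensors="pt"
)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The instruction-tuned variant (XGen-Inst) is assumed to follow the same loading pattern under a separate model ID.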
