About Us

MosaicML’s mission is to make efficient training of ML models accessible. We continually productionize state-of-the-art research on efficient model training, and study the combinations of these methods in order to ensure that model training is ✨ as optimized as possible ✨. These findings are baked into our highly efficient model training stack, the MosaicML platform.

If you have questions, feel free to reach out to us on Twitter or by email, or join our Slack channel!

LLM Foundry

This repo contains code for training, finetuning, evaluating, and deploying LLMs for inference with Composer and the MosaicML platform.

Composer Library

The open source Composer library makes it easy to train models faster at the algorithmic level. It is built on top of PyTorch. Use our collection of speedup methods in your own training loop or—for the best experience—with our Composer trainer.


StreamingDataset

Fast, accurate streaming of training data from cloud storage. We built StreamingDataset to make training on large datasets from cloud storage as fast, cheap, and scalable as possible.

It’s specially designed for multi-node, distributed training for large models—maximizing correctness guarantees, performance, and ease of use. Now, you can efficiently train anywhere, independent of your training data location. Just stream in the data you need, when you need it. To learn more about why we built StreamingDataset, read our announcement blog.

StreamingDataset is compatible with any data type, including images, text, video, and multimodal data.

With support for major cloud storage providers (AWS, OCI, and GCS are supported today; Azure is coming soon), and designed as a drop-in replacement for your PyTorch IterableDataset class, StreamingDataset seamlessly integrates into your existing training workflows.

MosaicML Examples Repo

This repo contains reference examples for training ML models quickly and to high accuracy. It's designed to be easily forked and modified.

It currently features reference examples for a range of models and workloads.

MosaicML Platform

The proprietary MosaicML Platform enables you to easily train large AI models on your data, in your secure environment.

With the MosaicML Platform, you can train large AI models at scale with a single command. We handle the rest: orchestration, efficiency, node failures, and infrastructure.

Our platform is fully interoperable, cloud agnostic, and enterprise proven. It also seamlessly integrates with your existing workflows, experiment trackers, and data pipelines.