
Image by author
Large language models (LLMs) are everywhere right now. However, if your organization doesn’t have the right resources, it can be hard to jump on the LLM bandwagon. Training and deploying large language models is difficult, and you can quickly feel left out. Open-source LLMs, such as Meta’s LLaMA series, have made LLM resources far more accessible.
And the latest addition to that open-source collection is MosaicML Foundations’ newest release, MPT-7B.
MPT stands for MosaicML Pretrained Transformer. The MPT models are GPT-style decoder-only transformers that come with many enhancements:
- Performance-optimized layer implementations
- Greater training stability due to architectural changes
- No context length limits, thanks to ALiBi (Attention with Linear Biases)
MPT-7B is a transformer model trained from scratch on 1T tokens of text and code. Yes, 1 TRILLION! It was trained on the MosaicML platform in 9.5 days with zero human intervention, at a cost of roughly $200k.
It is open source and licensed for commercial use, and it could change how businesses and organizations approach their predictive analytics and decision-making processes.
The main features of MPT-7B are:
- Licensed for commercial use
- Trained on a large amount of data (1T tokens)
- Can handle extremely long inputs
- Optimized for fast training and inference
- Highly efficient open-source training code
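To see what "licensed for commercial use" looks like in practice, here is a minimal sketch of loading the base model from the Hugging Face Hub with the transformers library. The Hub id mosaicml/mpt-7b, the tokenizer choice, and the generation settings are assumptions on my part; check the official model card for the recommended usage.

```python
# Minimal sketch: load the base MPT-7B checkpoint and generate text.
# The Hub id, tokenizer choice, and generation settings are assumptions;
# consult the official model card for the recommended usage.
import transformers

model_name = "mosaicml/mpt-7b"  # assumed Hugging Face Hub id

tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,  # MPT ships custom modeling code on the Hub
)

prompt = "MosaicML's MPT-7B is"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```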
MPT-7B is the flagship model and has been shown to outperform other open-source 7B to 20B models, matching the quality of LLaMA-7B. To evaluate its quality, MosaicML compiled 11 open-source benchmarks and evaluated the model using industry-standard methodology.

Image courtesy of the MosaicML Foundation
MosaicML Foundations also released three additional finetuned models:
- MPT-7B-Instruct
- MPT-7B-Chat
- MPT-7B-StoryWriter-65k+
MPT-7B-Instruct
The MPT-7B-Instruct model is designed for short-form instruction following. With 26,834 downloads as of May 14, MPT-7B-Instruct lets you ask quick, short questions and get an instant response. Have a question and just want a simple answer? Use MPT-7B-Instruct.
Why is this so great? Typically, LLMs are trained to continue generating text from whatever input they are given. However, some users want an LLM that treats their input as an instruction. Instruction finetuning is what allows LLMs to produce instruction-following outputs.
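As a rough illustration, here is a hedged sketch of prompting MPT-7B-Instruct through transformers. The Hub id mosaicml/mpt-7b-instruct and the Alpaca/Dolly-style prompt template are assumptions, so verify them against the model card before relying on them.

```python
# Hedged sketch: short-form instruction following with MPT-7B-Instruct.
# The Hub id and the Alpaca/Dolly-style prompt template are assumptions.
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct", trust_remote_code=True
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain in one sentence what a tokenizer does.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```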
MPT-7B-Chat
Yes, we have another chatbot! MPT-7B-Chat generates dialogue. For example, if you want the chatbot to write a speech, give it some context and it will produce the text in a conversational style. Or perhaps you want a tweet that paraphrases a paragraph of an article; it can generate that dialogue for you.
Why is this so great? MPT-7B-Chat is ready and well-equipped for a variety of conversational tasks, delivering more seamless and engaging multi-turn interactions for users.
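Below is a minimal, assumption-laden sketch of a single chat turn. The Hub id mosaicml/mpt-7b-chat and the ChatML-style <|im_start|>/<|im_end|> tags are assumptions; the model card documents the exact conversation format and tokenizer the chat variant expects.

```python
# Hedged sketch: one chat turn with MPT-7B-Chat.
# The Hub id and the ChatML-style tags are assumptions; check the model card
# for the exact conversation format and tokenizer this variant expects.
import transformers

name = "mosaicml/mpt-7b-chat"  # assumed Hub id
tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = transformers.AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nTurn this paragraph into a short tweet: "
    "MPT-7B is an open-source 7B-parameter LLM trained on 1T tokens.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```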
MPT-7B-StoryWriter-65k+
This one is for the story writers! For those who want to write stories with long context, MPT-7B-StoryWriter-65k+ is designed for exactly that. The model was built by finetuning MPT-7B with a context length of 65k tokens, and it can extrapolate beyond 65k tokens. MosaicML was able to generate 84k tokens on a single node of A100-80GB GPUs.
Why is this so great? Most open-source LLMs can only handle sequences of up to a few thousand tokens. But by using a single node of 8xA100-80GB on the MosaicML platform, you can finetune MPT-7B to handle context lengths of up to 65k.
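Here is a hedged sketch of how the extended context might be enabled when loading the model, assuming the Hub id mosaicml/mpt-7b-storywriter and a config-level max_seq_len override; the long context relies on the ALiBi extrapolation mentioned earlier, and the exact mechanism is described in the model card.

```python
# Hedged sketch: load MPT-7B-StoryWriter-65k+ with an extended context window.
# The Hub id and the config-level max_seq_len override are assumptions; the
# extended context relies on ALiBi extrapolation beyond the 65k training length.
import transformers

name = "mosaicml/mpt-7b-storywriter"  # assumed Hub id
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # extrapolate beyond the 65k training context

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```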
The MosaicML team built these models in just a few weeks, handling data preparation, training, finetuning, and deployment in that time.
The data came from a variety of sources, each contributing billions of effective tokens. The team used EleutherAI’s GPT-NeoX-20B tokenizer, allowing them to train on a diverse mix of data, apply consistent space delimitation, and more.
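As a small illustration of that tokenizer choice, the sketch below loads the GPT-NeoX-20B tokenizer and inspects how it splits a snippet of code; the sample string is arbitrary and only meant to show how whitespace is carried in the tokens.

```python
# Small illustration of the GPT-NeoX-20B tokenizer mentioned above.
# The sample string is arbitrary; the printed token pieces show how
# whitespace is represented within the tokens.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

sample = "def add(a, b):\n    return a + b"
ids = tok(sample)["input_ids"]
print(len(ids), "tokens")
print(tok.convert_ids_to_tokens(ids))
```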
All MPT-7B models were trained on the MosaicML platform using Oracle Cloud’s A100-40GB and A100-80GB GPUs.
If you’d like to learn more about the MPT-7B’s tools and costs, read the MPT-7B blog.
The MosaicML platform can be considered the best starting point for organizations, whether private, commercial, or community-based, that want to build custom LLMs. Access to this open-source resource gives organizations more freedom to use these tools to tackle current organizational challenges.
Customers can train LLMs on any computing provider or data source while maintaining efficiency, privacy and cost transparency.
What do you think you will use MPT-7B for? Let us know in the comments below!
Nisha Arya is a data scientist, freelance technical writer, and community lead at KDnuggets. She is particularly interested in providing Data Science career advice, tutorials, and theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence can benefit the longevity of human life. A keen learner, she seeks to broaden her tech knowledge and writing skills while helping guide others.