
Model Card for Mistral-7B-32K-PoSE

The Mistral-7B-32K-PoSE Large Language Model (LLM) is Mistral 7B trained with PoSE to a 32k context length. Extending Mistral-7B's context with the PoSE technique yields strong performance on the passkey retrieval task with only a marginal impact on standard benchmarks.
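To make the idea concrete, the sketch below illustrates the core of PoSE (Positional Skip-wisE) training: a short training window is assigned position ids that span the full target context range by splitting it into chunks and inserting random skips between them. This is an illustrative sketch, not the training code used for this model; the function and parameter names are our own.

```python
import random

def pose_position_ids(train_len=2048, target_len=32768):
    """Sample position ids for a short training window so they cover
    the full target context range (PoSE-style skip-wise sampling).
    Illustrative sketch only; hypothetical helper."""
    # Split the training window into two contiguous chunks.
    cut = random.randint(1, train_len - 1)
    # Spread the leftover positions (target - train) as random skips.
    budget = target_len - train_len
    skip1 = random.randint(0, budget)          # skip inserted before chunk 2
    skip0 = random.randint(0, budget - skip1)  # skip inserted before chunk 1
    chunk1 = list(range(skip0, skip0 + cut))
    start2 = skip0 + cut + skip1
    chunk2 = list(range(start2, start2 + train_len - cut))
    return chunk1 + chunk2
```

Because the skips are sampled fresh each step, the model sees position ids from the whole 32k range while only ever attending over a 2k window, which is what keeps training cheap.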

For full details of this model please read our release blog post.


PassKey retrieval


The evaluation focuses on the model's effectiveness in passkey retrieval, highlighting the impact of varying context lengths on its ability to extract crucial information. Our model excels at this task, handling context lengths up to 32k and surpassing the original Mistral-7B, which passed the test cases only when the context window was under 8k.
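A passkey retrieval test hides a random number inside a long stretch of filler text and asks the model to recall it. The sketch below builds such a prompt; it is a generic illustration, and the exact template and filler used in our evaluation may differ.

```python
import random

def make_passkey_prompt(context_words=2000):
    """Build a needle-in-a-haystack passkey prompt (illustrative sketch)."""
    passkey = str(random.randint(10000, 99999))
    filler = ("The grass is green. The sky is blue. The sun is yellow. "
              "Here we go. There and back again. ")
    needle = f"The pass key is {passkey}. Remember it. "
    # Repeat the filler until we reach roughly the desired context size,
    # then insert the needle at a random position.
    n_repeats = max(1, context_words // len(filler.split()))
    parts = [filler] * n_repeats
    parts.insert(random.randint(0, n_repeats), needle)
    prompt = ("There is important info hidden inside a lot of irrelevant "
              "text. Find it and memorize it.\n" + "".join(parts) +
              "\nWhat is the pass key? The pass key is")
    return prompt, passkey
```

Scoring is then a simple string match: generate a few tokens after the prompt and check whether the passkey appears in the output.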

Standard Benchmarking

Our model extends the context window to 32k while experiencing only a marginal drop in standard benchmark accuracy, demonstrating that it can handle longer contexts without significantly compromising overall performance.

Run the model

from transformers import AutoTokenizer
# my_modeling_mistral is the custom modeling file that implements the
# PoSE-extended MistralForCausalLM for this checkpoint.
from my_modeling_mistral import MistralForCausalLM

model_id = "SuperAGI/mistral-7B-PoSE-32k"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = MistralForCausalLM.from_pretrained(model_id)

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


The Mistral-7B-32K-PoSE model demonstrates that context length can be extended without losing much performance. It has no moderation mechanisms and is not suitable for production use, as it lacks guardrails against toxicity, societal bias, and language limitations. We would love to collaborate with the community to build safer and better models.

The SuperAGI AI Team

Ishaan Bhola, Mukunda NS, Rajat Chawla, Anmol Gautam, Arkajit Datta, Ayush Vatsal, Sukrit Chatterjee, Adarsh Jha, Adarsh Deep, Abhijeet Sinha, Rakesh Krishna.

Model size: 7.24B params