---
library_name: transformers
license: mit
datasets:
- arxiv_dataset
language:
- en
pipeline_tag: text-generation
---

# Model Card for SciMistral-V1

The SciMistral-V1 Large Language Model (LLM) is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2). It was fine-tuned on the [arxiv-dataset](https://www.kaggle.com/datasets/Cornell-University/arxiv), specifically on abstracts from a wide variety of scientific papers.

For an article explaining how we did this in more detail, please check out our [website](https://www.tromero.ai/articles)!

## How to Get Started with the Model

To run this model yourself:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("TromeroResearch/SciMistral-V1")
# SciMistral shares its tokenizer with the base Mistral-7B-Instruct-v0.2 model
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Build the prompt with Mistral's chat template and move everything to the GPU
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

## Sample results

Each model was given the opening line of an abstract and asked to continue it; a minimal sketch of this prompting style appears at the end of this card.

Regular [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1):

Abstract start:
```markdown
This paper seeks to disprove that 1+1=2
```

Generated completion:
```markdown
The first thing we need to do is define what a number is, and how it relates to the world around us. A number can be defined as an abstract concept used for counting or measuring things in our environment. It’s important to note here that numbers are not physical objects; they exist only within our minds…
```

And so on… A weak, unscientific and generic response, the kind you’d expect from someone pretending to know what they’re talking about.

But with our model, [SciMistral](https://huggingface.co/Tromero/SciMistral-V1):

Abstract start:
```markdown
This paper seeks to disprove that 1+1=2
```

Generated completion:
```markdown
We do this by showing that the equation is not true in all models of arithmetic, and hence it cannot be proved from Peano Arithmetic (PA). The proof uses a technique called "forcing" which was introduced into mathematical logic by Cohen in his famous work on forcing for set theory. We show how to use forcing to construct a model of PA where 1+1 does not equal 2…
```

And it continues. A much more useful and relevant response for someone who needs help summarising their paper!

## Hardware

4 x Nvidia A6000 GPUs
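## Completing an abstract

The sample results above come from plain text completion: the model is given the opening line of an abstract and continues it, without the chat template. The sketch below shows that usage; the abstract opening is taken from the samples above, while the sampling parameters are illustrative assumptions, not the exact settings used to produce those samples.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

model = AutoModelForCausalLM.from_pretrained("TromeroResearch/SciMistral-V1").to(device)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Opening line of an abstract for the model to continue
abstract_start = "This paper seeks to disprove that 1+1=2"
inputs = tokenizer(abstract_start, return_tensors="pt").to(device)

# Sampling parameters here are illustrative assumptions, not the
# settings used to generate the sample results above
output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```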