---
library_name: transformers
license: mit
datasets:
- arxiv_dataset
language:
- en
pipeline_tag: text-generation
---

# Model Card for SciMistral-V1

The SciMistral-V1 Large Language Model (LLM) is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), specialised for completing scientific abstracts.

This model was fine-tuned on the [arxiv-dataset](https://www.kaggle.com/datasets/Cornell-University/arxiv), specifically on abstracts from a wide variety of scientific papers.
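
The Kaggle dump is a single JSON Lines file, with one metadata record per paper and the abstract stored as plain text. As a rough sketch of how such abstracts can be collected (for illustration only, not our exact preprocessing pipeline; the snapshot filename below may vary between dataset versions):

```python
import json

# Hypothetical local path to the Kaggle arXiv metadata snapshot
SNAPSHOT = "arxiv-metadata-oai-snapshot.json"

abstracts = []
with open(SNAPSHOT) as f:
    for line in f:  # one JSON record per line
        record = json.loads(line)
        # Keep only the abstract, collapsing internal newlines and whitespace
        abstracts.append(" ".join(record["abstract"].split()))

print(f"Loaded {len(abstracts)} abstracts")
```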

For more on how we did this, check out our article on our [website](https://www.tromero.ai/articles)!

## How to Get Started with the Model

To run this model for yourself:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer; device_map="auto" places the model on
# your GPU (requires the accelerate package). In float16 the weights need roughly 15 GB.
model = AutoModelForCausalLM.from_pretrained("TromeroResearch/SciMistral-V1", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("TromeroResearch/SciMistral-V1")

# Give the model the start of an abstract to complete
prompt = "This paper seeks to disprove that 1+1=2"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# do_sample=True is required for top_k, top_p and temperature to take effect
output = model.generate(input_ids, max_length=150, num_return_sequences=1, do_sample=True, repetition_penalty=1.2, top_k=50, top_p=0.95, temperature=1.0)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
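
Note that with `do_sample=True` each run will typically produce a different completion; set `do_sample=False` for deterministic greedy decoding (the sampling parameters are then ignored).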

## Sample results

Regular [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1):

Abstract start:

```markdown
This paper seeks to disprove that 1+1=2
```

Generated completion:

```markdown
The first thing we need to do is define what a number is, and how it relates to the world around us. A number can be defined as an abstract concept used for counting or measuring things in our environment. It’s important to note here that numbers are not physical objects; they exist only within our minds…
```

And so on… A pretty weak, unscientific, and generic response, the kind you'd expect from someone pretending to know what they're talking about. But with our model:

[SciMistral](https://huggingface.co/TromeroResearch/SciMistral-V1):

Abstract start:

```markdown
This paper seeks to disprove that 1+1=2
```

Generated completion:

```markdown
We do this by showing that the equation is not true in all models of arithmetic, and hence it cannot be proved from Peano Arithmetic (PA). The proof uses a technique called "forcing" which was introduced into mathematical logic by Cohen in his famous work on forcing for set theory. We show how to use forcing to construct a model of PA where 1+1 does not equal 2…
```

And it continues. A much better, more useful, and more relevant response for someone who needs help summarising their paper!
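
If you want to reproduce this kind of side-by-side comparison, a minimal sketch is to run the same prompt through both models (greedy decoding is used here for repeatability, so the exact wording will differ from the sampled outputs above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "This paper seeks to disprove that 1+1=2"

for name in ["mistralai/Mistral-7B-v0.1", "TromeroResearch/SciMistral-V1"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    # Greedy decoding keeps each model's output stable across runs
    output = model.generate(input_ids, max_length=150, do_sample=False)
    print(f"--- {name} ---")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    del model  # free GPU memory before loading the next 7B model
    torch.cuda.empty_cache()
```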

## Hardware

Fine-tuning was performed on 4 x NVIDIA A6000 GPUs.

## Limitations

The SciMistral model is a quick demonstration that the base model can easily be fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.