---
library_name: transformers
license: mit
datasets:
- arxiv_dataset
language:
- en
pipeline_tag: text-generation
---
# Model Card for SciMistral-V1
The SciMistral-V1 Large Language Model (LLM) is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).
The model was fine-tuned on the [arxiv-dataset](https://www.kaggle.com/datasets/Cornell-University/arxiv), specifically on abstracts from a wide range of scientific papers.
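As a rough sketch of what that preprocessing can look like, the snippet below reads abstracts out of the Kaggle snapshot. The file name (`arxiv-metadata-oai-snapshot.json`), field names, and cleanup are assumptions about that snapshot, not necessarily the exact pipeline behind SciMistral.
```python
import json

# Illustrative only: assumes the Kaggle arXiv snapshot (one JSON object per
# line) has been downloaded locally; the actual SciMistral pipeline may differ.
def iter_abstracts(path="arxiv-metadata-oai-snapshot.json", limit=1000):
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= limit:
                break
            record = json.loads(line)
            # Collapse the hard-wrapped newlines inside each abstract
            yield " ".join(record["abstract"].split())

for abstract in iter_abstracts(limit=3):
    print(abstract[:100], "…")
```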
For a more detailed explanation of how we did this, please check out the article on our [website](https://www.tromero.ai/articles)!
## How to Get Started with the Model
To run this model for yourself:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer, placing the model on the GPU to match the inputs below
model = AutoModelForCausalLM.from_pretrained("TromeroResearch/SciMistral-V1", torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("TromeroResearch/SciMistral-V1")

# The start of an abstract for the model to complete
prompt = "This paper seeks to disprove that 1+1=2"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# do_sample=True is needed for top_k, top_p and temperature to take effect
output = model.generate(input_ids, max_length=150, num_return_sequences=1, do_sample=True, repetition_penalty=1.2, top_k=50, top_p=0.95, temperature=1.0)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Sample results
Regular [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1):
Abstract start:
```markdown
This paper seeks to disprove that 1+1=2
```
Generated completion:
```markdown
The first thing we need to do is define what a number is, and how it relates to the world around us. A number can be defined as an abstract concept used for counting or measuring things in our environment. It’s important to note here that numbers are not physical objects; they exist only within our minds…
```
And so on… A pretty weak, generic, and unscientific response, the sort of thing you'd expect from someone pretending to know what they're talking about. But with our model:
[SciMistral](https://huggingface.co/Tromero/SciMistral-V1):
Abstract start:
```markdown
This paper seeks to disprove that 1+1=2
```
Generated completion:
```markdown
We do this by showing that the equation is not true in all models of arithmetic, and hence it cannot be proved from Peano Arithmetic (PA). The proof uses a technique called "forcing" which was introduced into mathematical logic by Cohen in his famous work on forcing for set theory. We show how to use forcing to construct a model of PA where 1+1 does not equal 2…
```
And it continues. A much better, more useful and relevant response for someone who needs help summarising their paper!
## Hardware
Fine-tuned on 4× NVIDIA A6000 GPUs.
## Limitations
The SciMistral model is a quick demonstration that the base model can easily be fine-tuned to achieve compelling performance.
It does not have any moderation mechanisms. We look forward to engaging with the community on ways to make the model respect guardrails, allowing for deployment in environments that require moderated outputs.
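Until then, one possible stopgap, sketched below, is to screen generations with an off-the-shelf toxicity classifier before returning them. The classifier choice (`unitary/toxic-bert`), its label scheme, and the threshold are illustrative assumptions, not part of this release.
```python
from transformers import pipeline

# Illustrative moderation wrapper: the classifier and threshold are assumptions
moderator = pipeline("text-classification", model="unitary/toxic-bert")

def moderated(text: str, threshold: float = 0.5) -> str:
    # Classify a truncated window to stay within the classifier's input limit
    result = moderator(text[:512])[0]
    if result["label"] == "toxic" and result["score"] > threshold:
        return "[output withheld by moderation filter]"
    return text

print(moderated("We prove a new bound on prime gaps."))
```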