---
license: apache-2.0
language:
- en
tags:
- tiny
- small
- synonym
- tool
- converter
---
## What's this?
A **tiny** model that can perform **paraphrasing** or **synonym substitution**.
The base model is [pythia-70m](https://huggingface.co/EleutherAI/pythia-70m). It was fine-tuned for 10 epochs with the [QLoRA](https://github.com/artidoro/qlora) method on my own training set.
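The exact training script isn't published, but for reference, here is a minimal sketch of what a QLoRA fine-tune of pythia-70m could look like, assuming the `peft` and `bitsandbytes` libraries; the LoRA hyperparameters (`r`, `lora_alpha`, dropout) are illustrative guesses, not the values actually used.
```python
# Hypothetical QLoRA setup -- a sketch, not the actual training script
import torch
from transformers import GPTNeoXForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# load the base model in 4-bit NF4 quantization (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# attach low-rank adapters; r / alpha / dropout here are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # Pythia's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
# ...then train for 10 epochs on the paraphrase pairs, e.g. with transformers.Trainer
```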
## How to use
### Quick start
First, load the model and tokenizer from Hugging Face:
```python
from transformers import GPTNeoXForCausalLM, AutoTokenizer
model_name_or_path = 'Mxode/Pythia-70m-C-Language-KnowledgeExtract'
device = 'cuda'
model = GPTNeoXForCausalLM.from_pretrained(model_name_or_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# prompt template
prompt = '<|prompt|>Convert the following passage into synonymous sentences.<|prompt|>\n'
# any text you wish to convert, preferably a single complete sentence
content = 'The theories and methods of systems science are extensively employed in various domains, including biology, economics, and sociology.'
text = prompt + content
```
Then generate:
```python
inputs = tokenizer(text, return_tensors="pt").to(device)
tokens = model.generate(
    **inputs,
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=100,
    do_sample=True,
)
# strip the echoed prompt and the trailing <|endoftext|> token
response = tokenizer.decode(tokens[0]).replace(text, "").replace("<|endoftext|>", "").strip()
# I call it 'Synonymizer' :)
print(f'Synonymizer: {response}')
### output:
### The disciplines of systems science are extensively employed in various domains, including biology, economics, and sociology.
```
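If you plan to convert more than one sentence, the steps above can be wrapped in a small helper; `synonymize` is just a name I'm using here, not something shipped with the repo:
```python
def synonymize(content: str, max_new_tokens: int = 100) -> str:
    # build the prompt, generate, and strip the echoed input
    text = prompt + content
    inputs = tokenizer(text, return_tensors="pt").to(device)
    tokens = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=max_new_tokens,
        do_sample=True,
    )
    decoded = tokenizer.decode(tokens[0])
    return decoded.replace(text, "").replace("<|endoftext|>", "").strip()

print(f"Synonymizer: {synonymize('Small models are surprisingly capable.')}")
```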
Or let's try something the model almost certainly never saw in training. Grab a bit of sports news from ESPN and try:
```python
### ...
content = 'As both teams exited the court for halftime, Baynes and Mayen were shoulder to shoulder.'
### ...
print(f'Synonymizer: {response}')
### output:
### As the team neets around the court to ease their shifts, Baynes and Middets were partnerly paryyneen.
### sometimes:
### Begantly mastitatively, Baynes and Mayen staged their team rested the Tywindes rested the Tywindes rested the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid
```
Well, as you can see, this is after all only an **experimental tiny model**, and with that in mind I'd give it a 7.5 out of 10 for performance.
I didn't tune the generation hyperparameters; a lower `temperature` plus a slightly higher `repetition_penalty` might give better results.
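Something like this, for instance; the exact numbers are guesses rather than tuned values:
```python
tokens = model.generate(
    **inputs,
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,         # lower temperature -> less random sampling
    repetition_penalty=1.3,  # discourages loops like the repeated 'Tywindes' above
)
```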
I'll follow up by training a slightly larger model on more data, and hopefully add support for multiple languages. We all know that bigger models generalize better, but small models are really cool :)