---
license: apache-2.0
language:
- en
tags:
- tiny
- small
- synonym
- tool
- converter
---
## What's this?
A **tiny** model that can perform **paraphrasing** or **synonym substitution**.
The base model is [pythia-70m](https://huggingface.co/EleutherAI/pythia-70m). It was fine-tuned for 10 epochs with the [QLoRA](https://github.com/artidoro/qlora) method on my own training set.
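The exact training script isn't published, but for reference, here is a minimal sketch of what a QLoRA fine-tune of pythia-70m could look like, assuming the `peft` and `bitsandbytes` libraries; the LoRA hyperparameters (`r`, `lora_alpha`, dropout) are illustrative guesses, not the values actually used.
```python
# Hypothetical QLoRA setup -- a sketch, not the actual training script
import torch
from transformers import GPTNeoXForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# load the base model in 4-bit NF4 quantization (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# attach low-rank adapters; r / alpha / dropout here are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # Pythia's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
# ...then train for 10 epochs on the paraphrase pairs, e.g. with transformers.Trainer
```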
## How to use
### Quick start
First, load the model and tokenizer from Hugging Face:
```python
from transformers import GPTNeoXForCausalLM, AutoTokenizer
model_name_or_path = 'Mxode/Pythia-70m-C-Language-KnowledgeExtract'
device = 'cuda'
model = GPTNeoXForCausalLM.from_pretrained(model_name_or_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# prompt template
prompt = '<|prompt|>Convert the following passage into synonymous sentences.<|prompt|>\n'
# any text you wish to convert, preferably a single complete sentence
content = 'The theories and methods of systems science are extensively employed in various domains, including biology, economics, and sociology.'
text = prompt + content
```
Then generate:
```python
inputs = tokenizer(text, return_tensors="pt").to(device)
tokens = model.generate(
    **inputs,
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=100,
    do_sample=True,
)
# strip the echoed prompt and the trailing <|endoftext|> token
response = tokenizer.decode(tokens[0]).replace(text, "").replace("<|endoftext|>", "").strip()
# I call it 'Synonymizer' :)
print(f'Synonymizer: {response}')
### output:
### The disciplines of systems science are extensively employed in various domains, including biology, economics, and sociology.
```
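If you plan to convert more than one sentence, the steps above can be wrapped in a small helper; `synonymize` is just a name I'm using here, not something shipped with the repo:
```python
def synonymize(content: str, max_new_tokens: int = 100) -> str:
    # build the prompt, generate, and strip the echoed input
    text = prompt + content
    inputs = tokenizer(text, return_tensors="pt").to(device)
    tokens = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=max_new_tokens,
        do_sample=True,
    )
    decoded = tokenizer.decode(tokens[0])
    return decoded.replace(text, "").replace("<|endoftext|>", "").strip()

print(f"Synonymizer: {synonymize('Small models are surprisingly capable.')}")
```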
Or let's try something the model almost certainly never saw in training. Grab a bit of sports news from ESPN and try:
```python
### ...
content = 'As both teams exited the court for halftime, Baynes and Mayen were shoulder to shoulder.'
### ...
print(f'Synonymizer: {response}')
### output:
### As the team neets around the court to ease their shifts, Baynes and Middets were partnerly paryyneen.
### sometimes:
### Begantly mastitatively, Baynes and Mayen staged their team rested the Tywindes rested the Tywindes rested the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid the Tywindes laid
```
Well, as you can see, this is after all only an **experimental tiny model**, and with that in mind I'd give it a 7.5 out of 10 for performance.
I didn't tune the generation hyperparameters; a lower `temperature` plus a slightly higher `repetition_penalty` might give better results.
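Something like this, for instance; the exact numbers are guesses rather than tuned values:
```python
tokens = model.generate(
    **inputs,
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,         # lower temperature -> less random sampling
    repetition_penalty=1.3,  # discourages loops like the repeated 'Tywindes' above
)
```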
I'll follow up by training a slightly larger model on more data, and hopefully add support for multiple languages. We all know that bigger models generalize better, but small models are really cool :)