This model generates an instruction for a given piece of text, which makes it useful for labeling unlabeled datasets. It is based on a Llama 7B model with a 32k context length (togethercomputer/LLaMA-2-7B-32K).

It was trained on the reverse-instruct dataset for 2 epochs. The final validation loss was 0.72, with a ROUGE-L score of 0.66.

Here is an inference example, with some random text from falcon-refinedweb:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("vikp/reverse_instruct")
tokenizer = AutoTokenizer.from_pretrained("vikp/reverse_instruct")

# The model sees the output text first, then a separator, and generates
# the matching instruction after the "Instruction" header.
template = """
Output

{output}
======
Instruction

""".lstrip()

text = """SE3 Condenser Microphone from SE Electronics Sonic Distribution is now handling the SE Electronics line of imported studio condensers. The SE3 caught my eye at the Summer NAMM Show in Nashville and is their flagship "pencil" microphone with a fixed cardioid pattern and 48V phantom powering. This mic uses Class A FET amplifier electronics and has both low cut filter and -10dB pad switches. I had the opportunity to try this mic out on several sources while recording a band and was impressed by its natural sound and all around usefulness. I used it for acoustic guitar overdubs where the low cut filter helped to tame a jumbo bodied guitar's boomy sound. The gentle presence lift added a sparkle without using EQ. I also tried it on drums and cymbals and it (using the pad) didn't fold up (overload) at all. I even tried it on vocals with good results although it does 'pop' easily and required a couple of pop screens. Housed in an elegantly finished new body design, it comes with a sturdy shock mount and packaged in a deluxe wooden travel case. Significant specifications are: frequency response rated at 20Hz-20khz; sensitivity is 10mV/Pa +/- 2dB; noise level is 17dB (A weighted); and Max SPL for 0.5% THD @ 1kHz is 135dB. I certainly found a 'Swiss army knife' of a condenser with the SE3 and I completely recommend it for any studio task especially acoustic instruments such as guitar, violin, cello or string bass."""
prompt = template.format(output=text)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
# Strip the prompt rather than the raw template (which still contains the
# {output} placeholder and would never match), leaving just the instruction.
texts = [t.replace(prompt, "") for t in texts]
print(texts)
```

The generated instruction for the above example is: "Write a product review for the SE3 Condenser Microphone from SE Electronics Sonic Distribution."
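To label an entire unlabeled dataset, the same prompt can be applied row by row. Below is a minimal sketch reusing `model`, `tokenizer`, and `template` from the snippet above; the in-memory dataset and its `text` column are placeholders for your own data:

```python
from datasets import Dataset

# Hypothetical unlabeled corpus; substitute your own dataset here.
unlabeled = Dataset.from_dict(
    {"text": ["First unlabeled document...", "Second unlabeled document..."]}
)

def label(example):
    prompt = template.format(output=example["text"])
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=512)
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    # Keep only the generated instruction, as in the example above.
    return {"instruction": decoded.replace(prompt, "")}

labeled = unlabeled.map(label)
print(labeled["instruction"])
```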

It works with code, too, although the base llama 7B model is undertrained on code.
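For instance, a code snippet can be dropped into the same template; this is an illustrative sketch only, and the snippet below is made up:

```python
code_sample = '''def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)'''

prompt = template.format(output=code_sample)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0].replace(prompt, ""))
```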
