bertin-gpt-clara-med

This model is a fine-tuned version of bertin-project/bertin-gpt-j-6B-alpaca on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6110

Usage

# Requires the accelerate package to be installed for device_map="auto"
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

repo_name = "CLARA-MeD/bertin-gpt"

# The adapter config records which base model the adapter was trained on
config = PeftConfig.from_pretrained(repo_name)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the base model in half precision and let accelerate place it on the available devices
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the fine-tuned adapter weights to the base model
model = PeftModel.from_pretrained(model, repo_name)

To generate text, call the model's .generate() method. The prompt must follow the Spanish instruction template used in the function below:

# Generate a simplified version of a Spanish sentence or term
def generate(text):
    prompt = f"""A continuación hay una instrucción que describe una tarea, junto con una entrada que proporciona más contexto. Escribe una respuesta que complete adecuadamente lo que se pide.

### Instrucción:
Simplifica la siguiente frase

### Entrada:
{text}

### Respuesta:"""

    # Move the tokenized prompt to the device that holds the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # temperature and top_p only take effect when sampling (do_sample=True);
    # with these settings, decoding is plain beam search with 4 beams.
    generation_config = GenerationConfig(temperature=0.2, top_p=0.75, num_beams=4)
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            generation_config=generation_config,
            return_dict_in_generate=True,
            max_new_tokens=256,
        )
    for seq in generation_output.sequences:
        output = tokenizer.decode(seq, skip_special_tokens=True)
        # The decoded text includes the prompt; keep only what follows the response marker
        print(output.split("### Respuesta:")[-1].strip())

generate("Acromegalia")
# La acromegalia es un trastorno causado por un exceso de hormona del crecimiento en el cuerpo.
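
The prompt construction and the answer extraction used above can be separated into small pure functions, which makes them easy to check without loading the 6B model. The following is a minimal sketch; the names PROMPT_TEMPLATE, RESPONSE_MARKER, build_prompt and extract_response are hypothetical and not part of the original card.

```python
# Hypothetical helpers (not from the original card) that mirror the template above.
RESPONSE_MARKER = "### Respuesta:"

PROMPT_TEMPLATE = (
    "A continuación hay una instrucción que describe una tarea, junto con una entrada "
    "que proporciona más contexto. Escribe una respuesta que complete adecuadamente "
    "lo que se pide.\n\n"
    "### Instrucción:\n{instruction}\n\n"
    "### Entrada:\n{text}\n\n"
    + RESPONSE_MARKER
)


def build_prompt(text, instruction="Simplifica la siguiente frase"):
    """Fill the Spanish instruction template with the text to simplify."""
    return PROMPT_TEMPLATE.format(instruction=instruction, text=text)


def extract_response(decoded):
    """Return only the text that follows the last response marker."""
    return decoded.split(RESPONSE_MARKER)[-1].strip()
```

With these helpers, the body of generate() would reduce to tokenizing build_prompt(text), calling model.generate(), and passing each decoded sequence through extract_response().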

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 300
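
As a rough illustration of how these values relate, the sketch below recomputes the total batch size and evaluates a linear warmup and decay schedule at a few steps. It assumes a single training device (the card does not state the device count, but 4 × 32 = 128 is consistent with one) and that the linear scheduler decays to zero at the final step; the function name linear_lr is hypothetical.

```python
# Values copied from the hyperparameter list above.
learning_rate = 3e-4
train_batch_size = 4
gradient_accumulation_steps = 32
warmup_steps = 100
training_steps = 300
num_devices = 1  # assumption: the card does not state the device count

# Number of examples contributing to each optimizer update
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_lr(step):
    """Linear warmup to the peak rate, then linear decay to zero (assumed shape)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, training_steps - step)
    return learning_rate * remaining / max(1, training_steps - warmup_steps)


print("effective batch size:", effective_batch_size)
for step in (0, 50, 100, 200, 300):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 100, one third of the way through training, and is halved again by step 200.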

Training results

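| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 0.5564        | 0.38  | 50   | 0.7804          |
| 0.3879        | 0.75  | 100  | 0.6551          |
| 0.3609        | 1.13  | 150  | 0.6327          |
| 0.3615        | 1.5   | 200  | 0.6179          |
| 0.3371        | 1.88  | 250  | 0.6135          |
| 0.3242        | 2.25  | 300  | 0.6110          |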
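
The reported numbers can be turned into a couple of derived quantities. The sketch below assumes the validation loss is a mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in the card), and it estimates the steps per epoch from the rounded epoch column, so the result is approximate.

```python
import math

# Rows copied from the results above: (training loss, epoch, step, validation loss)
results = [
    (0.5564, 0.38, 50, 0.7804),
    (0.3879, 0.75, 100, 0.6551),
    (0.3609, 1.13, 150, 0.6327),
    (0.3615, 1.50, 200, 0.6179),
    (0.3371, 1.88, 250, 0.6135),
    (0.3242, 2.25, 300, 0.6110),
]

_, final_epoch, final_step, final_val_loss = results[-1]

# Assumption: the loss is a mean per-token cross-entropy in nats
perplexity = math.exp(final_val_loss)

# Approximate, because the epoch column is rounded to two decimals
steps_per_epoch = final_step / final_epoch

# Validation loss should fall at every evaluation point
val_losses = [row[3] for row in results]
monotonic = all(a > b for a, b in zip(val_losses, val_losses[1:]))

print(f"validation perplexity ~ {perplexity:.3f}")
print(f"steps per epoch ~ {steps_per_epoch:.1f}")
print("validation loss decreases at every checkpoint:", monotonic)
```

The validation loss is still decreasing at step 300, although the gains after step 200 are small.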

Framework versions

  • Transformers 4.32.1
  • PyTorch 2.0.0+cu117
  • Datasets 2.14.4
  • Tokenizers 0.13.3