Need help

by rhussain21

Hi! I need help implementing this model. I followed the guidelines on how to encode input texts, but I'm stuck on decoding: model.generate() doesn't seem to be compatible with it. Is there any sample code for using this model for summary generation? I'd like to convert the output tensors into actual text, as with regular BART.

Hi @rhussain21

This model isn't fine-tuned, so you can only use it to predict a mask.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# trust_remote_code is required because LSG is a custom architecture
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-4096", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-4096", trust_remote_code=True)

text = "Paris is the <mask> of France."

# With the fill-mask pipeline
pipe = pipeline(
    "fill-mask",
    model=model,
    tokenizer=tokenizer,
    device=0)  # device=0 runs on GPU; drop it for CPU

predictions = pipe(text)
print(predictions)

You can also use the "text2text-generation" pipeline for generation/summarization (see https://huggingface.co/ccdv/lsg-bart-base-4096-multinews).
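
For example, a minimal sketch with that fine-tuned multinews checkpoint, going from raw text to the decoded summary string (the generation settings here are illustrative placeholders, not tuned values):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-4096-multinews", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-4096-multinews", trust_remote_code=True)

summarizer = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
outputs = summarizer("Your long document goes here.", truncation=True, max_length=128)
print(outputs[0]["generated_text"])  # the decoded summary text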

I want to expand on the question a bit to make sure I'm understanding the answer correctly. I tried fine-tuning my own LSG model using the training code from the LED example. After the model was trained, I tried:

input_tokens = tokenizer.encode(
    input_text,
    return_tensors='pt',
    max_length=encoder_max_length,
    padding="max_length",
    truncation=True
)
output_tokens = model.generate(input_tokens)
output_text = tokenizer.decode(output_tokens[0])

But it hangs, presumably because, as mentioned above, .generate() isn't compatible with this model. However, when I do the following:

import torch

inputs = tokenizer(...)  # not tokenizer.encode
outputs = model(**inputs)  # single forward pass, no autoregressive decoding
output_tokens = torch.argmax(outputs.logits, dim=-1)
result = tokenizer.decode(output_tokens[0])

Then it works, but the quality is poor. I'm a bit new to fine-tuning; when I tried model(**inputs) in the LED example it similarly suffered from poor quality, so I suspect model.generate() would do better here with LSG if I could get it to work.

I'm also wondering: is this model not meant to be fine-tuned directly, and are we instead intended to run convert_checkpoint on an already fine-tuned model (e.g. LED)?

Do you happen to have a sample training script handy? It strikes me that using the LED training code here isn't wise, given the difference in how global attention is handled (LED zeroes it all out, then attends globally to the first special token). My trained LSG model "works" (with the above caveats) both when I use that code and when I remove all custom handling of global attention, but I fear I'm working against the model.

Hi @lefnire
I didn't see your comment.

The best way to fine-tune this kind of model is to use one of the example scripts from the transformers library (e.g. the summarization example).
You have to load it with AutoModelForSeq2SeqLM (which resolves to the *ForConditionalGeneration head) to fine-tune it on a generative task.
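
As a rough illustration of what such a script does (not an official recipe: the dataset, column names, and hyperparameters below are placeholders you would replace with your own), a Seq2SeqTrainer setup might look like this:

from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# trust_remote_code is required because LSG is a custom architecture
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-4096", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-4096", trust_remote_code=True)

def preprocess(batch):
    # "document" and "summary" are placeholder column names for your own dataset
    model_inputs = tokenizer(batch["document"], max_length=4096, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# tokenized_train / tokenized_eval: your datasets.Dataset objects after .map(preprocess, batched=True)
training_args = Seq2SeqTrainingArguments(
    output_dir="lsg-bart-base-4096-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=3e-5,
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()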

To make predictions, it is better to use a pipeline object like this:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-4096-mediasum", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-4096-mediasum", trust_remote_code=True)

text = "Replace by what you want."
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0)
generated_text = pipe(
    text,
    truncation=True,
    max_length=64,
    no_repeat_ngram_size=7,
    num_beams=2,
    early_stopping=True,
)
# generated_text is a list of dicts, e.g. [{"generated_text": "..."}]

LED requires you to set global tokens manually, which is not the case with this model.
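
For reference, this is the kind of manual step LED needs (a sketch assuming the allenai/led-base-16384 checkpoint); with LSG the sparse/global attention pattern is handled inside the model, so no extra mask is passed:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

led_tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
led_model = AutoModelForSeq2SeqLM.from_pretrained("allenai/led-base-16384")

inputs = led_tokenizer("A very long document ...", return_tensors="pt")

# LED convention: build a global attention mask yourself and mark at least
# the first (special) token as global
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = led_model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=64,
)
print(led_tokenizer.decode(summary_ids[0], skip_special_tokens=True))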
