
recipe-nlg-gpt2

This model is a fine-tuned version of gpt2 on the RecipeNLG dataset.

Model description

This model recreates the GPT-2 recipe-generation model described in the RecipeNLG paper: https://aclanthology.org/2020.inlg-1.4.pdf.

Intended uses & limitations

Experimenting with GPT-2 for recipe generation.

To use the model, it is best to include its special tokens in your input. These tokens were added to the tokenizer's vocabulary and served as delimiters in the training data, so you can prompt the model with as much of a recipe as you are able to provide, and it should stick to the format to complete the rest.
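For reference, these delimiters would have been registered as additional special tokens before fine-tuning. A minimal sketch of that setup (an assumption about the training code, which is not included in this card):

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Assumed setup: register the recipe delimiters as special tokens and
# resize the embedding matrix so the fine-tuned model can learn them.
SPECIAL_TOKENS = [
    "<RECIPE_START>", "<INPUT_START>", "<NEXT_INPUT>", "<INPUT_END>",
    "<INGR_START>", "<NEXT_INGR>", "<INGR_END>",
    "<INSTR_START>", "<NEXT_INSTR>", "<INSTR_END>",
    "<TITLE_START>", "<TITLE_END>", "<RECIPE_END>",
]

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer.add_special_tokens({"additional_special_tokens": SPECIAL_TOKENS})
model.resize_token_embeddings(len(tokenizer))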

Here's a sample recipe from the test dataset:

<RECIPE_START> <INPUT_START> fettucini <NEXT_INPUT> butter <NEXT_INPUT> light cream <NEXT_INPUT> Romano cheese <INPUT_END> <INGR_START> 1 lb. fettucini <NEXT_INGR> 1/4 lb. butter (1 stick) <NEXT_INGR> 1 pt. light cream <NEXT_INGR> grated Parmesan or Romano cheese <INGR_END> <INSTR_START> Cook fettucini as directed. <NEXT_INSTR> Melt butter and pour over drained pasta. <NEXT_INSTR> Pour cream over pasta and mix. <NEXT_INSTR> Finally, mix cheeses and toss to coat. <NEXT_INSTR> Serve with grated cheeses to taste. <NEXT_INSTR> For variation, toss with steamed broccoli crowns. <INSTR_END> <TITLE_START> Fettucini "Al Marko" <TITLE_END> <RECIPE_END>

The format starts with a token marking the beginning of the recipe, followed by the inputs (what you want to cook with), the ingredients (which add quantities), the instructions, the recipe title, and finally a token marking the end of the recipe.
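Given that structure, a generated recipe can be split back into its fields by slicing on the delimiters. A small hypothetical helper (not part of this model) to illustrate:

def parse_recipe(text: str) -> dict:
    # Hypothetical parsing helper: pull out the span between a start and
    # an end delimiter, then split repeated items on their NEXT_* token.
    def section(start: str, end: str) -> str:
        return text.split(start, 1)[1].split(end, 1)[0].strip()

    return {
        "inputs": [s.strip() for s in section("<INPUT_START>", "<INPUT_END>").split("<NEXT_INPUT>")],
        "ingredients": [s.strip() for s in section("<INGR_START>", "<INGR_END>").split("<NEXT_INGR>")],
        "instructions": [s.strip() for s in section("<INSTR_START>", "<INSTR_END>").split("<NEXT_INSTR>")],
        "title": section("<TITLE_START>", "<TITLE_END>"),
    }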

Try generating a recipe with the prompt:

<RECIPE_START> <INPUT_START> fettucini <NEXT_INPUT> butter <NEXT_INPUT> light cream <NEXT_INPUT> Romano cheese <INPUT_END>

You probably want to import the model directly and experiment with different sampling methods for generation. Temperature sampling with a temperature of 0.5 gives good results as a starting point. You'll also need to override the default max length of 50 and set skip_special_tokens=False when decoding the model outputs, so you can parse for the end-of-recipe token.
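Putting that together, here is a minimal generation sketch; the model id below is an assumption (replace it with this repo's actual Hub path):

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "recipe-nlg-gpt2"  # assumption: substitute the actual Hub path
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = ("<RECIPE_START> <INPUT_START> fettucini <NEXT_INPUT> butter "
          "<NEXT_INPUT> light cream <NEXT_INPUT> Romano cheese <INPUT_END>")
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.5,
    max_length=512,  # override the default max length of 50
    pad_token_id=tokenizer.eos_token_id,
)

# Keep special tokens in the decoded text so the end-of-recipe
# delimiter can be found and everything after it discarded.
text = tokenizer.decode(outputs[0], skip_special_tokens=False)
recipe = text.split("<RECIPE_END>")[0] + " <RECIPE_END>"
print(recipe)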

Training and evaluation data

The RecipeNLG dataset (https://huggingface.co/mbien/recipenlg/) was used for this task.

5% of the dataset was held out for evaluation.
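A sketch of how such a split can be made with the datasets library (the dataset id follows the link above and is an assumption, as is the use of train_test_split):

from datasets import load_dataset

# Assumption: a 5% held-out split via datasets' train_test_split;
# seed 42 matches the training hyperparameters below.
dataset = load_dataset("mbien/recipenlg")["train"]
split = dataset.train_test_split(test_size=0.05, seed=42)
train_ds, eval_ds = split["train"], split["test"]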

Training procedure

An RTX 3090 rented on Vast.ai was used; training took about 14 hours with a batch size of 8 and fp16 enabled.

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 1
  • mixed_precision_training: Native AMP
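These settings map onto transformers' TrainingArguments roughly as follows (a reconstruction from the list above, not the actual training script; output_dir is arbitrary):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="recipe-nlg-gpt2",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=1,
    fp16=True,  # Native AMP mixed precision
)
# Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.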

Training results

Evaluation on the 5% held-out split (106,202 examples, batch size 8):

  • eval_loss: 1.1872
  • eval_runtime: 818.85 s
  • eval_samples_per_second: 129.70
  • eval_steps_per_second: 16.21
  • epoch: 1.0
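For a causal language model trained with a cross-entropy objective, this eval loss corresponds to a perplexity of exp(1.1872) ≈ 3.28 on the held-out split.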

Framework versions

  • Transformers 4.24.0
  • PyTorch 1.13.0
  • Datasets 2.6.1
  • Tokenizers 0.13.2