{"metadata":{"accelerator":"GPU","colab":{"provenance":[]},"kernelspec":{"name":"python3","display_name":"Python 3","language":"python"},"language_info":{"name":"python","version":"3.10.12","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"gpu","dataSources":[{"sourceId":1789966,"sourceType":"datasetVersion","datasetId":1063627},{"sourceId":6041428,"sourceType":"datasetVersion","datasetId":3454349},{"sourceId":7494590,"sourceType":"datasetVersion","datasetId":4363922}],"dockerImageVersionId":30636,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":true}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Importing the important libraries","metadata":{"id":"6ab2e043-e257-4956-b4b8-58dcc4d77fd3"}},{"cell_type":"markdown","source":"**TensorFlow (tf):**\nTensorFlow is a powerful open-source library for numerical computation and machine learning. It provides a symbolic math library and tools for building and training neural networks. TensorFlow is particularly useful for building and deploying large-scale machine learning models.\n**pandas (pd):**\npandas is a popular library for data analysis and manipulation in Python. It provides data structures and operations for manipulating numerical tables and time series. pandas is particularly useful for cleaning, transforming, and analyzing data.\n**matplotlib (plt):**\nmatplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It provides a variety of plotting options, including line plots, scatter plots, histograms, and bar charts. Matplotlib is commonly used for data visualization and exploration.\n**numpy (np):**\nnumpy is a fundamental library for scientific computing in Python. 
It provides a powerful N-dimensional array object and useful linear algebra, Fourier transform, and random number capabilities. NumPy is often used for numerical operations and data manipulation.\n**json:**\nJSON (JavaScript Object Notation) is a popular data interchange format. It is often used to represent structured data in a human-readable format. In Python, the `json` module provides functions for encoding and decoding JSON data.\n**platform:**\nThe platform module provides information about the underlying platform on which the Python interpreter is running. It provides functions for getting system-specific information such as the operating system name, version, and architecture.\n**time:**\nThe time module provides various functions for working with time. It allows you to get the current time, measure execution time, and create delays.\n**pathlib:**\nThe pathlib module provides an object-oriented interface for working with file paths. It simplifies working with file and directory paths, making it easier to manipulate and navigate the file system.\n**os:**\nThe os module provides a portable way to use operating system-dependent functionality. It provides functions for creating and removing directories, listing files, and interacting with the file system.\n**re:**\nThe re module is a powerful tool for working with regular expressions in Python. 
It can be used to perform a variety of tasks, including matching, searching, and replacing strings based on regular expression patterns.\nBy importing these libraries, the code snippet gains access to a wide range of functionalities for numerical computation, data analysis, visualization, and system interaction.","metadata":{"id":"5ad3632a-2e01-492c-925e-8254490246e5"}},{"cell_type":"code","source":"# Packages for training the model and working with the dataset.\nimport tensorflow as tf\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport json\n\n# Utility/helper packages.\nimport platform\nimport time\nimport pathlib\nimport os\nimport re","metadata":{"id":"a8292b09-5a39-46aa-8d69-cfecbac3a418","execution":{"iopub.status.busy":"2024-04-27T15:54:00.988869Z","iopub.execute_input":"2024-04-27T15:54:00.989222Z","iopub.status.idle":"2024-04-27T15:54:13.024830Z","shell.execute_reply.started":"2024-04-27T15:54:00.989195Z","shell.execute_reply":"2024-04-27T15:54:13.023870Z"},"trusted":true},"execution_count":1,"outputs":[{"name":"stderr","text":"/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.24.3\n warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n","output_type":"stream"}]},{"cell_type":"markdown","source":"# Read The Dataset","metadata":{"id":"0be1eef6-c94a-4a5f-a4bc-0e8f38100062"}},{"cell_type":"markdown","source":"1. **Reading the CSV File**:\n - The code calls `pd.read_csv()` on the Kaggle input path `/kaggle/input/recipe-dataset-over-2m/recipes_data.csv`. 
This function loads the data from the CSV file into a pandas DataFrame, which is a tabular data structure. The resulting DataFrame is stored in the variable `dataset_raw`.\n2. **Limiting the Dataset Size**:\n - The slice `dataset_raw[:75000]` keeps only the first 75,000 rows of the DataFrame.\n3. **Displaying the First 25 Rows**:\n - Finally, the code calls `head(25)` on the `dataset_raw` DataFrame to display the first 25 rows as a preview of the data.","metadata":{"id":"fad5dc45-89eb-4c44-b57e-d76085254c20"}},{"cell_type":"code","source":"dataset_raw = pd.read_csv(r'/kaggle/input/recipe-dataset-over-2m/recipes_data.csv')\ndataset_raw = dataset_raw[:75000]\ndataset_raw.head(25)","metadata":{"id":"8152512b-7b74-4d10-8e6e-3e4c49fea001","outputId":"cf416071-8877-47dc-9970-7d7e1f1fcdee","execution":{"iopub.status.busy":"2024-04-27T15:54:13.027132Z","iopub.execute_input":"2024-04-27T15:54:13.028164Z","iopub.status.idle":"2024-04-27T15:55:01.910031Z","shell.execute_reply.started":"2024-04-27T15:54:13.028126Z","shell.execute_reply":"2024-04-27T15:55:01.908970Z"},"trusted":true},"execution_count":2,"outputs":[{"execution_count":2,"output_type":"execute_result","data":{"text/plain":" title \\\n0 No-Bake Nut Cookies \n1 Jewell Ball'S Chicken \n2 Creamy Corn \n3 Chicken Funny \n4 Reeses Cups(Candy) \n5 Cheeseburger Potato Soup \n6 Rhubarb Coffee Cake \n7 Scalloped Corn \n8 Nolan'S Pepper Steak \n9 Millionaire Pie \n10 Double Cherry Delight \n11 Buckeye Candy \n12 Quick Barbecue Wings \n13 Taco Salad Chip Dip \n14 Pink Stuff(Frozen Dessert) \n15 Fresh Strawberry Pie \n16 Easy German Chocolate Cake \n17 Broccoli Salad \n18 Strawberry Whatever \n19 Eggless Milkless Applesauce Cake \n20 Grandma Hanrath'S Banana Breadfort Collins, Co... \n21 Chocolate Frango Mints \n22 Cuddy Farms Marinated Turkey \n23 Spaghetti Sauce To Can \n24 Prize-Winning Meat Loaf \n\n ingredients \\\n0 [\"1 c. firmly packed brown sugar\", \"1/2 c. eva... \n1 [\"1 small jar chipped beef, cut up\", \"4 boned ... \n2 [\"2 (16 oz.) pkg. frozen corn\", \"1 (8 oz.) pkg... 
\n3 [\"1 large whole chicken\", \"2 (10 1/2 oz.) cans... \n4 [\"1 c. peanut butter\", \"3/4 c. graham cracker ... \n5 [\"6 baking potatoes\", \"1 lb. of extra lean gro... \n6 [\"1 1/2 c. sugar\", \"1/2 c. butter\", \"1 egg\", \"... \n7 [\"1 can cream-style corn\", \"1 can whole kernel... \n8 [\"1 1/2 lb. round steak (1-inch thick), cut in... \n9 [\"1 large container Cool Whip\", \"1 large can c... \n10 [\"1 (17 oz.) can dark sweet pitted cherries\", ... \n11 [\"1 box powdered sugar\", \"8 oz. soft butter\", ... \n12 [\"chicken wings (as many as you need for dinne... \n13 [\"8 oz. Ortega taco sauce\", \"8 oz. sour cream\"... \n14 [\"1 can pie filling (cherry or strawberry)\", \"... \n15 [\"1 baked pie shell\", \"1 qt. cleaned strawberr... \n16 [\"1/2 pkg. chocolate fudge cake mix without pu... \n17 [\"1 large head broccoli (about 1 1/2 lb.)\", \"1... \n18 [\"1 lb. frozen strawberries in juice\", \"1 smal... \n19 [\"3/4 c. sugar\", \"1/2 c. shortening\", \"1 1/2 c... \n20 [\"1 c. sugar\", \"1/2 c. shortening\", \"2 eggs (a... \n21 [\"1 pkg. devil's food cake mix\", \"1 pkg. choco... \n22 [\"2 c. 7-Up or Sprite\", \"1 c. vegetable oil\", ... \n23 [\"1/2 bushel tomatoes\", \"1 c. oil\", \"1/4 c. mi... \n24 [\"1 1/2 lb. ground beef\", \"1 c. tomato juice\",... \n\n directions \\\n0 [\"In a heavy 2-quart saucepan, mix brown sugar... \n1 [\"Place chipped beef on bottom of baking dish.... \n2 [\"In a slow cooker, combine all ingredients. C... \n3 [\"Boil and debone chicken.\", \"Put bite size pi... \n4 [\"Combine first four ingredients and press in ... \n5 [\"Wash potatoes; prick several times with a fo... \n6 [\"Cream sugar and butter.\", \"Add egg and beat ... \n7 [\"Mix together both cans of corn, crackers, eg... \n8 [\"Roll steak strips in flour.\", \"Brown in skil... \n9 [\"Empty Cool Whip into a bowl.\", \"Drain juice ... \n10 [\"Drain cherries, measuring syrup.\", \"Cut cher... \n11 [\"Mix sugar, butter and peanut butter.\", \"Roll... 
\n12 [\"Clean wings.\", \"Flour and fry until done.\", ... \n13 [\"Mix taco sauce, sour cream and cream cheese.... \n14 [\"Mix all ingredients together.\", \"Pour into a... \n15 [\"Mix water, cornstarch, sugar and salt in sau... \n16 [\"Mix according to directions and add oil.\", \"... \n17 [\"Trim off large leaves of broccoli and remove... \n18 [\"Mix Jell-O in boiling water.\", \"Add strawber... \n19 [\"Mix Crisco with applesauce, nuts and raisins... \n20 [\"Cream sugar and shortening.\", \"Add eggs, sal... \n21 [\"Mix ingredients together for 5 minutes.\", \"S... \n22 [\"Buy whole turkey breast; remove all skin and... \n23 [\"Cook ground or chopped peppers and onions in... \n24 [\"Mix well.\", \"Press firmly into an 8 1/2 x 4 ... \n\n link source \\\n0 www.cookbooks.com/Recipe-Details.aspx?id=44874 Gathered \n1 www.cookbooks.com/Recipe-Details.aspx?id=699419 Gathered \n2 www.cookbooks.com/Recipe-Details.aspx?id=10570 Gathered \n3 www.cookbooks.com/Recipe-Details.aspx?id=897570 Gathered \n4 www.cookbooks.com/Recipe-Details.aspx?id=659239 Gathered \n5 www.cookbooks.com/Recipe-Details.aspx?id=20115 Gathered \n6 www.cookbooks.com/Recipe-Details.aspx?id=210288 Gathered \n7 www.cookbooks.com/Recipe-Details.aspx?id=876969 Gathered \n8 www.cookbooks.com/Recipe-Details.aspx?id=375254 Gathered \n9 www.cookbooks.com/Recipe-Details.aspx?id=794547 Gathered \n10 www.cookbooks.com/Recipe-Details.aspx?id=703381 Gathered \n11 www.cookbooks.com/Recipe-Details.aspx?id=886785 Gathered \n12 www.cookbooks.com/Recipe-Details.aspx?id=768311 Gathered \n13 www.cookbooks.com/Recipe-Details.aspx?id=806409 Gathered \n14 www.cookbooks.com/Recipe-Details.aspx?id=982483 Gathered \n15 www.cookbooks.com/Recipe-Details.aspx?id=161321 Gathered \n16 www.cookbooks.com/Recipe-Details.aspx?id=983179 Gathered \n17 www.cookbooks.com/Recipe-Details.aspx?id=50992 Gathered \n18 www.cookbooks.com/Recipe-Details.aspx?id=718063 Gathered \n19 www.cookbooks.com/Recipe-Details.aspx?id=343158 Gathered \n20 
www.cookbooks.com/Recipe-Details.aspx?id=1072247 Gathered \n21 www.cookbooks.com/Recipe-Details.aspx?id=958474 Gathered \n22 www.cookbooks.com/Recipe-Details.aspx?id=9449 Gathered \n23 www.cookbooks.com/Recipe-Details.aspx?id=1059279 Gathered \n24 www.cookbooks.com/Recipe-Details.aspx?id=923674 Gathered \n\n NER site \n0 [\"bite size shredded rice biscuits\", \"vanilla\"... www.cookbooks.com \n1 [\"cream of mushroom soup\", \"beef\", \"sour cream... www.cookbooks.com \n2 [\"frozen corn\", \"pepper\", \"cream cheese\", \"gar... www.cookbooks.com \n3 [\"chicken gravy\", \"cream of mushroom soup\", \"c... www.cookbooks.com \n4 [\"graham cracker crumbs\", \"powdered sugar\", \"p... www.cookbooks.com \n5 [\"sour cream\", \"bacon\", \"pepper\", \"extra lean ... www.cookbooks.com \n6 [\"buttermilk\", \"egg\", \"sugar\", \"vanilla\", \"sod... www.cookbooks.com \n7 [\"egg\", \"pepper\", \"crackers\", \"cream-style cor... www.cookbooks.com \n8 [\"oil\", \"tomatoes\", \"green peppers\", \"water\", ... www.cookbooks.com \n9 [\"condensed milk\", \"lemons\", \"graham cracker c... www.cookbooks.com \n10 [\"flavor gelatin\", \"dark sweet pitted cherries... www.cookbooks.com \n11 [\"paraffin\", \"powdered sugar\", \"peanut butter\"... www.cookbooks.com \n12 [\"flour\", \"chicken\", \"barbecue sauce\"] www.cookbooks.com \n13 [\"sour cream\", \"tomato\", \"shredded lettuce\", \"... www.cookbooks.com \n14 [\"condensed milk\", \"pie filling\", \"pineapple\",... www.cookbooks.com \n15 [\"sugar\", \"shell\", \"water\", \"cleaned strawberr... www.cookbooks.com \n16 [\"chocolate fudge cake\", \"white cake\", \"wesson... www.cookbooks.com \n17 [\"bacon\", \"vinegar\", \"sugar\", \"green onions\", ... www.cookbooks.com \n18 [\"sour cream\", \"frozen strawberries\", \"strawbe... www.cookbooks.com \n19 [\"sugar\", \"shortening\", \"cinnamon\", \"soda\", \"a... www.cookbooks.com \n20 [\"sugar\", \"shortening\", \"eggs\", \"soda\", \"banan... 
www.cookbooks.com \n21 [\"sour cream\", \"frango\", \"cake mix\", \"eggs\", \"... www.cookbooks.com \n22 [\"garlic\", \"vegetable oil\", \"soy sauce\"] www.cookbooks.com \n23 [\"oil\", \"oregano\", \"sugar\", \"tomatoes\", \"tomat... www.cookbooks.com \n24 [\"egg\", \"tomato juice\", \"pepper\", \"oats\", \"oni... www.cookbooks.com ","text/html":"
"},"metadata":{}}]},{"cell_type":"markdown","source":"**Calling the `info()` Method**:\n - The `info()` method is called on the `dataset_raw` DataFrame. This method provides concise summary information about the DataFrame, including the number of rows, columns, data types, and memory usage.\n**Displaying the DataFrame Information**:\n - The output of the `info()` method is displayed, providing insights into the structure and contents of the DataFrame. Here it shows 75,000 rows and 7 columns, all of dtype `object`, with no missing values. This information can be useful for understanding the data and making informed decisions about further processing or analysis.","metadata":{"id":"defbfb93-7eae-4f5a-b9e3-d6f779f1ee35"}},{"cell_type":"code","source":"dataset_raw.info()","metadata":{"id":"5ce82336-c095-4d95-9cc9-ec58914cea3c","outputId":"623fffc3-4b80-453c-8945-29be282101cb","execution":{"iopub.status.busy":"2024-04-27T15:55:01.911201Z","iopub.execute_input":"2024-04-27T15:55:01.911482Z","iopub.status.idle":"2024-04-27T15:55:01.988408Z","shell.execute_reply.started":"2024-04-27T15:55:01.911457Z","shell.execute_reply":"2024-04-27T15:55:01.987465Z"},"trusted":true},"execution_count":3,"outputs":[{"name":"stdout","text":"\nRangeIndex: 75000 entries, 0 to 74999\nData columns (total 7 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 title 75000 non-null object\n 1 ingredients 75000 non-null object\n 2 directions 75000 non-null object\n 3 link 75000 non-null object\n 4 source 75000 non-null object\n 5 NER 75000 non-null object\n 6 site 75000 non-null object\ndtypes: object(7)\nmemory usage: 4.0+ MB\n","output_type":"stream"}]},{"cell_type":"markdown","source":"## Preprocessing The Data","metadata":{"id":"a50ee7a0-6b81-4ded-b6e9-bd4d0c3c6345"}},{"cell_type":"markdown","source":"### Converting recipe objects into strings","metadata":{"id":"37781ca1-e88d-4d1e-9998-91b372b75593"}},{"cell_type":"markdown","source":"\nThe code `dataset_validated = [recipe for recipe in dataset_raw.iterrows()]` uses a list comprehension to create a new list 
called `dataset_validated` from the `dataset_raw` DataFrame.\n\n`dataset_raw.iterrows()` returns an iterator that yields a tuple for each row in the DataFrame, where the first element of the tuple is the index of the row and the second element is the row itself as a Series object.\n\nThe list comprehension iterates over these tuples and stores each one unchanged, so the variable `recipe` holds the whole `(index, row)` tuple, not just the row.\n\nSo, `dataset_validated` is a list of `(index, Series)` tuples, one per row of the original `dataset_raw` DataFrame. The row itself is the second element of each tuple.\n","metadata":{"id":"iFU9H2RERKzU"}},{"cell_type":"code","source":"dataset_validated = [recipe for recipe in dataset_raw.iterrows()]\n","metadata":{"id":"SSWY6cWiPokz","execution":{"iopub.status.busy":"2024-04-27T15:55:01.991258Z","iopub.execute_input":"2024-04-27T15:55:01.991860Z","iopub.status.idle":"2024-04-27T15:55:07.401887Z","shell.execute_reply.started":"2024-04-27T15:55:01.991831Z","shell.execute_reply":"2024-04-27T15:55:07.401075Z"},"trusted":true},"execution_count":4,"outputs":[]},{"cell_type":"markdown","source":"The code defines three constants, each assigned a string value. The first constant, `STOP_WORD_TITLE`, is assigned the string value `'📕 '`. The second constant, `STOP_WORD_INGREDIENTS`, is assigned the string value `'\\n🥩\\n\\n'`. The third constant, `STOP_WORD_INSTRUCTIONS`, is assigned the string value `'\\n✍️\\n\\n'`.\nThese constants act as delimiters between the sections of a stringified recipe. 
The `recipe_to_string` function places them before the title, the ingredients list, and the instructions list, respectively.","metadata":{"id":"428250f5-c4a9-43ae-bf27-cd3f8f2db9ba"}},{"cell_type":"code","source":"STOP_WORD_TITLE = '📕 '\nSTOP_WORD_INGREDIENTS = '\\n🥩\\n\\n'\nSTOP_WORD_INSTRUCTIONS = '\\n✍️\\n\\n'","metadata":{"id":"2cec2569-9c70-4c7d-b1c3-6c11d202446b","execution":{"iopub.status.busy":"2024-04-27T15:55:07.403247Z","iopub.execute_input":"2024-04-27T15:55:07.403542Z","iopub.status.idle":"2024-04-27T15:55:07.407838Z","shell.execute_reply.started":"2024-04-27T15:55:07.403516Z","shell.execute_reply":"2024-04-27T15:55:07.406946Z"},"trusted":true},"execution_count":5,"outputs":[]},{"cell_type":"markdown","source":"The function `recipe_to_string` takes one element of `dataset_validated`, an `(index, row)` tuple, and converts it into a string representation.\nThe function first takes the row from the tuple with `recipe[1]` and extracts the title, ingredients, and directions from it.\nNext, the function creates two empty strings, `ingredients_string` and `instructions_string`, which will be used to store the formatted ingredients and instructions.\nThe function then strips the surrounding brackets from the ingredients and directions values, splits each on `', '`, and adds every non-empty item to the respective string with a bullet prefix. Because the split is purely textual, an item that contains a comma is broken into several bullets and the double quotes around each item are kept, as the printed recipes show.\nFinally, the function returns a string that combines the formatted title, ingredients, and instructions, using the `STOP_WORD_TITLE`, `STOP_WORD_INGREDIENTS`, and `STOP_WORD_INSTRUCTIONS` constants as delimiters.\nThe code then uses the `recipe_to_string` function to convert each recipe in the `dataset_validated` list into a string representation.\nThe resulting list of strings is stored in the `dataset_stringified` variable.\nFinally, the code prints the first five recipes in the `dataset_stringified` list, along with their recipe numbers and a separator line.","metadata":{"id":"a5fffb24-8bc6-4a4e-89d6-7965e4a293df"}},{"cell_type":"code","source":"def recipe_to_string(recipe):\n recipe = recipe[1]\n \n title = recipe['title']\n ingredients = 
recipe['ingredients']\n instructions = recipe['directions']\n\n ingredients_string = ''\n\n for ingredient in ingredients.strip(\"[]\").split(', '):\n if ingredient:\n ingredient = ingredient.replace(\"'\", \"\")\n ingredients_string += f'• {ingredient}\\n'\n\n instructions_string = ''\n for instruction in instructions.strip('][').split(', '):\n if instruction:\n instruction = instruction.replace(\"'\", \"\")\n instructions_string += f'▪︎ {instruction}\\n'\n\n return f'{STOP_WORD_TITLE}{title}\\n{STOP_WORD_INGREDIENTS}{ingredients_string}{STOP_WORD_INSTRUCTIONS}{instructions_string}'\n\ndataset_stringified = [recipe_to_string(recipe) for recipe in dataset_validated]\n\nfor recipe_index, recipe_string in enumerate(dataset_stringified[:5]):\n print('Recipe #{}\\n---------'.format(recipe_index + 1))\n print(recipe_string)\n print('\\n')","metadata":{"id":"df7e335d-29fe-4002-bc13-ed22a920aea3","outputId":"b18f9225-92fe-4376-b87b-c855cc204606","execution":{"iopub.status.busy":"2024-04-27T15:55:07.409177Z","iopub.execute_input":"2024-04-27T15:55:07.409443Z","iopub.status.idle":"2024-04-27T15:55:09.548465Z","shell.execute_reply.started":"2024-04-27T15:55:07.409420Z","shell.execute_reply":"2024-04-27T15:55:09.547485Z"},"trusted":true},"execution_count":6,"outputs":[{"name":"stdout","text":"Recipe #1\n---------\n📕 No-Bake Nut Cookies\n\n🥩\n\n• \"1 c. firmly packed brown sugar\"\n• \"1/2 c. evaporated milk\"\n• \"1/2 tsp. vanilla\"\n• \"1/2 c. broken nuts (pecans)\"\n• \"2 Tbsp. butter or margarine\"\n• \"3 1/2 c. bite size shredded rice biscuits\"\n\n✍️\n\n▪︎ \"In a heavy 2-quart saucepan\n▪︎ mix brown sugar\n▪︎ nuts\n▪︎ evaporated milk and butter or margarine.\"\n▪︎ \"Stir over medium heat until mixture bubbles all over top.\"\n▪︎ \"Boil and stir 5 minutes more. 
Take off heat.\"\n▪︎ \"Stir in vanilla and cereal; mix well.\"\n▪︎ \"Using 2 teaspoons\n▪︎ drop and shape into 30 clusters on wax paper.\"\n▪︎ \"Let stand until firm\n▪︎ about 30 minutes.\"\n\n\n\nRecipe #2\n---------\n📕 Jewell Ball'S Chicken\n\n🥩\n\n• \"1 small jar chipped beef\n• cut up\"\n• \"4 boned chicken breasts\"\n• \"1 can cream of mushroom soup\"\n• \"1 carton sour cream\"\n\n✍️\n\n▪︎ \"Place chipped beef on bottom of baking dish.\"\n▪︎ \"Place chicken on top of beef.\"\n▪︎ \"Mix soup and cream together; pour over chicken. Bake\n▪︎ uncovered\n▪︎ at 275\\u00b0 for 3 hours.\"\n\n\n\nRecipe #3\n---------\n📕 Creamy Corn\n\n🥩\n\n• \"2 (16 oz.) pkg. frozen corn\"\n• \"1 (8 oz.) pkg. cream cheese\n• cubed\"\n• \"1/3 c. butter\n• cubed\"\n• \"1/2 tsp. garlic powder\"\n• \"1/2 tsp. salt\"\n• \"1/4 tsp. pepper\"\n\n✍️\n\n▪︎ \"In a slow cooker\n▪︎ combine all ingredients. Cover and cook on low for 4 hours or until heated through and cheese is melted. Stir well before serving. Yields 6 servings.\"\n\n\n\nRecipe #4\n---------\n📕 Chicken Funny\n\n🥩\n\n• \"1 large whole chicken\"\n• \"2 (10 1/2 oz.) cans chicken gravy\"\n• \"1 (10 1/2 oz.) can cream of mushroom soup\"\n• \"1 (6 oz.) box Stove Top stuffing\"\n• \"4 oz. shredded cheese\"\n\n✍️\n\n▪︎ \"Boil and debone chicken.\"\n▪︎ \"Put bite size pieces in average size square casserole dish.\"\n▪︎ \"Pour gravy and cream of mushroom soup over chicken; level.\"\n▪︎ \"Make stuffing according to instructions on box (do not make too moist).\"\n▪︎ \"Put stuffing on top of chicken and gravy; level.\"\n▪︎ \"Sprinkle shredded cheese on top and bake at 350\\u00b0 for approximately 20 minutes or until golden and bubbly.\"\n\n\n\nRecipe #5\n---------\n📕 Reeses Cups(Candy) \n\n🥩\n\n• \"1 c. peanut butter\"\n• \"3/4 c. graham cracker crumbs\"\n• \"1 c. melted butter\"\n• \"1 lb. (3 1/2 c.) powdered sugar\"\n• \"1 large pkg. 
chocolate chips\"\n\n✍️\n\n▪︎ \"Combine first four ingredients and press in 13 x 9-inch ungreased pan.\"\n▪︎ \"Melt chocolate chips and spread over mixture. Refrigerate for about 20 minutes and cut into pieces before chocolate gets hard.\"\n▪︎ \"Keep in refrigerator.\"\n\n\n\n","output_type":"stream"}]},{"cell_type":"markdown","source":"This line prints the 801st element (index 800) of the `dataset_stringified` list to the console.","metadata":{"id":"76dfcf54-199a-4af7-a52c-4ab12e39ba9d"}},{"cell_type":"code","source":"print(dataset_stringified[800])","metadata":{"id":"7cdfa2b8-0508-4a2e-8781-be225541c070","outputId":"1e438362-e944-4f7e-b86b-b2e28ad870a3","execution":{"iopub.status.busy":"2024-04-27T15:55:09.550026Z","iopub.execute_input":"2024-04-27T15:55:09.550405Z","iopub.status.idle":"2024-04-27T15:55:09.555631Z","shell.execute_reply.started":"2024-04-27T15:55:09.550370Z","shell.execute_reply":"2024-04-27T15:55:09.554559Z"},"trusted":true},"execution_count":7,"outputs":[{"name":"stdout","text":"📕 Finger Jello\n\n🥩\n\n• \"4 boxes of small jello\"\n• \"4 pkg. gelatin\"\n\n✍️\n\n▪︎ \"Mix together and add 5 c. boiling water. Dissolve completely. Let set in refrigerator.\"\n\n","output_type":"stream"}]},{"cell_type":"markdown","source":"### Filtering out large & small recipes","metadata":{"id":"b9261b9b-1eb2-47a5-b8ac-08a418ebbe7d"}},{"cell_type":"markdown","source":"\n1. **recipes_lengths = []**: This line initializes an empty list called `recipes_lengths`. This list will be used to store the lengths of the recipes.\n2. **for recipe_text in dataset_stringified**: This line starts a `for` loop that iterates over each element in the `dataset_stringified` list. Each element in this list is a string that represents a recipe.\n3. **len(recipe_text)**: Inside the loop, this expression calculates the length of the current recipe string. The `len()` function returns the number of characters in a string.\n4. 
**recipes_lengths.append(len(recipe_text))**: This line appends the length of the current recipe to the `recipes_lengths` list. The `append()` method adds an element to the end of a list.","metadata":{"id":"3764488d-93b4-4de2-b9df-8ad218f0ff64"}},{"cell_type":"code","source":"recipes_lengths = []\nfor recipe_text in dataset_stringified:\n recipes_lengths.append(len(recipe_text))","metadata":{"id":"c8e3a3e3-ebf6-4a30-b108-4751ba20dd57","execution":{"iopub.status.busy":"2024-04-27T15:55:09.556968Z","iopub.execute_input":"2024-04-27T15:55:09.557556Z","iopub.status.idle":"2024-04-27T15:55:09.594651Z","shell.execute_reply.started":"2024-04-27T15:55:09.557521Z","shell.execute_reply":"2024-04-27T15:55:09.593709Z"},"trusted":true},"execution_count":8,"outputs":[]},{"cell_type":"markdown","source":"1. **plt.hist(recipes_lengths, bins=50)**: This line creates a histogram of the recipe lengths. The `plt.hist()` function takes two main arguments: the data to be plotted (in this case, `recipes_lengths`) and the number of bins to use (in this case, 50). The number of bins determines how many bars will be used in the histogram.\n2. **plt.show()**: This line displays the histogram on the screen.","metadata":{"id":"a7e8007b-f254-4543-9fe1-67007b40a114"}},{"cell_type":"code","source":"plt.hist(recipes_lengths, bins=50)\nplt.show()","metadata":{"id":"0b7bb5df-461a-4030-8e51-0455e6d036ef","outputId":"f0ee1fd8-32df-421a-feca-ee873cf61562","execution":{"iopub.status.busy":"2024-04-27T15:55:09.595860Z","iopub.execute_input":"2024-04-27T15:55:09.596212Z","iopub.status.idle":"2024-04-27T15:55:10.253593Z","shell.execute_reply.started":"2024-04-27T15:55:09.596178Z","shell.execute_reply":"2024-04-27T15:55:10.252645Z"},"trusted":true},"execution_count":9,"outputs":[{"output_type":"display_data","data":{"text/plain":"
","image/png":""},"metadata":{}}]},{"cell_type":"markdown","source":"1. **plt.hist(recipes_lengths,range=(0,2000), bins=50)**: This line creates a histogram of the recipe lengths. The `plt.hist()` function takes three main arguments: the data to be plotted (in this case, `recipes_lengths`), the range of values to be plotted (in this case, 0 to 2000), and the number of bins to use (in this case, 50). The number of bins determines how many bars will be used in the histogram.\n2. **plt.show()**: This line displays the histogram on the screen.","metadata":{"id":"6481bb84-181b-4f64-ae27-fb782e5080ed"}},{"cell_type":"code","source":"plt.hist(recipes_lengths,range=(0,2000), bins=50)\nplt.show()","metadata":{"id":"0d46e90b-7345-4555-ba86-66b269dbcd3d","outputId":"919075b7-71cd-45b3-cae5-85958cbeacef","execution":{"iopub.status.busy":"2024-04-27T15:55:10.256939Z","iopub.execute_input":"2024-04-27T15:55:10.257297Z","iopub.status.idle":"2024-04-27T15:55:10.835336Z","shell.execute_reply.started":"2024-04-27T15:55:10.257272Z","shell.execute_reply":"2024-04-27T15:55:10.834263Z"},"trusted":true},"execution_count":10,"outputs":[{"output_type":"display_data","data":{"text/plain":"
","image/png":""},"metadata":{}}]},{"cell_type":"markdown","source":"**This is a constant named `MAX_RECIPE_LENGTH` and assigns it the value 750.\nThis constant is likely used to specify the maximum length of a recipe in some context**","metadata":{"id":"7224a27e-832d-4ca3-80a7-65631a79bef0"}},{"cell_type":"code","source":"MAX_RECIPE_LENGTH = 750","metadata":{"id":"67ec05aa-c770-49ad-b669-22649a150b85","execution":{"iopub.status.busy":"2024-04-27T15:55:10.836879Z","iopub.execute_input":"2024-04-27T15:55:10.837354Z","iopub.status.idle":"2024-04-27T15:55:10.843753Z","shell.execute_reply.started":"2024-04-27T15:55:10.837317Z","shell.execute_reply":"2024-04-27T15:55:10.842922Z"},"trusted":true},"execution_count":11,"outputs":[]},{"cell_type":"markdown","source":"This a function called `filter_max_recipes_by_length()` that takes a recipe text as input and returns `True` if the length of the recipe text is less than or equal to the `MAX_RECIPE_LENGTH` constant, and `False` otherwise.\nThe code then uses this function to filter a list of recipe texts (`dataset_stringified`) and create a new list (`dataset_max_filtered`) that contains only the recipe texts that are less than or equal to `MAX_RECIPE_LENGTH`.\nFinally, the code prints the sizes of the original dataset (`dataset_stringified`) and the filtered dataset (`dataset_max_filtered`), as well as the number of recipes that were eliminated due to not meeting the maximum length requirement.","metadata":{"id":"f3ec3c0c-3f11-4ff7-9a28-31ff27a5af34"}},{"cell_type":"code","source":"def filter_max_recipes_by_length(recipe_test):\n return (len(recipe_test) <= MAX_RECIPE_LENGTH)\n\ndataset_max_filtered = [recipe_text for recipe_text in dataset_stringified if filter_max_recipes_by_length(recipe_text)]\n\nprint('Dataset size BEFORE filtering length: ', len(dataset_stringified))\nprint('Dataset size AFTER filtering length: ', len(dataset_max_filtered))\nprint('Number of eliminated recipes length: ', len(dataset_stringified) - 
len(dataset_max_filtered))","metadata":{"id":"6aa21610-b7d7-431a-8d18-3f62ab9ab853","outputId":"d29812ae-b733-4c25-c885-bc89fbf7ac94","execution":{"iopub.status.busy":"2024-04-27T15:55:10.844868Z","iopub.execute_input":"2024-04-27T15:55:10.845139Z","iopub.status.idle":"2024-04-27T15:55:10.883941Z","shell.execute_reply.started":"2024-04-27T15:55:10.845116Z","shell.execute_reply":"2024-04-27T15:55:10.883006Z"},"trusted":true},"execution_count":12,"outputs":[{"name":"stdout","text":"Dataset size BEFORE filtering length: 75000\nDataset size AFTER filtering length: 67857\nNumber of eliminated recipes length: 7143\n","output_type":"stream"}]},{"cell_type":"markdown","source":"1. **plt.hist(recipes_lengths,range=(0,750), bins=50)**: This line creates a histogram of the recipe lengths. The `plt.hist()` function takes three main arguments: the data to be plotted (in this case, `recipes_lengths`), the range of values to be plotted (in this case, 0 to 750), and the number of bins to use (in this case, 50). The number of bins determines how many bars will be used in the histogram.\n2. **plt.show()**: This line displays the histogram on the screen.","metadata":{"id":"Zgd3WC7TSSsz"}},{"cell_type":"code","source":"plt.hist(recipes_lengths,range=(0,750), bins=50)\nplt.show()","metadata":{"id":"7sfxqNpiSFWg","outputId":"c1e782e0-aa99-4266-9b23-d83b59649dba","execution":{"iopub.status.busy":"2024-04-27T15:55:10.885272Z","iopub.execute_input":"2024-04-27T15:55:10.886167Z","iopub.status.idle":"2024-04-27T15:55:11.444848Z","shell.execute_reply.started":"2024-04-27T15:55:10.886099Z","shell.execute_reply":"2024-04-27T15:55:11.443973Z"},"trusted":true},"execution_count":13,"outputs":[{"output_type":"display_data","data":{"text/plain":"
","image/png":""},"metadata":{}}]},{"cell_type":"markdown","source":"**This is a constant named MIN_RECIPE_LENGTH and assigns it the value 250.\nThis constant is likely used to specify the minimum length of a recipe in some context**","metadata":{"id":"613c4034-fba8-4aad-81d1-3795b073a52d"}},{"cell_type":"code","source":"MIN_RECIPE_LENGTH = 250","metadata":{"id":"8e0e0791-76d7-49ae-bbc8-83e858dc7ea2","execution":{"iopub.status.busy":"2024-04-27T15:55:11.446136Z","iopub.execute_input":"2024-04-27T15:55:11.446434Z","iopub.status.idle":"2024-04-27T15:55:11.450829Z","shell.execute_reply.started":"2024-04-27T15:55:11.446408Z","shell.execute_reply":"2024-04-27T15:55:11.449845Z"},"trusted":true},"execution_count":14,"outputs":[]},{"cell_type":"markdown","source":"This is a function called `filter_min_recipes_by_length()` that takes a recipe text as input and returns `True` if the length is greater than or equal to the (`MIN_RECIPE_LENGTH`), and `False` otherwise.\nThe code then uses this function to filter a list of recipe texts (`dataset_max_filtered`) and create a new list called (`dataset_filtered`) that contains only the recipe texts that are greater than or equal to `MIN_RECIPE_LENGTH`.\nFinally, the code prints the sizes of the original dataset (`dataset_max_filtered`) and the filtered dataset (`dataset_filtered`), as well as the number of recipes that were eliminated due to not meeting the minimum length requirement.","metadata":{"id":"0a045459-9831-4cbe-b698-9b035ad0c438"}},{"cell_type":"code","source":"def filter_min_recipes_by_length(recipe_test):\n return (len(recipe_test) >= MIN_RECIPE_LENGTH)\n\ndataset_filtered = [recipe_text for recipe_text in dataset_max_filtered if filter_min_recipes_by_length(recipe_text)]\n\nprint('Dataset size BEFORE filtering length: ', len(dataset_max_filtered))\nprint('Dataset size AFTER filtering length: ', len(dataset_filtered))\nprint('Number of eliminated recipes length: ', len(dataset_max_filtered) - 
len(dataset_filtered))","metadata":{"id":"bbe8e2b9-a82b-4792-b515-360b41fd74c2","outputId":"9f158793-2e43-4fb8-b82c-d833d65f2035","execution":{"iopub.status.busy":"2024-04-27T15:55:11.452105Z","iopub.execute_input":"2024-04-27T15:55:11.452441Z","iopub.status.idle":"2024-04-27T15:55:11.489135Z","shell.execute_reply.started":"2024-04-27T15:55:11.452410Z","shell.execute_reply":"2024-04-27T15:55:11.488197Z"},"trusted":true},"execution_count":15,"outputs":[{"name":"stdout","text":"Dataset size BEFORE filtering length: 67857\nDataset size AFTER filtering length: 63766\nNumber of eliminated recipes length: 4091\n","output_type":"stream"}]},{"cell_type":"markdown","source":"1. **plt.hist(recipes_lengths,range=(250,750), bins=50)**: This line creates a histogram of the recipe lengths. The `plt.hist()` function takes three main arguments: the data to be plotted (in this case, `recipes_lengths`), the range of values to be plotted (in this case, 250 to 750), and the number of bins to use (in this case, 50). The number of bins determines how many bars will be used in the histogram.\n2. **plt.show()**: This line displays the histogram on the screen, showing the distribution of recipe lengths within the retained range.","metadata":{"id":"eUQQARNeSrTV"}},{"cell_type":"code","source":"plt.hist(recipes_lengths,range=(250,750), bins=50)\nplt.show()","metadata":{"id":"6FkPGLYySeu3","outputId":"0fa8785e-0fe4-4732-f9c8-280fdcbbb600","execution":{"iopub.status.busy":"2024-04-27T15:55:11.490273Z","iopub.execute_input":"2024-04-27T15:55:11.490628Z","iopub.status.idle":"2024-04-27T15:55:12.079258Z","shell.execute_reply.started":"2024-04-27T15:55:11.490593Z","shell.execute_reply":"2024-04-27T15:55:12.078272Z"},"trusted":true},"execution_count":16,"outputs":[{"output_type":"display_data","data":{"text/plain":"
","image/png":""},"metadata":{}}]},{"cell_type":"markdown","source":"## Tokenizing Characters","metadata":{"id":"f517437c-1211-46f0-abf4-ae0426d19b26"}},{"cell_type":"markdown","source":"The code defines a constant named `STOP_SIGN` and assigns it the value `␣`. This constant is likely used as an indicator to mark the end of a recipe.","metadata":{"id":"3301638d-aacc-4d14-9304-ca2b5db1b448"}},{"cell_type":"code","source":"# Indicator of the end of the recipe.\nSTOP_SIGN = '␣'","metadata":{"id":"c2b39050-db4d-4f87-9c1d-5ee64433b178","execution":{"iopub.status.busy":"2024-04-27T15:55:12.080308Z","iopub.execute_input":"2024-04-27T15:55:12.080586Z","iopub.status.idle":"2024-04-27T15:55:12.084997Z","shell.execute_reply.started":"2024-04-27T15:55:12.080563Z","shell.execute_reply":"2024-04-27T15:55:12.084058Z"},"trusted":true},"execution_count":17,"outputs":[]},{"cell_type":"markdown","source":" This code is an instantiation of the `Tokenizer` class from the `tf.keras.preprocessing.text` module in TensorFlow. This class is used to vectorize text data, which is a necessary step before training a machine learning model on text data.\nHere's a breakdown of what each line of code does:\n1. `tokenizer = tf.keras.preprocessing.text.Tokenizer(`: This line creates an instance of the `Tokenizer` class.\n2. `char_level=True`: This argument specifies that the tokenizer should tokenize the text at the character level instead of the word level. This means that each character in the text will be considered a separate token.\n3. `filters=''`: This argument specifies the characters that should be removed from the text before tokenization. In this case, no characters will be removed.\n4. `lower=False`: This argument specifies whether the text should be converted to lowercase before tokenization. In this case, the text will not be converted to lowercase.\n5. `split=''`: This argument specifies the delimiter that should be used to split the text into tokens. 
In this case, an empty string is passed. Because `char_level=True`, every character is treated as a separate token regardless of this setting, so it has no practical effect here.\nOnce the tokenizer has been instantiated, it is fitted on the text with the `fit_on_texts()` method, which builds the vocabulary. The `texts_to_sequences()` method then takes a list of strings as input and returns a list of lists of integers, where each integer is the index of a token in the tokenizer's vocabulary.","metadata":{"id":"2d0f0d01-6d38-407c-ab06-8685c6c64bb7"}},{"cell_type":"code","source":"tokenizer = tf.keras.preprocessing.text.Tokenizer(\n char_level=True,\n filters='',\n lower=False,\n split=''\n)","metadata":{"id":"28cba549-2212-4734-80f0-331c125c0950","execution":{"iopub.status.busy":"2024-04-27T15:55:12.086209Z","iopub.execute_input":"2024-04-27T15:55:12.086547Z","iopub.status.idle":"2024-04-27T15:55:12.205760Z","shell.execute_reply.started":"2024-04-27T15:55:12.086515Z","shell.execute_reply":"2024-04-27T15:55:12.204694Z"},"trusted":true},"execution_count":18,"outputs":[]},{"cell_type":"markdown","source":"This code fits the tokenizer on the stop sign so that the symbol is added to the vocabulary as its own token.","metadata":{"id":"a489ba06-a3e8-4a00-9fec-2fde02d852ea"}},{"cell_type":"code","source":"# The stop sign is not a part of recipes, but the tokenizer must know about it as well.\ntokenizer.fit_on_texts([STOP_SIGN])","metadata":{"id":"dcd428e7-018b-409b-bcb3-cab4db71e7ab","execution":{"iopub.status.busy":"2024-04-27T15:55:12.207277Z","iopub.execute_input":"2024-04-27T15:55:12.207960Z","iopub.status.idle":"2024-04-27T15:55:12.212411Z","shell.execute_reply.started":"2024-04-27T15:55:12.207924Z","shell.execute_reply":"2024-04-27T15:55:12.211497Z"},"trusted":true},"execution_count":19,"outputs":[]},{"cell_type":"markdown","source":"This code fits the tokenizer on `dataset_filtered`. 
Because the tokenizer works at the character level, this builds a vocabulary of every character that appears in the recipes, counts how often each one occurs, and assigns each character an integer index (more frequent characters get lower indices).","metadata":{"id":"a90bfde0-52a7-44cb-a8e5-b5d1f33b452e"}},{"cell_type":"code","source":"tokenizer.fit_on_texts(dataset_filtered)","metadata":{"id":"084a2ef7-f319-434f-b9db-8118bc899960","execution":{"iopub.status.busy":"2024-04-27T15:55:12.213557Z","iopub.execute_input":"2024-04-27T15:55:12.213849Z","iopub.status.idle":"2024-04-27T15:55:21.437039Z","shell.execute_reply.started":"2024-04-27T15:55:12.213824Z","shell.execute_reply":"2024-04-27T15:55:21.436160Z"},"trusted":true},"execution_count":20,"outputs":[]},{"cell_type":"markdown","source":"This method returns the tokenizer configuration as a Python dictionary.","metadata":{"id":"b669a1c4-4880-42ed-9a04-f9640644880e"}},{"cell_type":"code","source":"tokenizer.get_config()","metadata":{"id":"20ef8d4a-a623-40d6-801c-9396e24f1d00","outputId":"37f3c399-e8fb-4825-ec1b-faaa596bd7c8","execution":{"iopub.status.busy":"2024-04-27T15:55:21.438218Z","iopub.execute_input":"2024-04-27T15:55:21.438496Z","iopub.status.idle":"2024-04-27T15:55:21.445670Z","shell.execute_reply.started":"2024-04-27T15:55:21.438473Z","shell.execute_reply":"2024-04-27T15:55:21.444641Z"},"trusted":true},"execution_count":21,"outputs":[{"execution_count":21,"output_type":"execute_result","data":{"text/plain":"{'num_words': None,\n 'filters': '',\n 'lower': False,\n 'split': '',\n 'char_level': True,\n 'oov_token': None,\n 'document_count': 63767,\n 'word_counts': '{\"\\\\u2423\": 1, \"\\\\ud83d\\\\udcd5\": 63766, \" \": 4649638, \"N\": 4456, \"o\": 1359407, \"-\": 38957, \"B\": 70103, \"a\": 1489465, \"k\": 283515, \"e\": 2098978, \"u\": 657374, \"t\": 1255684, \"C\": 105566, \"i\": 1212724, \"s\": 1158374, \"\\\\n\": 1391649, \"\\\\ud83e\\\\udd69\": 63766, \"\\\\u2022\": 529389, \"\\\\\"\": 1546526, \"1\": 395131, \"c\": 861788, \".\": 691178, \"f\": 259520, \"r\": 1273815, 
\"m\": 447692, \"l\": 910392, \"y\": 141222, \"p\": 668438, \"d\": 786658, \"b\": 398291, \"w\": 218181, \"n\": 1281435, \"g\": 491696, \"/\": 147775, \"2\": 207739, \"v\": 150239, \"(\": 58295, \")\": 58285, \"T\": 58103, \"3\": 103176, \"z\": 59824, \"h\": 527464, \"\\\\u270d\": 63766, \"\\\\ufe0f\": 63766, \"\\\\u25aa\": 415898, \"\\\\ufe0e\": 415898, \"I\": 14140, \"q\": 17575, \"x\": 96837, \"S\": 81053, \"5\": 58008, \";\": 23748, \"U\": 2959, \"0\": 154088, \"L\": 12847, \"J\": 5169, \"\\'\": 3882, \"j\": 25952, \"4\": 82881, \"P\": 65337, \"M\": 52964, \"7\": 7394, \"\\\\\\\\\": 33418, \"6\": 19829, \"8\": 23684, \"Y\": 3255, \"F\": 17968, \"R\": 23387, \"9\": 10808, \"K\": 4625, \"A\": 49256, \"D\": 22115, \"W\": 18975, \"E\": 5607, \"O\": 8151, \"Q\": 1314, \"H\": 11563, \"G\": 9997, \":\": 2296, \",\": 1050, \"Z\": 987, \"V\": 3873, \"*\": 1047, \"!\": 1474, \"&\": 446, \"+\": 99, \"X\": 95, \"?\": 102, \"]\": 6, \"#\": 75, \"%\": 235, \"~\": 88, \"`\": 19, \"=\": 12, \"_\": 25, \"|\": 4, \"$\": 7, \"\\\\u00e9\": 2, \"@\": 1, \">\": 1}',\n 'word_docs': '{\"\\\\u2423\": 1, \"g\": 63352, \"a\": 63766, \"-\": 26996, \"1\": 63149, \"T\": 34313, \"h\": 63449, \"\\\\\"\": 63766, \"S\": 45137, \"p\": 63632, \"C\": 51626, \";\": 17538, \"2\": 58787, \"x\": 46208, \"\\\\ufe0f\": 63766, \"t\": 63765, \"\\\\ufe0e\": 63766, \"n\": 63766, \"\\\\u2022\": 63766, \"c\": 63758, \"d\": 63745, \"b\": 62911, \"(\": 33841, \"I\": 11829, \"L\": 10964, \"3\": 48330, \"\\\\u270d\": 63766, \"r\": 63766, \"N\": 4097, \"U\": 2681, \"o\": 63764, \"w\": 59633, \"/\": 51553, \"\\\\u25aa\": 63766, \"l\": 63755, \"s\": 63764, \"u\": 63692, \"i\": 63766, \"e\": 63766, \"y\": 51761, \"f\": 60734, \"q\": 12807, \"\\\\n\": 63766, \"B\": 43158, \")\": 33835, \"\\\\ud83d\\\\udcd5\": 63766, \"z\": 29788, \"v\": 54312, \"k\": 61293, \"\\\\ud83e\\\\udd69\": 63766, \"0\": 37880, \"m\": 63272, \".\": 63765, \" \": 63766, \"5\": 36238, \"7\": 6694, \"j\": 15471, \"J\": 3627, \"M\": 37130, 
\"\\\\\\\\\": 29898, \"P\": 39552, \"4\": 42598, \"\\'\": 3818, \"6\": 15770, \"8\": 18484, \"Y\": 3112, \"F\": 14655, \"R\": 18038, \"9\": 9958, \"K\": 3590, \"A\": 32792, \"D\": 18309, \"W\": 13663, \"E\": 4926, \"O\": 6314, \"Q\": 1266, \"H\": 9993, \"G\": 8864, \":\": 1916, \",\": 832, \"Z\": 950, \"V\": 3214, \"*\": 605, \"!\": 1256, \"&\": 320, \"+\": 89, \"X\": 87, \"?\": 45, \"]\": 6, \"#\": 70, \"%\": 211, \"~\": 50, \"`\": 19, \"=\": 11, \"_\": 6, \"|\": 3, \"$\": 5, \"\\\\u00e9\": 2, \"@\": 1, \">\": 1}',\n 'index_docs': '{\"1\": 63766, \"98\": 1, \"20\": 63352, \"4\": 63766, \"52\": 26996, \"25\": 63149, \"48\": 34313, \"19\": 63449, \"3\": 63766, \"38\": 45137, \"16\": 63632, \"34\": 51626, \"55\": 17538, \"29\": 58787, \"36\": 46208, \"44\": 63766, \"9\": 63765, \"23\": 63766, \"7\": 63766, \"18\": 63766, \"13\": 63758, \"14\": 63745, \"24\": 62911, \"46\": 33841, \"63\": 11829, \"64\": 10964, \"35\": 48330, \"43\": 63766, \"8\": 63766, \"73\": 4097, \"77\": 2681, \"6\": 63764, \"28\": 59633, \"32\": 51553, \"22\": 63766, \"12\": 63755, \"11\": 63764, \"17\": 63692, \"10\": 63766, \"2\": 63766, \"33\": 51761, \"27\": 60734, \"62\": 12807, \"5\": 63766, \"39\": 43158, \"47\": 33835, \"41\": 63766, \"45\": 29788, \"31\": 54312, \"26\": 61293, \"42\": 63766, \"30\": 37880, \"21\": 63272, \"15\": 63765, \"49\": 36238, \"69\": 6694, \"54\": 15471, \"71\": 3627, \"50\": 37130, \"53\": 29898, \"40\": 39552, \"37\": 42598, \"74\": 3818, \"59\": 15770, \"56\": 18484, \"76\": 3112, \"61\": 14655, \"57\": 18038, \"66\": 9958, \"72\": 3590, \"51\": 32792, \"58\": 18309, \"60\": 13663, \"70\": 4926, \"68\": 6314, \"80\": 1266, \"65\": 9993, \"67\": 8864, \"78\": 1916, \"81\": 832, \"83\": 950, \"75\": 3214, \"82\": 605, \"79\": 1256, \"84\": 320, \"87\": 89, \"88\": 87, \"86\": 45, \"95\": 6, \"90\": 70, \"85\": 211, \"89\": 50, \"92\": 19, \"93\": 11, \"91\": 6, \"96\": 3, \"94\": 5, \"97\": 2, \"99\": 1, \"100\": 1}',\n 'index_word': '{\"1\": \" \", \"2\": 
\"e\", \"3\": \"\\\\\"\", \"4\": \"a\", \"5\": \"\\\\n\", \"6\": \"o\", \"7\": \"n\", \"8\": \"r\", \"9\": \"t\", \"10\": \"i\", \"11\": \"s\", \"12\": \"l\", \"13\": \"c\", \"14\": \"d\", \"15\": \".\", \"16\": \"p\", \"17\": \"u\", \"18\": \"\\\\u2022\", \"19\": \"h\", \"20\": \"g\", \"21\": \"m\", \"22\": \"\\\\u25aa\", \"23\": \"\\\\ufe0e\", \"24\": \"b\", \"25\": \"1\", \"26\": \"k\", \"27\": \"f\", \"28\": \"w\", \"29\": \"2\", \"30\": \"0\", \"31\": \"v\", \"32\": \"/\", \"33\": \"y\", \"34\": \"C\", \"35\": \"3\", \"36\": \"x\", \"37\": \"4\", \"38\": \"S\", \"39\": \"B\", \"40\": \"P\", \"41\": \"\\\\ud83d\\\\udcd5\", \"42\": \"\\\\ud83e\\\\udd69\", \"43\": \"\\\\u270d\", \"44\": \"\\\\ufe0f\", \"45\": \"z\", \"46\": \"(\", \"47\": \")\", \"48\": \"T\", \"49\": \"5\", \"50\": \"M\", \"51\": \"A\", \"52\": \"-\", \"53\": \"\\\\\\\\\", \"54\": \"j\", \"55\": \";\", \"56\": \"8\", \"57\": \"R\", \"58\": \"D\", \"59\": \"6\", \"60\": \"W\", \"61\": \"F\", \"62\": \"q\", \"63\": \"I\", \"64\": \"L\", \"65\": \"H\", \"66\": \"9\", \"67\": \"G\", \"68\": \"O\", \"69\": \"7\", \"70\": \"E\", \"71\": \"J\", \"72\": \"K\", \"73\": \"N\", \"74\": \"\\'\", \"75\": \"V\", \"76\": \"Y\", \"77\": \"U\", \"78\": \":\", \"79\": \"!\", \"80\": \"Q\", \"81\": \",\", \"82\": \"*\", \"83\": \"Z\", \"84\": \"&\", \"85\": \"%\", \"86\": \"?\", \"87\": \"+\", \"88\": \"X\", \"89\": \"~\", \"90\": \"#\", \"91\": \"_\", \"92\": \"`\", \"93\": \"=\", \"94\": \"$\", \"95\": \"]\", \"96\": \"|\", \"97\": \"\\\\u00e9\", \"98\": \"\\\\u2423\", \"99\": \"@\", \"100\": \">\"}',\n 'word_index': '{\" \": 1, \"e\": 2, \"\\\\\"\": 3, \"a\": 4, \"\\\\n\": 5, \"o\": 6, \"n\": 7, \"r\": 8, \"t\": 9, \"i\": 10, \"s\": 11, \"l\": 12, \"c\": 13, \"d\": 14, \".\": 15, \"p\": 16, \"u\": 17, \"\\\\u2022\": 18, \"h\": 19, \"g\": 20, \"m\": 21, \"\\\\u25aa\": 22, \"\\\\ufe0e\": 23, \"b\": 24, \"1\": 25, \"k\": 26, \"f\": 27, \"w\": 28, \"2\": 29, \"0\": 30, \"v\": 31, \"/\": 32, \"y\": 33, \"C\": 34, 
\"3\": 35, \"x\": 36, \"4\": 37, \"S\": 38, \"B\": 39, \"P\": 40, \"\\\\ud83d\\\\udcd5\": 41, \"\\\\ud83e\\\\udd69\": 42, \"\\\\u270d\": 43, \"\\\\ufe0f\": 44, \"z\": 45, \"(\": 46, \")\": 47, \"T\": 48, \"5\": 49, \"M\": 50, \"A\": 51, \"-\": 52, \"\\\\\\\\\": 53, \"j\": 54, \";\": 55, \"8\": 56, \"R\": 57, \"D\": 58, \"6\": 59, \"W\": 60, \"F\": 61, \"q\": 62, \"I\": 63, \"L\": 64, \"H\": 65, \"9\": 66, \"G\": 67, \"O\": 68, \"7\": 69, \"E\": 70, \"J\": 71, \"K\": 72, \"N\": 73, \"\\'\": 74, \"V\": 75, \"Y\": 76, \"U\": 77, \":\": 78, \"!\": 79, \"Q\": 80, \",\": 81, \"*\": 82, \"Z\": 83, \"&\": 84, \"%\": 85, \"?\": 86, \"+\": 87, \"X\": 88, \"~\": 89, \"#\": 90, \"_\": 91, \"`\": 92, \"=\": 93, \"$\": 94, \"]\": 95, \"|\": 96, \"\\\\u00e9\": 97, \"\\\\u2423\": 98, \"@\": 99, \">\": 100}'}"},"metadata":{}}]},{"cell_type":"markdown","source":"\nThe `VOCABULARY_SIZE` variable holds the number of unique tokens that the tokenizer can recognize. The vocabulary size is calculated by taking the length of the `tokenizer.word_counts` dictionary, which contains the frequency of each character in the training data, and adding 1 for the reserved index 0.\nThe `print` statement displays the value of the `VOCABULARY_SIZE` variable.\n","metadata":{"id":"dc3bfb5b-570a-4c62-bee7-d1ecc71b23af"}},{"cell_type":"code","source":"# Adding +1 to take into account a special unassigned 0 index.\nVOCABULARY_SIZE = len(tokenizer.word_counts) + 1\n\nprint('VOCABULARY_SIZE: ', VOCABULARY_SIZE)","metadata":{"id":"1fd50226-2b26-472b-a529-0fe57abf2139","outputId":"afcd9a6f-bf4e-4021-b586-12a913b9c51c","execution":{"iopub.status.busy":"2024-04-27T15:55:21.446773Z","iopub.execute_input":"2024-04-27T15:55:21.447072Z","iopub.status.idle":"2024-04-27T15:55:21.457531Z","shell.execute_reply.started":"2024-04-27T15:55:21.447039Z","shell.execute_reply":"2024-04-27T15:55:21.456705Z"},"trusted":true},"execution_count":22,"outputs":[{"name":"stdout","text":"VOCABULARY_SIZE: 
101\n","output_type":"stream"}]},{"cell_type":"markdown","source":"This code prints the character that corresponds to index 21 in the tokenizer's vocabulary.","metadata":{"id":"3582caab-849b-4cdf-b961-21f96ef56fa0"}},{"cell_type":"code","source":"print(tokenizer.index_word[21])","metadata":{"id":"d86125d9-d3b2-4f33-9bc0-66d5ef851e03","outputId":"2c5dea4e-ce4f-4497-afde-56361d0e7d6a","execution":{"iopub.status.busy":"2024-04-27T15:55:21.459634Z","iopub.execute_input":"2024-04-27T15:55:21.460004Z","iopub.status.idle":"2024-04-27T15:55:21.468894Z","shell.execute_reply.started":"2024-04-27T15:55:21.459971Z","shell.execute_reply":"2024-04-27T15:55:21.468132Z"},"trusted":true},"execution_count":23,"outputs":[{"name":"stdout","text":"m\n","output_type":"stream"}]},{"cell_type":"markdown","source":"This code looks up the index of the character `m` in the tokenizer's vocabulary.","metadata":{"id":"da001dde-0a75-4d7d-805e-01bc0704284d"}},{"cell_type":"code","source":"tokenizer.word_index['m']","metadata":{"id":"ddbd0283-e0ba-42f0-89dc-59f7719fd871","outputId":"2705de20-a257-4eb1-a564-efb3ec33bb75","execution":{"iopub.status.busy":"2024-04-27T15:55:21.470042Z","iopub.execute_input":"2024-04-27T15:55:21.470388Z","iopub.status.idle":"2024-04-27T15:55:21.480562Z","shell.execute_reply.started":"2024-04-27T15:55:21.470356Z","shell.execute_reply":"2024-04-27T15:55:21.479685Z"},"trusted":true},"execution_count":24,"outputs":[{"execution_count":24,"output_type":"execute_result","data":{"text/plain":"21"},"metadata":{}}]},{"cell_type":"markdown","source":"The `tokenizer.sequences_to_texts` method converts a list of token sequences into a list of text strings. 
In this case, the list of token sequences is created by iterating over the range of indices in the tokenizer's vocabulary and creating a list of lists, each containing a single token index.\nThe `print` statement is used to display the list of characters that correspond to the list of token indices.\nThe code creates an array of characters that represent the vocabulary of the tokenizer.","metadata":{"id":"9935ac2a-2790-4549-8c76-6d2998be596e"}},{"cell_type":"code","source":"# For demo application we need to have an array of characters as vocabulary.\narray_vocabulary = tokenizer.sequences_to_texts([[word_index] for word_index in range(VOCABULARY_SIZE)])\nprint([char for char in array_vocabulary])","metadata":{"id":"85a6b042-68aa-4733-9b83-c69080f15853","outputId":"9767e1e1-c5dc-4259-bb8e-ce2b9e47c776","execution":{"iopub.status.busy":"2024-04-27T15:55:21.481529Z","iopub.execute_input":"2024-04-27T15:55:21.481786Z","iopub.status.idle":"2024-04-27T15:55:21.491113Z","shell.execute_reply.started":"2024-04-27T15:55:21.481763Z","shell.execute_reply":"2024-04-27T15:55:21.490325Z"},"trusted":true},"execution_count":25,"outputs":[{"name":"stdout","text":"['', ' ', 'e', '\"', 'a', '\\n', 'o', 'n', 'r', 't', 'i', 's', 'l', 'c', 'd', '.', 'p', 'u', '•', 'h', 'g', 'm', '▪', '︎', 'b', '1', 'k', 'f', 'w', '2', '0', 'v', '/', 'y', 'C', '3', 'x', '4', 'S', 'B', 'P', '📕', '🥩', '✍', '️', 'z', '(', ')', 'T', '5', 'M', 'A', '-', '\\\\', 'j', ';', '8', 'R', 'D', '6', 'W', 'F', 'q', 'I', 'L', 'H', '9', 'G', 'O', '7', 'E', 'J', 'K', 'N', \"'\", 'V', 'Y', 'U', ':', '!', 'Q', ',', '*', 'Z', '&', '%', '?', '+', 'X', '~', '#', '_', '`', '=', '$', ']', '|', 'é', '␣', '@', '>']\n","output_type":"stream"}]},{"cell_type":"markdown","source":"The `tokenizer.texts_to_sequences` method converts a list of text strings into a list of token sequences.\nThe tokenizer will break down the text string into a sequence of tokens. 
The specific tokens that are produced depend on the tokenizer's vocabulary; here, each character is mapped to its integer index.","metadata":{"id":"fad22343-31d7-4b02-823a-de2994ebb516"}},{"cell_type":"code","source":"tokenizer.texts_to_sequences(['🥩 meat'])","metadata":{"id":"f62fdada-f44d-45be-ab3a-d4fd673fca7d","outputId":"02f8c3ea-457e-4682-9266-0699a368cb61","execution":{"iopub.status.busy":"2024-04-27T15:55:21.492688Z","iopub.execute_input":"2024-04-27T15:55:21.493122Z","iopub.status.idle":"2024-04-27T15:55:21.503005Z","shell.execute_reply.started":"2024-04-27T15:55:21.493090Z","shell.execute_reply":"2024-04-27T15:55:21.502097Z"},"trusted":true},"execution_count":26,"outputs":[{"execution_count":26,"output_type":"execute_result","data":{"text/plain":"[[42, 1, 21, 2, 4, 9]]"},"metadata":{}}]},{"cell_type":"markdown","source":"## Vectorization","metadata":{"id":"dbcc0122-540f-484a-91c8-fff8ebf0c7be"}},{"cell_type":"markdown","source":"\nThe `texts_to_sequences` method of the tokenizer takes a list of text strings as input and returns a list of lists of integers, where each inner list is the sequence of token IDs for the corresponding text string. Because the tokenizer was created with `char_level=True`, the vectorization is character-level: each character in the text string is converted into its integer ID.\nThe resulting `dataset_vectorized` is a list of lists of integers, where each inner list is the sequence of token IDs for one recipe in `dataset_filtered`.\nThe `print` statement outputs the size of the vectorized dataset, which equals the number of recipes in `dataset_filtered`.","metadata":{"id":"e8da5722-4b3a-407d-86de-7cdaaa5bf92a"}},{"cell_type":"code","source":"dataset_vectorized = tokenizer.texts_to_sequences(dataset_filtered)\n\nprint('Vectorized dataset size', 
len(dataset_vectorized))","metadata":{"id":"8d79c158-a292-462a-b0f8-a42f1980f9c8","outputId":"9f939127-ae33-442c-ce72-0d0c175530b8","execution":{"iopub.status.busy":"2024-04-27T15:55:21.504180Z","iopub.execute_input":"2024-04-27T15:55:21.504532Z","iopub.status.idle":"2024-04-27T15:55:30.355707Z","shell.execute_reply.started":"2024-04-27T15:55:21.504501Z","shell.execute_reply":"2024-04-27T15:55:30.354797Z"},"trusted":true},"execution_count":27,"outputs":[{"name":"stdout","text":"Vectorized dataset size 63766\n","output_type":"stream"}]},{"cell_type":"markdown","source":"The function `recipe_sequence_to_string` takes a recipe sequence as input and converts it back into a text string.\nThe function first uses the tokenizer to convert the recipe sequence back into text. `sequences_to_texts` expects a list of sequences and returns a list of strings, so the single sequence is wrapped in a list and the result is taken with `[0]`.\nBecause the tokenizer joins the decoded characters with spaces, a regular expression then removes every single whitespace character that sits between two non-whitespace characters, and each remaining run of three spaces (an original space surrounded by two separators) is replaced with a single space.\nFinally, it prints the resulting text string.","metadata":{"id":"597e9ba2-a272-4e44-9f32-24b43a659d34"}},{"cell_type":"code","source":"def recipe_sequence_to_string(recipe_sequence):\n recipe_stringified = tokenizer.sequences_to_texts([recipe_sequence])[0] # sequences_to_texts returns a list; [0] takes the single decoded string\n recipe_stringified = re.sub(r'(?<=\\S)\\s(?=\\S)', '', recipe_stringified).replace(\"   \", \" \")\n print(recipe_stringified)","metadata":{"id":"c581f957-335f-440c-8b99-da5f41c78691","execution":{"iopub.status.busy":"2024-04-27T15:55:30.364028Z","iopub.execute_input":"2024-04-27T15:55:30.364313Z","iopub.status.idle":"2024-04-27T15:55:30.369630Z","shell.execute_reply.started":"2024-04-27T15:55:30.364290Z","shell.execute_reply":"2024-04-27T15:55:30.368692Z"},"trusted":true},"execution_count":28,"outputs":[]},{"cell_type":"markdown","source":"This will convert the 100th recipe sequence back into a text string and print 
it.","metadata":{"id":"683ef3b6-7055-4849-9552-9cbbee56aa87"}},{"cell_type":"code","source":"recipe_sequence_to_string(dataset_vectorized[99])","metadata":{"id":"f8be5f8e-5ec1-42b5-b2db-a8f6b9a9dcc9","outputId":"debb1873-34a1-456e-c63c-f4252f4a38bc","execution":{"iopub.status.busy":"2024-04-27T15:55:30.370673Z","iopub.execute_input":"2024-04-27T15:55:30.371285Z","iopub.status.idle":"2024-04-27T15:55:30.382074Z","shell.execute_reply.started":"2024-04-27T15:55:30.371258Z","shell.execute_reply":"2024-04-27T15:55:30.381152Z"},"trusted":true},"execution_count":29,"outputs":[{"name":"stdout","text":"📕 Crisp Oatmeal Cookies \n \n 🥩 \n \n • \"4 c. quick cooking oats\" \n • \"2 c. brown sugar \n • packed\" \n • \"1 c. salad oil\" \n • \"2 eggs \n • well beaten\" \n • \"1/2 tsp. salt\" \n • \"1 tsp. almond extract\" \n \n ✍️ \n \n ▪︎ \"Mix oats \n ▪︎ brown sugar and oil; let stand overnight or 8 hours. Preheat oven to 325\\u00b0.\" \n ▪︎ \"Mix rest of ingredients into oat mixture. Drop by teaspoon onto greased baking sheet.\" \n ▪︎ \"Bake 15 minutes. 
Cool completely before removing from baking sheet.\" \n\n","output_type":"stream"}]},{"cell_type":"markdown","source":"### Add padding to sequences","metadata":{"id":"f61a079f-2a3e-497f-bc83-c935d806f56e"}},{"cell_type":"markdown","source":"This code iterates over the first 20 elements of the list `dataset_vectorized` and prints the length of each recipe.","metadata":{"id":"1837316c-cbdc-4395-a703-fa9229146f3f"}},{"cell_type":"code","source":"for recipe_index, recipe in enumerate(dataset_vectorized[:20]):\n print('Recipe #{} length: {}'.format(recipe_index + 1, len(recipe)))","metadata":{"id":"74fe03c5-747c-435e-967e-fa57cc971969","outputId":"0168f309-a483-4155-dc25-cfee081f49c3","execution":{"iopub.status.busy":"2024-04-27T15:55:30.383104Z","iopub.execute_input":"2024-04-27T15:55:30.383387Z","iopub.status.idle":"2024-04-27T15:55:30.392329Z","shell.execute_reply.started":"2024-04-27T15:55:30.383363Z","shell.execute_reply":"2024-04-27T15:55:30.391418Z"},"trusted":true},"execution_count":30,"outputs":[{"name":"stdout","text":"Recipe #1 length: 603\nRecipe #2 length: 341\nRecipe #3 length: 361\nRecipe #4 length: 604\nRecipe #5 length: 415\nRecipe #6 length: 484\nRecipe #7 length: 424\nRecipe #8 length: 588\nRecipe #9 length: 600\nRecipe #10 length: 639\nRecipe #11 length: 450\nRecipe #12 length: 304\nRecipe #13 length: 532\nRecipe #14 length: 380\nRecipe #15 length: 548\nRecipe #16 length: 264\nRecipe #17 length: 619\nRecipe #18 length: 500\nRecipe #19 length: 534\nRecipe #20 length: 399\n","output_type":"stream"}]},{"cell_type":"markdown","source":"This code displays the variable `MAX_RECIPE_LENGTH`, which holds the maximum length of the 
recipes","metadata":{"id":"78618d7f-3e6b-4be8-add9-7dd8ee519d34"}},{"cell_type":"code","source":"MAX_RECIPE_LENGTH","metadata":{"id":"24af5bba-f8ae-4c66-8d18-6c945351e902","outputId":"e76555c5-65f7-4a37-f441-cd3869f750b2","execution":{"iopub.status.busy":"2024-04-27T15:55:30.393447Z","iopub.execute_input":"2024-04-27T15:55:30.393720Z","iopub.status.idle":"2024-04-27T15:55:30.403761Z","shell.execute_reply.started":"2024-04-27T15:55:30.393696Z","shell.execute_reply":"2024-04-27T15:55:30.402901Z"},"trusted":true},"execution_count":31,"outputs":[{"execution_count":31,"output_type":"execute_result","data":{"text/plain":"750"},"metadata":{}}]},{"cell_type":"markdown","source":"This code uses the `pad_sequences` function from the `tf.keras.preprocessing.sequence` module to pad the sequences in the `dataset_vectorized` variable. Padding is a technique used to ensure that all sequences have the same length, which is necessary for certain NLP tasks.\n - The `padding='post'` argument specifies that the padding should be added at the end of the sequences.\n - The `truncating='post'` argument specifies that if a sequence is longer than the specified `maxlen`, it should be truncated at the end.\n - The `maxlen` argument specifies the maximum length of the padded sequences, which is set to `MAX_RECIPE_LENGTH-1`.\n - The `value` argument specifies the value to use for padding. 
In this case, it is set to the numerical representation of the `STOP_SIGN` token, which is obtained by converting the string `STOP_SIGN` into a numerical sequence using the `tokenizer.texts_to_","metadata":{"id":"1d23ad32-e34d-4c0e-943c-c1f8fd353e6d"}},{"cell_type":"code","source":"dataset_vectorized_padded_without_stops = tf.keras.preprocessing.sequence.pad_sequences(\n dataset_vectorized,\n padding='post',\n truncating='post',\n maxlen=MAX_RECIPE_LENGTH-1,\n value=tokenizer.texts_to_sequences([STOP_SIGN])[0] # [0] selects the encoded sequence for STOP_SIGN ('␣')\n)","metadata":{"id":"8e133d61-298d-4219-85d0-3c12f3b0d5b8","execution":{"iopub.status.busy":"2024-04-27T15:55:30.404884Z","iopub.execute_input":"2024-04-27T15:55:30.405148Z","iopub.status.idle":"2024-04-27T15:55:32.783707Z","shell.execute_reply.started":"2024-04-27T15:55:30.405126Z","shell.execute_reply":"2024-04-27T15:55:32.782919Z"},"trusted":true},"execution_count":32,"outputs":[]},{"cell_type":"markdown","source":"We call `pad_sequences` twice, with `MAX_RECIPE_LENGTH-1` above and `MAX_RECIPE_LENGTH+1` below, to make sure that every recipe has at least one stop sign at the end.","metadata":{"id":"8d190ffb-d144-4a78-b579-751191fd3aef"}},{"cell_type":"code","source":"dataset_vectorized_padded = tf.keras.preprocessing.sequence.pad_sequences(\n dataset_vectorized_padded_without_stops,\n padding='post',\n truncating='post',\n maxlen=MAX_RECIPE_LENGTH+1,\n value=tokenizer.texts_to_sequences([STOP_SIGN])[0]\n)","metadata":{"id":"9391ed7a-7188-45e4-8433-6f74b762f66b","execution":{"iopub.status.busy":"2024-04-27T15:55:32.784749Z","iopub.execute_input":"2024-04-27T15:55:32.785016Z","iopub.status.idle":"2024-04-27T15:55:33.064118Z","shell.execute_reply.started":"2024-04-27T15:55:32.784993Z","shell.execute_reply":"2024-04-27T15:55:33.063326Z"},"trusted":true},"execution_count":33,"outputs":[]},{"cell_type":"markdown","source":"This code iterates over the first 20 elements of the list `dataset_vectorized_padded` and, for each recipe, prints the length of 
it","metadata":{"id":"99b5cc36-16a8-4df5-9a77-c7cd02c8ea27"}},{"cell_type":"code","source":"for recipe_index, recipe in enumerate(dataset_vectorized_padded[:20]):\n print('Recipe #{} length: {}'.format(recipe_index, len(recipe)))","metadata":{"id":"c4c01fa8-3a82-4cb8-923a-f964128284ca","outputId":"951ec5d1-b5fa-458e-9e08-5d725bb9ceb0","execution":{"iopub.status.busy":"2024-04-27T15:55:33.065292Z","iopub.execute_input":"2024-04-27T15:55:33.065641Z","iopub.status.idle":"2024-04-27T15:55:33.071662Z","shell.execute_reply.started":"2024-04-27T15:55:33.065610Z","shell.execute_reply":"2024-04-27T15:55:33.070805Z"},"trusted":true},"execution_count":34,"outputs":[{"name":"stdout","text":"Recipe #0 length: 751\nRecipe #1 length: 751\nRecipe #2 length: 751\nRecipe #3 length: 751\nRecipe #4 length: 751\nRecipe #5 length: 751\nRecipe #6 length: 751\nRecipe #7 length: 751\nRecipe #8 length: 751\nRecipe #9 length: 751\nRecipe #10 length: 751\nRecipe #11 length: 751\nRecipe #12 length: 751\nRecipe #13 length: 751\nRecipe #14 length: 751\nRecipe #15 length: 751\nRecipe #16 length: 751\nRecipe #17 length: 751\nRecipe #18 length: 751\nRecipe #19 length: 751\n","output_type":"stream"}]},{"cell_type":"markdown","source":"### Create TensorFlow dataset","metadata":{"id":"ZoZqCOjRzoBH"}},{"cell_type":"markdown","source":"The function `tf.data.Dataset.from_tensor_slices()` takes a NumPy array as input and creates a dataset that exports each element of the array as a separate element.\nIn this case, the `dataset_vectorized_padded` variable is a NumPy array that contains the padded vectors of the input text sequences.","metadata":{"id":"IvxncxYPrzml"}},{"cell_type":"code","source":"dataset = 
tf.data.Dataset.from_tensor_slices(dataset_vectorized_padded)","metadata":{"id":"fpDdBfO_mD9X","execution":{"iopub.status.busy":"2024-04-27T15:55:33.072793Z","iopub.execute_input":"2024-04-27T15:55:33.073102Z","iopub.status.idle":"2024-04-27T15:55:34.164643Z","shell.execute_reply.started":"2024-04-27T15:55:33.073077Z","shell.execute_reply":"2024-04-27T15:55:34.163774Z"},"trusted":true},"execution_count":35,"outputs":[]},{"cell_type":"markdown","source":"This line displays information about the dataset: the shape and data type of its elements.","metadata":{"id":"ErxoBVX11HDb"}},{"cell_type":"code","source":"print(dataset)","metadata":{"id":"DrRaSXTsmC6J","outputId":"968fa48e-c17e-462b-82f4-f72dad4f1de7","execution":{"iopub.status.busy":"2024-04-27T15:55:34.165980Z","iopub.execute_input":"2024-04-27T15:55:34.166376Z","iopub.status.idle":"2024-04-27T15:55:34.172290Z","shell.execute_reply.started":"2024-04-27T15:55:34.166340Z","shell.execute_reply":"2024-04-27T15:55:34.171277Z"},"trusted":true},"execution_count":36,"outputs":[{"name":"stdout","text":"<_TensorSliceDataset element_spec=TensorSpec(shape=(751,), dtype=tf.int32, name=None)>\n","output_type":"stream"}]},{"cell_type":"markdown","source":"1. `dataset.take(1)`: This creates a new dataset containing only the first element of `dataset`, that is, the first recipe in the collection.\n2. `print('Recipe in tensorflow:\\n', dataset.take(1), '\\n\\n\\n')`: This line prints the dataset object returned by `take(1)` (its element spec), not the recipe contents.\n3. `print('Raw recipe:\\n', recipe.numpy(), '\\n\\n\\n')`: This line prints the raw recipe as an array of token indices. Inside the loop, `recipe` is a TensorFlow tensor, and the `numpy()` method converts it into a NumPy array.\n4. 
`recipe_sequence_to_string(recipe.numpy())`: This line calls a function named `recipe_sequence_to_string` and passes the NumPy array containing the recipe data as an argument. The `recipe_sequence_to_string` function is likely responsible for converting the recipe data into a human-readable string format.\nIn summary, this code snippet appears to be part of a program that processes recipes. It takes the first recipe from a dataset, prints the raw recipe data, and then prints a stringified version of the recipe. The `recipe_sequence_to_string` function is responsible for converting the recipe data into a human-readable string format.","metadata":{"id":"ZMQrrZ-2D3yt"}},{"cell_type":"code","source":"print('Recipe in tensorflow:\\n', dataset.take(1), '\\n\\n\\n')\nfor recipe in dataset.take(1):\n print('Raw recipe:\\n', recipe.numpy(), '\\n\\n\\n')\n print('Stringified recipe:\\n')\n recipe_sequence_to_string(recipe.numpy())\n","metadata":{"id":"724530bd-3768-461d-9742-c282e10a6c89","outputId":"45bdcb89-0140-4e83-9885-23c424a671e8","execution":{"iopub.status.busy":"2024-04-27T15:55:34.173601Z","iopub.execute_input":"2024-04-27T15:55:34.173929Z","iopub.status.idle":"2024-04-27T15:55:34.374003Z","shell.execute_reply.started":"2024-04-27T15:55:34.173880Z","shell.execute_reply":"2024-04-27T15:55:34.373031Z"},"trusted":true},"execution_count":37,"outputs":[{"name":"stdout","text":"Recipe in tensorflow:\n <_TakeDataset element_spec=TensorSpec(shape=(751,), dtype=tf.int32, name=None)> \n\n\n\nRaw recipe:\n [41 1 73 6 52 39 4 26 2 1 73 17 9 1 34 6 6 26 10 2 11 5 5 42\n 5 5 18 1 3 25 1 13 15 1 27 10 8 21 12 33 1 16 4 13 26 2 14 1\n 24 8 6 28 7 1 11 17 20 4 8 3 5 18 1 3 25 32 29 1 13 15 1 2\n 31 4 16 6 8 4 9 2 14 1 21 10 12 26 3 5 18 1 3 25 32 29 1 9\n 11 16 15 1 31 4 7 10 12 12 4 3 5 18 1 3 25 32 29 1 13 15 1 24\n 8 6 26 2 7 1 7 17 9 11 1 46 16 2 13 4 7 11 47 3 5 18 1 3\n 29 1 48 24 11 16 15 1 24 17 9 9 2 8 1 6 8 1 21 4 8 20 4 8\n 10 7 2 3 5 18 1 3 35 1 25 32 29 1 13 15 1 24 10 
9 2 1 11 10\n 45 2 1 11 19 8 2 14 14 2 14 1 8 10 13 2 1 24 10 11 13 17 10 9\n 11 3 5 5 43 44 5 5 22 23 1 3 63 7 1 4 1 19 2 4 31 33 1 29\n 52 62 17 4 8 9 1 11 4 17 13 2 16 4 7 5 22 23 1 21 10 36 1 24\n 8 6 28 7 1 11 17 20 4 8 5 22 23 1 7 17 9 11 5 22 23 1 2 31\n 4 16 6 8 4 9 2 14 1 21 10 12 26 1 4 7 14 1 24 17 9 9 2 8\n 1 6 8 1 21 4 8 20 4 8 10 7 2 15 3 5 22 23 1 3 38 9 10 8\n 1 6 31 2 8 1 21 2 14 10 17 21 1 19 2 4 9 1 17 7 9 10 12 1\n 21 10 36 9 17 8 2 1 24 17 24 24 12 2 11 1 4 12 12 1 6 31 2 8\n 1 9 6 16 15 3 5 22 23 1 3 39 6 10 12 1 4 7 14 1 11 9 10 8\n 1 49 1 21 10 7 17 9 2 11 1 21 6 8 2 15 1 48 4 26 2 1 6 27\n 27 1 19 2 4 9 15 3 5 22 23 1 3 38 9 10 8 1 10 7 1 31 4 7\n 10 12 12 4 1 4 7 14 1 13 2 8 2 4 12 55 1 21 10 36 1 28 2 12\n 12 15 3 5 22 23 1 3 77 11 10 7 20 1 29 1 9 2 4 11 16 6 6 7\n 11 5 22 23 1 14 8 6 16 1 4 7 14 1 11 19 4 16 2 1 10 7 9 6\n 1 35 30 1 13 12 17 11 9 2 8 11 1 6 7 1 28 4 36 1 16 4 16 2\n 8 15 3 5 22 23 1 3 64 2 9 1 11 9 4 7 14 1 17 7 9 10 12 1\n 27 10 8 21 5 22 23 1 4 24 6 17 9 1 35 30 1 21 10 7 17 9 2 11\n 15 3 5 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98\n 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98\n 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98\n 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98\n 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98\n 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98 98\n 98 98 98 98 98 98 98] \n\n\n\nStringified recipe:\n\n📕 No-Bake Nut Cookies \n \n 🥩 \n \n • \"1 c. firmly packed brown sugar\" \n • \"1/2 c. evaporated milk\" \n • \"1/2 tsp. vanilla\" \n • \"1/2 c. broken nuts (pecans)\" \n • \"2 Tbsp. butter or margarine\" \n • \"3 1/2 c. 
bite size shredded rice biscuits\" \n \n ✍️ \n \n ▪︎ \"In a heavy 2-quart saucepan \n ▪︎ mix brown sugar \n ▪︎ nuts \n ▪︎ evaporated milk and butter or margarine.\" \n ▪︎ \"Stir over medium heat until mixture bubbles all over top.\" \n ▪︎ \"Boil and stir 5 minutes more. Take off heat.\" \n ▪︎ \"Stir in vanilla and cereal; mix well.\" \n ▪︎ \"Using 2 teaspoons \n ▪︎ drop and shape into 30 clusters on wax paper.\" \n ▪︎ \"Let stand until firm \n ▪︎ about 30 minutes.\" \n ␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣\n","output_type":"stream"}]},{"cell_type":"markdown","source":"The function `split_input_target` takes a recipe sequence as input and returns two sequences shifted by one position: the input text is the sequence without its last element, and the target text is the sequence without its first element. At each position, the target therefore holds the token that follows the corresponding input token.","metadata":{"id":"T43kNUBRUNK4"}},{"cell_type":"code","source":"def split_input_target(recipe):\n input_text = recipe[:-1]\n target_text = recipe[1:]\n\n return input_text, target_text","metadata":{"id":"IKVCLNETTuLp","execution":{"iopub.status.busy":"2024-04-27T15:55:34.375117Z","iopub.execute_input":"2024-04-27T15:55:34.375387Z","iopub.status.idle":"2024-04-27T15:55:34.380139Z","shell.execute_reply.started":"2024-04-27T15:55:34.375363Z","shell.execute_reply":"2024-04-27T15:55:34.379199Z"},"trusted":true},"execution_count":38,"outputs":[]},{"cell_type":"markdown","source":" This code uses the `map` method of a dataset object to apply a function to each element of the dataset. In this case, the function being applied is `split_input_target`, which takes a single sequence and splits it into two parts, an input and a target.\nThe `map` method returns a new dataset object that contains the results of applying the function to each element of the original dataset. 
In this case, the new dataset object will contain pairs of inputs and targets.","metadata":{"id":"fJpJLuOGsiN-"}},{"cell_type":"code","source":"dataset_targeted = dataset.map(split_input_target)\n\nprint(dataset_targeted)","metadata":{"id":"tPzv04iuqQN3","outputId":"82e50429-d1e0-4b74-b769-0bd44b0bc8cd","execution":{"iopub.status.busy":"2024-04-27T15:55:34.381601Z","iopub.execute_input":"2024-04-27T15:55:34.382251Z","iopub.status.idle":"2024-04-27T15:55:34.446443Z","shell.execute_reply.started":"2024-04-27T15:55:34.382216Z","shell.execute_reply":"2024-04-27T15:55:34.445576Z"},"trusted":true},"execution_count":39,"outputs":[{"name":"stdout","text":"<_MapDataset element_spec=(TensorSpec(shape=(750,), dtype=tf.int32, name=None), TensorSpec(shape=(750,), dtype=tf.int32, name=None))>\n","output_type":"stream"}]},{"cell_type":"markdown","source":"The function `split_train_test_data` takes two arguments: `dataset` and `train_ratio`.\n\n`num_samples` is set to the length of the dataset.\n`num_train_samples` is calculated as the integer portion of num_samples multiplied by train_ratio.\n`data_train` is created by taking the first num_train_samples from the dataset.\n`data_test` is created by skipping the first num_train_samples from the dataset.\nBoth `data_train` and `data_test` are converted to numpy arrays by calling `as_numpy_iterator()` and then converted to lists.\nFinally, `data_train` and `data_test` are returned as a tuple.","metadata":{}},{"cell_type":"code","source":"def split_train_test_data(dataset, train_ratio):\n num_samples = len(dataset)\n num_train_samples = int(num_samples * train_ratio)\n data_train = dataset.take(num_train_samples)\n data_test = dataset.skip(num_train_samples)\n data_train = list(data_train.as_numpy_iterator())\n data_test = list(data_test.as_numpy_iterator())\n return data_train, 
data_test","metadata":{"execution":{"iopub.status.busy":"2024-04-27T15:55:34.447463Z","iopub.execute_input":"2024-04-27T15:55:34.447757Z","iopub.status.idle":"2024-04-27T15:55:34.453193Z","shell.execute_reply.started":"2024-04-27T15:55:34.447730Z","shell.execute_reply":"2024-04-27T15:55:34.452279Z"},"trusted":true},"execution_count":40,"outputs":[]},{"cell_type":"markdown","source":"The first line `data_text_list = list(dataset_targeted)` iterates over the `dataset_targeted` object and collects every element into a Python list, which is assigned to the variable `data_text_list`.\n\nThe second line `data_train, data_test = split_train_test_data(dataset_targeted, 0.7)` splits the `dataset_targeted` object into a training set and a testing set using the `split_train_test_data` function that was defined earlier. The `train_ratio` argument is set to 0.7, which means that 70% of the `dataset_targeted` object will be used for training, and the remaining 30% will be used for testing. The function returns two lists, `data_train` and `data_test`, which represent the training and testing sets, respectively.","metadata":{}},{"cell_type":"code","source":"data_text_list = list(dataset_targeted)\n\ndata_train, data_test = split_train_test_data(dataset_targeted, 0.7)","metadata":{"execution":{"iopub.status.busy":"2024-04-27T15:55:34.454427Z","iopub.execute_input":"2024-04-27T15:55:34.454789Z","iopub.status.idle":"2024-04-27T15:56:10.476493Z","shell.execute_reply.started":"2024-04-27T15:55:34.454765Z","shell.execute_reply":"2024-04-27T15:56:10.475464Z"},"trusted":true},"execution_count":41,"outputs":[]},{"cell_type":"markdown","source":"`data_train_subset = data_train[:1]`: This line creates a one-element list that holds the first element of the `data_train` list. The `[:1]` slice notation means \"keep the elements before index 1\", so the result is still a list.\n\n`for input_example, target_example in data_train_subset:`: This line starts a loop that iterates over each tuple in the `data_train_subset` list. 
The loop variable `input_example` will hold the input sequence, and `target_example` will hold the target sequence.\n\n`print('Input sequence size:', repr(len(input_example)))`: This line prints the length of the input sequence. The `repr()` function is used to convert the length to a string.\n\n`print('Target sequence size:', repr(len(target_example)))`: This line prints the length of the target sequence. The `repr()` function is used to convert the length to a string.\n\n`print()`: This line prints an empty line to separate the output for each tuple.\n\n`input_stringified = tokenizer.sequences_to_texts([input_example[:50]])[0]`: This line converts the first 50 tokens of the input sequence to a string. The `sequences_to_texts()` function is used for this conversion. The `[0]` at the end is used to get the first (and only) element of the list returned by `sequences_to_texts()`.\n\n`target_stringified = tokenizer.sequences_to_texts([target_example[:50]])[0]`: This line converts the first 50 tokens of the target sequence to a string. The `sequences_to_texts()` function is used for this conversion. 
The `[0]` at the end is used to get the first (and only) element of the list returned by `sequences_to_texts()`.","metadata":{}},{"cell_type":"code","source":"data_train_subset = data_train[:1] # Take the first element from the list\n\nfor input_example, target_example in data_train_subset:\n print('Input sequence size:', repr(len(input_example)))\n print('Target sequence size:', repr(len(target_example)))\n print()\n\n input_stringified = tokenizer.sequences_to_texts([input_example[:50]])[0]\n target_stringified = tokenizer.sequences_to_texts([target_example[:50]])[0]\n\n print('Input: ', repr(''.join(input_stringified)))\n print('Target: ', repr(''.join(target_stringified)))","metadata":{"execution":{"iopub.status.busy":"2024-04-27T15:56:10.478182Z","iopub.execute_input":"2024-04-27T15:56:10.478478Z","iopub.status.idle":"2024-04-27T15:56:10.485811Z","shell.execute_reply.started":"2024-04-27T15:56:10.478454Z","shell.execute_reply":"2024-04-27T15:56:10.484876Z"},"trusted":true},"execution_count":42,"outputs":[{"name":"stdout","text":"Input sequence size: 750\nTarget sequence size: 750\n\nInput: '📕 N o - B a k e N u t C o o k i e s \\n \\n 🥩 \\n \\n • \" 1 c . f i r m l y p a c k e d b r'\nTarget: ' N o - B a k e N u t C o o k i e s \\n \\n 🥩 \\n \\n • \" 1 c . 
f i r m l y p a c k e d b r o'\n","output_type":"stream"}]},{"cell_type":"markdown","source":"`len(data_test)` returns the number of elements in the `data_test` list.","metadata":{}},{"cell_type":"code","source":"len( data_test)","metadata":{"execution":{"iopub.status.busy":"2024-04-27T15:56:10.486976Z","iopub.execute_input":"2024-04-27T15:56:10.487275Z","iopub.status.idle":"2024-04-27T15:56:10.499520Z","shell.execute_reply.started":"2024-04-27T15:56:10.487251Z","shell.execute_reply":"2024-04-27T15:56:10.498662Z"},"trusted":true},"execution_count":43,"outputs":[{"execution_count":43,"output_type":"execute_result","data":{"text/plain":"19130"},"metadata":{}}]},{"cell_type":"markdown","source":"`len(data_train)` returns the number of elements in the `data_train` list.","metadata":{}},{"cell_type":"code","source":"len( data_train)","metadata":{"execution":{"iopub.status.busy":"2024-04-27T15:56:10.500593Z","iopub.execute_input":"2024-04-27T15:56:10.500869Z","iopub.status.idle":"2024-04-27T15:56:10.509951Z","shell.execute_reply.started":"2024-04-27T15:56:10.500845Z","shell.execute_reply":"2024-04-27T15:56:10.508965Z"},"trusted":true},"execution_count":44,"outputs":[{"execution_count":44,"output_type":"execute_result","data":{"text/plain":"44636"},"metadata":{}}]},{"cell_type":"markdown","source":"To recap, the preview cell above takes the first element of the `data_train` list with `data_train[:1]`, unpacks it into `input_example` and `target_example`, prints both lengths using the `len()` function, and decodes the first 50 tokens of each sequence with `tokenizer.sequences_to_texts()` before printing them with `repr()`. 
\n\nThe function `transform_element` below takes an `(input, target)` pair and returns it as a tuple. If the last dimension of either sequence has size 1, that dimension is removed with `tf.squeeze()`; otherwise the sequence is returned unchanged.","metadata":{}},{"cell_type":"code","source":"def transform_element(input_target):\n input_sequence, target_sequence = input_target[0], input_target[1]\n\n # Apply tf.squeeze() only if the shape is compatible\n if input_sequence.shape[-1] == 1:\n input_sequence = tf.squeeze(input_sequence, axis=-1)\n if target_sequence.shape[-1] == 1:\n target_sequence = tf.squeeze(target_sequence, axis=-1)\n\n return input_sequence, target_sequence","metadata":{"execution":{"iopub.status.busy":"2024-04-27T15:56:10.511310Z","iopub.execute_input":"2024-04-27T15:56:10.511602Z","iopub.status.idle":"2024-04-27T15:56:10.520441Z","shell.execute_reply.started":"2024-04-27T15:56:10.511578Z","shell.execute_reply":"2024-04-27T15:56:10.519641Z"},"trusted":true},"execution_count":45,"outputs":[]},{"cell_type":"markdown","source":"### Split up the dataset into batches","metadata":{"id":"FvQSGO7y5hHH"}},{"cell_type":"markdown","source":" This displays information about the targeted dataset, such as the shape and data type of its 
elements.","metadata":{"id":"G-NeJSZvHRwF"}},{"cell_type":"code","source":"print(dataset_targeted)","metadata":{"id":"-vAzNdYP2eXl","outputId":"b56d890e-c205-4f96-b8a9-05d5eade0cc3","execution":{"iopub.status.busy":"2024-04-27T15:56:10.521521Z","iopub.execute_input":"2024-04-27T15:56:10.521945Z","iopub.status.idle":"2024-04-27T15:56:10.531565Z","shell.execute_reply.started":"2024-04-27T15:56:10.521913Z","shell.execute_reply":"2024-04-27T15:56:10.530588Z"},"trusted":true},"execution_count":46,"outputs":[{"name":"stdout","text":"<_MapDataset element_spec=(TensorSpec(shape=(750,), dtype=tf.int32, name=None), TensorSpec(shape=(750,), dtype=tf.int32, name=None))>\n","output_type":"stream"}]},{"cell_type":"markdown","source":"This code sets the batch size to 64 and the shuffle buffer size to 1000. The shuffle buffer size determines how many elements are held in memory for shuffling: `shuffle()` fills a buffer with 1000 elements, draws each output element at random from that buffer, and replaces it with the next element from the dataset. The dataset is therefore never shuffled as a whole.\nThe `batch()` method creates a dataset that groups its elements into fixed-size batches. The `drop_remainder=True` argument specifies that any remaining elements that don't fit into a complete batch should be dropped. The `repeat()` method creates a dataset that repeats itself indefinitely.","metadata":{"id":"TRfDMZDtEFFq"}},{"cell_type":"code","source":"# Batch size.\n\n\n\nBATCH_SIZE = 64\n\n# Buffer size to shuffle the dataset (TF data is designed to work\n# with possibly infinite sequences, so it doesn't attempt to shuffle\n# the entire sequence in memory. 
Instead, it maintains a buffer in\n# which it shuffles elements).\nSHUFFLE_BUFFER_SIZE = 1000\n\ndataset_train = dataset_targeted.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True).repeat()\n\nprint(dataset_train)","metadata":{"id":"JG06vo8o5pXg","outputId":"3616f116-1bbf-4081-9e7f-29c4cf03135d","execution":{"iopub.status.busy":"2024-04-27T15:56:10.532811Z","iopub.execute_input":"2024-04-27T15:56:10.533318Z","iopub.status.idle":"2024-04-27T15:56:10.550842Z","shell.execute_reply.started":"2024-04-27T15:56:10.533292Z","shell.execute_reply":"2024-04-27T15:56:10.549971Z"},"trusted":true},"execution_count":47,"outputs":[{"name":"stdout","text":"<_RepeatDataset element_spec=(TensorSpec(shape=(64, 750), dtype=tf.int32, name=None), TensorSpec(shape=(64, 750), dtype=tf.int32, name=None))>\n","output_type":"stream"}]},{"cell_type":"markdown","source":"This code uses `take(1)` to retrieve the first batch of data from `dataset_train`.\nThe code then prints the input text and target text for the first batch of data. The input text is the text that is fed into the language model, and the target text is the text that the language model is expected to produce.","metadata":{"id":"2l6_3cRUF8fM"}},{"cell_type":"code","source":"for input_text, target_text in dataset_train.take(1):\n print('1st batch: input_text:', input_text,'\\n')\n print('1st batch: target_text:', target_text,'\\n')","metadata":{"id":"1BssYJXnD1s0","outputId":"4e890f6a-37ee-468f-e812-e7b479616076","execution":{"iopub.status.busy":"2024-04-27T15:56:10.551833Z","iopub.execute_input":"2024-04-27T15:56:10.552133Z","iopub.status.idle":"2024-04-27T15:56:10.744870Z","shell.execute_reply.started":"2024-04-27T15:56:10.552100Z","shell.execute_reply":"2024-04-27T15:56:10.743936Z"},"trusted":true},"execution_count":48,"outputs":[{"name":"stdout","text":"1st batch: input_text: tf.Tensor(\n[[41 1 57 ... 98 98 98]\n [41 1 57 ... 98 98 98]\n [41 1 40 ... 98 98 98]\n ...\n [41 1 39 ... 98 98 98]\n [41 1 39 ... 
98 98 98]\n [41 1 61 ... 98 98 98]], shape=(64, 750), dtype=int32) \n\n1st batch: target_text: tf.Tensor(\n[[ 1 57 19 ... 98 98 98]\n [ 1 57 4 ... 98 98 98]\n [ 1 40 6 ... 98 98 98]\n ...\n [ 1 39 6 ... 98 98 98]\n [ 1 39 8 ... 98 98 98]\n [ 1 61 8 ... 98 98 98]], shape=(64, 750), dtype=int32) \n\n","output_type":"stream"}]},{"cell_type":"markdown","source":"## Build the model","metadata":{"id":"qMFCFS7UXBkV"}},{"cell_type":"markdown","source":"This prints the size of the vocabulary.","metadata":{"id":"6dOlyLKPW4HM"}},{"cell_type":"code","source":"print(VOCABULARY_SIZE)","metadata":{"id":"o3OSe2LqMXsn","outputId":"d109c9ee-ae19-4bee-85a7-96f92d61ab5c","execution":{"iopub.status.busy":"2024-04-27T15:56:10.746167Z","iopub.execute_input":"2024-04-27T15:56:10.746485Z","iopub.status.idle":"2024-04-27T15:56:10.751271Z","shell.execute_reply.started":"2024-04-27T15:56:10.746461Z","shell.execute_reply":"2024-04-27T15:56:10.750325Z"},"trusted":true},"execution_count":49,"outputs":[{"name":"stdout","text":"101\n","output_type":"stream"}]},{"cell_type":"markdown","source":"The first line sets `vocab_size` to the vocabulary size of the dataset (`VOCABULARY_SIZE`).\n\nThe second line sets the embedding dimension (`embedding_dim`) to 256.\n\nThe third line sets `rnn_units` to 1024.","metadata":{"id":"cYmpya_5XKUq"}},{"cell_type":"code","source":"vocab_size = VOCABULARY_SIZE\nembedding_dim = 256\nrnn_units = 1024","metadata":{"id":"Fet_96XbTjLd","execution":{"iopub.status.busy":"2024-04-27T15:56:10.752446Z","iopub.execute_input":"2024-04-27T15:56:10.752732Z","iopub.status.idle":"2024-04-27T15:56:10.761401Z","shell.execute_reply.started":"2024-04-27T15:56:10.752709Z","shell.execute_reply":"2024-04-27T15:56:10.760500Z"},"trusted":true},"execution_count":50,"outputs":[]},{"cell_type":"markdown","source":" The function `build_model` takes four arguments: `vocab_size`, `embedding_dim`, `rnn_units`, and `batch_size`. 
This function creates a sequential neural network model for natural language processing tasks.\nThe code creates a `tf.keras.models.Sequential` model, which is a linear stack of layers.\n\n\n\nThe first layer in the model is an `Embedding` layer, which converts categorical variables (in this case, token indices) into dense vectors.\nThe `input_dim` parameter specifies the size of the vocabulary (the number of unique tokens in the text data), and the `output_dim` parameter specifies the dimensionality of the dense vectors used to represent each token.\nThe `batch_input_shape` parameter specifies the shape of the input data: `[batch_size, None]` is a batch of `batch_size` sequences whose length is left unspecified.\nThe dimensionality of the dense vectors is also referred to as the \"embedding size\" or \"embedding dimension\".\nThe embedding size determines the number of features used to represent each token, and it affects the performance of the model.\nA larger embedding size can capture more complex relationships between tokens, but it also increases the computational cost of the model.\nThe optimal embedding size depends on the specific task and dataset, and it is often determined through experimentation.\nThis notebook uses an embedding size of 256.\nThe `Embedding` layer outputs a 3D tensor of shape `(batch_size, sequence_length, embedding_dim)`: the first dimension corresponds to the batch size, the second to the sequence length, and the third to the embedding dimension.\n\n\n\nThe second layer in the model is an `LSTM` layer.\nThe code creates an LSTM layer with `rnn_units` units. We want the output to be a sequence of vectors, so we set `return_sequences` to 
True. We want the LSTM layer to maintain its internal state between batches, so we set `stateful` to True.\nThe `recurrent_initializer` argument specifies the initializer to use for the recurrent weights of the LSTM layer. In this case, we are using the Glorot normal initializer (also known as Xavier normal).\n\n\nThe third layer in the model is a `Dense` layer with `vocab_size` units. The dense layer converts the output of the LSTM layer into one score (logit) per vocabulary entry; because no activation is set, a softmax has to be applied to these scores to obtain a probability distribution over the vocabulary.\nThe number `vocab_size` refers to the number of neurons in the dense layer that is being added to the neural network model.\nIn a dense layer, each neuron in the previous layer is connected to each neuron in the current layer, and the weights of these connections are learned during the training process. The number of neurons in a dense layer determines the dimensionality of the output of that layer.\nHere `vocab_size` = 101 (the value of `VOCABULARY_SIZE` printed above), which means that the dense layer has 101 neurons. The layer is applied at every time step, so each 1024-dimensional LSTM output vector is mapped to 101 values.\n\nFinally, the function `build_model` returns the model.","metadata":{"id":"glN5Dt2ITuOg"}},{"cell_type":"code","source":"def build_model(vocab_size, embedding_dim, rnn_units, batch_size):\n model = tf.keras.models.Sequential()\n\n model.add(tf.keras.layers.Embedding(\n input_dim=vocab_size,\n output_dim=embedding_dim,\n batch_input_shape=[batch_size, None]\n ))\n\n model.add(tf.keras.layers.LSTM(\n units=rnn_units,\n return_sequences=True,\n stateful=True,\n recurrent_initializer=tf.keras.initializers.GlorotNormal()\n ))\n\n model.add(tf.keras.layers.Dense(vocab_size))\n\n return 
model","metadata":{"id":"UgE9V7eATmRA","execution":{"iopub.status.busy":"2024-04-27T15:56:10.762466Z","iopub.execute_input":"2024-04-27T15:56:10.762749Z","iopub.status.idle":"2024-04-27T15:56:10.771615Z","shell.execute_reply.started":"2024-04-27T15:56:10.762724Z","shell.execute_reply":"2024-04-27T15:56:10.770892Z"},"trusted":true},"execution_count":51,"outputs":[]},{"cell_type":"markdown","source":"First, we build `model` using the `build_model` function.\n\nWe then call the `model.summary()` method to obtain a summary of the model's architecture. This method prints the following information:\n - **Layer (type)**: The name and type of each layer in the network.\n - **Output Shape**: The shape of the output data that the layer produces.\n - **Param #**: The number of parameters in the layer.\n - **Total params**: The total number of parameters in the entire model, followed by the trainable and non-trainable counts.\n\nThe output shape of (64, None, 256) reported for the embedding layer indicates the following:\n- **Batch Size**: The first dimension of the output shape, 64, represents the batch size. This means that the model is processing 64 samples at a time.\n- **Sequence Length**: The second dimension of the output shape, None, represents the sequence length. This means that the model can process sequences of varying lengths. The actual sequence length will depend on the input data.\n- **Feature Dimension**: The third dimension of the output shape, 256, represents the feature dimension. 
This means that each sequence element is represented as a 256-dimensional vector.\n\nThe input to the \"sequential\" model has shape (64, None): a batch of 64 sequences of token indices. The shape (64, None, 256) is the output of the embedding layer, which is then passed to the LSTM layer.","metadata":{"id":"OA6jP4qqmPkc"}},{"cell_type":"code","source":"model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)\n\nmodel.summary()","metadata":{"id":"T1Y6QELJJolf","outputId":"1b17dc64-729e-4887-8a0b-dcf50debdf18","execution":{"iopub.status.busy":"2024-04-27T15:56:10.772701Z","iopub.execute_input":"2024-04-27T15:56:10.773021Z","iopub.status.idle":"2024-04-27T15:56:11.155079Z","shell.execute_reply.started":"2024-04-27T15:56:10.772983Z","shell.execute_reply":"2024-04-27T15:56:11.154139Z"},"trusted":true},"execution_count":52,"outputs":[{"name":"stdout","text":"Model: \"sequential\"\n_________________________________________________________________\n Layer (type) Output Shape Param # \n=================================================================\n embedding (Embedding) (64, None, 256) 25856 \n \n lstm (LSTM) (64, None, 1024) 5246976 \n \n dense (Dense) (64, None, 101) 103525 \n \n=================================================================\nTotal params: 5376357 (20.51 MB)\nTrainable params: 5376357 (20.51 MB)\nNon-trainable params: 0 (0.00 Byte)\n_________________________________________________________________\n","output_type":"stream"}]},{"cell_type":"markdown","source":"The `tf.keras.utils.plot_model()` function is used to visualize the architecture of a Keras model. It creates a graphical representation of the model, showing the different layers and their connections.\nThe `show_shapes` argument controls whether to display the shape of each layer. The `show_layer_names` argument controls whether to display the name of each layer. 
The `to_file` argument specifies the path to the file where the plot should be saved.","metadata":{"id":"W_HXguzzPsIz"}},{"cell_type":"code","source":"tf.keras.utils.plot_model(\n model,\n show_shapes=True,\n show_layer_names=True,\n to_file='model.png'\n)","metadata":{"id":"T2eIH2w4PGKM","outputId":"13b9bc0d-f750-4a48-ab7e-5b437f2adf2a","execution":{"iopub.status.busy":"2024-04-27T15:56:11.156302Z","iopub.execute_input":"2024-04-27T15:56:11.156580Z","iopub.status.idle":"2024-04-27T15:56:11.356199Z","shell.execute_reply.started":"2024-04-27T15:56:11.156546Z","shell.execute_reply":"2024-04-27T15:56:11.355201Z"},"trusted":true},"execution_count":53,"outputs":[{"execution_count":53,"output_type":"execute_result","data":{"image/png":"","text/plain":""},"metadata":{}}]},{"cell_type":"markdown","source":"## Trying The Model","metadata":{"id":"--FxwjIiSA9a"}},{"cell_type":"markdown","source":"`dataset_train.take(1)`: This takes the first batch from the `dataset_train` dataset. The `take()` method returns a dataset that yields at most the specified number of elements (in this case, 1) and then completes.\n\n`for input_example_batch, target_example_batch in dataset_train.take(1):`: This iterates over the first batch of data in `dataset_train`. The batch contains two components: `input_example_batch` holds the input data for the model, and `target_example_batch` holds the corresponding target values.\n\n`example_batch_predictions = model(input_example_batch)`: This line feeds `input_example_batch` to the model and generates predictions. 
The model is the network returned by `build_model`. It has not been trained yet, so these predictions come from randomly initialized weights.\n\nSo, this code fetches the first batch of data from the `dataset_train` dataset and generates predictions for it using the untrained model.","metadata":{"id":"myHJqxpzUAAk"}},{"cell_type":"code","source":"for input_example_batch, target_example_batch in dataset_train.take(1):\n example_batch_predictions = model(input_example_batch)\n print(example_batch_predictions.shape, \"# (batch_size, sequence_length, vocab_size)\")","metadata":{"id":"cutwkVoxSLMb","outputId":"0343f09d-4138-43a4-e170-3808c08c6943","execution":{"iopub.status.busy":"2024-04-27T15:56:11.357614Z","iopub.execute_input":"2024-04-27T15:56:11.358231Z","iopub.status.idle":"2024-04-27T15:56:12.035288Z","shell.execute_reply.started":"2024-04-27T15:56:11.358193Z","shell.execute_reply":"2024-04-27T15:56:12.034305Z"},"trusted":true},"execution_count":54,"outputs":[{"name":"stdout","text":"(64, 750, 101) # (batch_size, sequence_length, vocab_size)\n","output_type":"stream"}]},{"cell_type":"markdown","source":"1. `logits=example_batch_predictions[0]`: This selects the first element (index 0) of the `example_batch_predictions` tensor, i.e. the predictions for the first sequence in the batch. It has shape (750, 101): one row of logits per position in the sequence.\n2. `num_samples=1`: This specifies that we want to draw one sample from the categorical distribution for each row of logits, i.e. one predicted token index per position.\n3. `sampled_indices = tf.random.categorical(...)`: This line uses the `tf.random.categorical` function from TensorFlow to randomly sample indices from the categorical distribution defined by the logits. The logits are unnormalized log-probabilities of the classes. The function draws each index at random in proportion to its probability, so the most likely class is favored but not always selected. 
The result of this operation is stored in the `sampled_indices` tensor.\n4. `sampled_indices.shape`: This line displays the shape of the `sampled_indices` tensor. Since the logits have 750 rows and we specified `num_samples=1`, the shape is (750, 1): one sampled index for each of the 750 positions.\n","metadata":{"id":"AcddrE4je18a"}},{"cell_type":"code","source":"sampled_indices = tf.random.categorical(\n logits=example_batch_predictions[0],\n num_samples=1\n)\n\nsampled_indices.shape","metadata":{"id":"98agimVpV1Uu","outputId":"3474fdd3-bbc1-4a09-a149-00611ed16229","execution":{"iopub.status.busy":"2024-04-27T15:56:12.036654Z","iopub.execute_input":"2024-04-27T15:56:12.037002Z","iopub.status.idle":"2024-04-27T15:56:12.063695Z","shell.execute_reply.started":"2024-04-27T15:56:12.036974Z","shell.execute_reply":"2024-04-27T15:56:12.062762Z"},"trusted":true},"execution_count":55,"outputs":[{"execution_count":55,"output_type":"execute_result","data":{"text/plain":"TensorShape([750, 1])"},"metadata":{}}]},{"cell_type":"markdown","source":"This code displays the first 10 rows of the `sampled_indices` tensor.","metadata":{"id":"1mndpGRjL5aY"}},{"cell_type":"code","source":"sampled_indices[:10]","metadata":{"id":"WGbwYEDfLJvQ","outputId":"09631417-9087-44c4-b8b3-cf741c69d828","execution":{"iopub.status.busy":"2024-04-27T15:56:12.064831Z","iopub.execute_input":"2024-04-27T15:56:12.065234Z","iopub.status.idle":"2024-04-27T15:56:12.076551Z","shell.execute_reply.started":"2024-04-27T15:56:12.065184Z","shell.execute_reply":"2024-04-27T15:56:12.075612Z"},"trusted":true},"execution_count":56,"outputs":[{"execution_count":56,"output_type":"execute_result","data":{"text/plain":""},"metadata":{}}]},{"cell_type":"markdown","source":"We use `tf.squeeze` to remove all dimensions of size 1 from the `sampled_indices` tensor, convert the result into a NumPy array, and then display the shape of the resulting array.\n\nThe purpose of this step is to make code 
simpler: the array now has a single dimension, so it can be sliced and decoded directly.","metadata":{"id":"h37xkna1I_I2"}},{"cell_type":"code","source":"sampled_indices = tf.squeeze(sampled_indices).numpy()\nsampled_indices.shape","metadata":{"id":"0NxqPSJlIQbT","outputId":"1f3a6d42-e347-4b00-8d66-31531ac84224","execution":{"iopub.status.busy":"2024-04-27T15:56:12.077950Z","iopub.execute_input":"2024-04-27T15:56:12.078296Z","iopub.status.idle":"2024-04-27T15:56:12.086148Z","shell.execute_reply.started":"2024-04-27T15:56:12.078267Z","shell.execute_reply":"2024-04-27T15:56:12.085045Z"},"trusted":true},"execution_count":57,"outputs":[{"execution_count":57,"output_type":"execute_result","data":{"text/plain":"(750,)"},"metadata":{}}]},{"cell_type":"markdown","source":"This code displays the first 10 elements of the `sampled_indices` array after squeezing.","metadata":{"id":"tkJtRCjKMT1v"}},{"cell_type":"code","source":"sampled_indices[:10]","metadata":{"id":"vHf5uSm7LAQ_","outputId":"8f2a0298-371e-4185-86fa-41b47da5989b","execution":{"iopub.status.busy":"2024-04-27T15:56:12.087812Z","iopub.execute_input":"2024-04-27T15:56:12.088154Z","iopub.status.idle":"2024-04-27T15:56:12.094946Z","shell.execute_reply.started":"2024-04-27T15:56:12.088115Z","shell.execute_reply":"2024-04-27T15:56:12.093744Z"},"trusted":true},"execution_count":58,"outputs":[{"execution_count":58,"output_type":"execute_result","data":{"text/plain":"array([94, 44, 39, 51, 57, 62, 22, 84, 23, 42])"},"metadata":{}}]},{"cell_type":"code","source":"print('Input:\\n', repr(''.join(tokenizer.sequences_to_texts([input_example_batch[0].numpy()[:50]]))))\nprint()\nprint('Next char prediction:\\n', 
repr(''.join(tokenizer.sequences_to_texts([sampled_indices[:50]]))))","metadata":{"id":"B5UQ3Di_HDpU","outputId":"4ae2db19-7333-4d4d-a108-905c2acbbebe","execution":{"iopub.status.busy":"2024-04-27T15:56:12.096424Z","iopub.execute_input":"2024-04-27T15:56:12.096783Z","iopub.status.idle":"2024-04-27T15:56:12.109077Z","shell.execute_reply.started":"2024-04-27T15:56:12.096752Z","shell.execute_reply":"2024-04-27T15:56:12.108047Z"},"trusted":true},"execution_count":59,"outputs":[{"name":"stdout","text":"Input:\n '📕 S t r a w b e r r y B o t t o m C h e e s e c a k e P i e \\n \\n 🥩 \\n \\n • \" 1 R e a d y -'\n\nNext char prediction:\n \"$ ️ B A R q ▪ & ︎ 🥩 ~ c R Q ␣ b ~ 2 \\\\ ' ▪ ( 1 S f - ! ️ z ️ Q Q | ✍ c Q D ✍ 1 K $ ▪ & w G 9 , l g l\"\n","output_type":"stream"}]},{"cell_type":"markdown","source":"### Trying the model with variable input","metadata":{"id":"c2nHDT7KLQ5n"}},{"cell_type":"code","source":"for input_example_batch_custom, target_example_batch_custom in dataset_train.take(1):\n random_input = np.zeros(shape=(BATCH_SIZE, 10))\n example_batch_predictions_custom = model(random_input)\n print('Prediction shape: ', example_batch_predictions_custom.shape, \"# (batch_size, sequence_length, vocab_size)\\n\")\n print('Custom length input: ')\n print(random_input)","metadata":{"id":"VQh1GAsZLTat","outputId":"449a48f4-63b4-43fc-918f-80e77de78633","execution":{"iopub.status.busy":"2024-04-27T15:56:12.110291Z","iopub.execute_input":"2024-04-27T15:56:12.110634Z","iopub.status.idle":"2024-04-27T15:56:12.316968Z","shell.execute_reply.started":"2024-04-27T15:56:12.110608Z","shell.execute_reply":"2024-04-27T15:56:12.316037Z"},"trusted":true},"execution_count":60,"outputs":[{"name":"stdout","text":"Prediction shape: (64, 10, 101) # (batch_size, sequence_length, vocab_size)\n\nCustom length input: \n[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 
0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n [0. 0. 0. 0. 0. 0. 0. 0. 0. 
0.]]\n","output_type":"stream"}]},{"cell_type":"markdown","source":"## Training the model","metadata":{"id":"bV0iaMFSkGpi"}},{"cell_type":"markdown","source":"### Attach an optimizer, and a loss function","metadata":{"id":"umVu4Km9kT9b"}},{"cell_type":"markdown","source":"This code defines a loss function for the model. The loss function measures the error between the model's predictions and the true labels. In this case, it is the sparse categorical cross-entropy loss, which is commonly used for multi-class classification problems with integer labels.\nThe code first defines the `loss` function, which takes two arguments: `labels` and `logits`. The `labels` argument is the true labels for the data, and the `logits` argument is the model's predictions. The function uses the `tf.keras.losses.sparse_categorical_crossentropy` function to compute the cross-entropy loss between the labels and the logits. The `from_logits` argument is set to `True`, which indicates that the model outputs are unnormalized logits rather than probabilities.\nThe code then computes the loss for an example batch of data. The `target_example_batch` variable contains the true labels for the example batch, and the `example_batch_predictions` variable contains the model's predictions for the example batch. The `loss` function is called with these two variables as arguments, and the result is stored in the `example_batch_loss` variable.\nThe code then prints the shape of the example batch predictions and the shape of the loss. The shape of the example batch predictions is `(batch_size, sequence_length, vocab_size)`, where `batch_size` is the number of examples in the batch, `sequence_length` is the length of the sequences in the batch, and `vocab_size` is the size of the vocabulary. The shape of the loss is `(batch_size, sequence_length)`, because the function returns one loss value per position in each sequence rather than a single scalar.\nFinally, the code prints the mean of these per-position losses. 
This value represents the average loss for the example batch.","metadata":{"id":"VgFQ3fIhR00v"}},{"cell_type":"code","source":"def loss(labels, logits):\n entropy = tf.keras.losses.sparse_categorical_crossentropy(\n y_true=labels,\n y_pred=logits,\n from_logits=True\n )\n\n return entropy\n\nexample_batch_loss = loss(target_example_batch, example_batch_predictions)\n\nprint(\"Prediction shape: \", example_batch_predictions.shape, \" # (batch_size, sequence_length, vocab_size)\")\nprint(\"scalar_loss.shape: \", example_batch_loss.shape)\nprint(\"scalar_loss: \", example_batch_loss.numpy().mean())","metadata":{"id":"taxxBk3qQSSj","outputId":"c7913dd8-1e0d-4dfe-982f-8eff65681e14","execution":{"iopub.status.busy":"2024-04-27T15:56:12.318324Z","iopub.execute_input":"2024-04-27T15:56:12.319171Z","iopub.status.idle":"2024-04-27T15:56:12.355941Z","shell.execute_reply.started":"2024-04-27T15:56:12.319141Z","shell.execute_reply":"2024-04-27T15:56:12.354960Z"},"trusted":true},"execution_count":61,"outputs":[{"name":"stdout","text":"Prediction shape: (64, 750, 101) # (batch_size, sequence_length, vocab_size)\nscalar_loss.shape: (64, 750)\nscalar_loss: 4.620006\n","output_type":"stream"}]},{"cell_type":"markdown","source":"`adam_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)`: This line creates an instance of the Adam optimizer, which is a popular optimization algorithm used in neural networks. The learning rate is set to 0.001, which is a common value for this parameter.\n\n`model.compile(optimizer=adam_optimizer, loss=loss, metrics=['accuracy'])`: This line compiles the model by specifying the optimizer, loss function, and evaluation metric. The model is assumed to have been defined previously.\n\n`optimizer=adam_optimizer`: This argument specifies the optimizer to use during training. In this case, it's the Adam optimizer created in the previous step.\n\n`loss=loss`: This argument specifies the loss function to use during training. 
The loss variable is assumed to have been defined previously. This loss function is used to calculate the error between the predicted output and the actual output.\n\n`metrics=['accuracy']`: This argument specifies the evaluation metric(s) to use during training. In this case, it's using the accuracy metric, which calculates the fraction of correct predictions.\n\n","metadata":{"id":"XLzrVY0YXl0_"}},{"cell_type":"code","source":"adam_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)\n\nmodel.compile(\n optimizer=adam_optimizer,\n loss=loss,\n metrics=['accuracy']\n)","metadata":{"id":"SKQc_EF0VQaR","execution":{"iopub.status.busy":"2024-04-27T15:56:12.357326Z","iopub.execute_input":"2024-04-27T15:56:12.358068Z","iopub.status.idle":"2024-04-27T15:56:12.376102Z","shell.execute_reply.started":"2024-04-27T15:56:12.358033Z","shell.execute_reply":"2024-04-27T15:56:12.375131Z"},"trusted":true},"execution_count":62,"outputs":[]},{"cell_type":"markdown","source":"### Creating a checkpoints directory","metadata":{"id":"tJ3esbgbrNpc"}},{"cell_type":"markdown","source":"1. **checkpoint_dir = 'tmp/checkpoints'**: This line creates a variable called `checkpoint_dir` and assigns it the value 'tmp/checkpoints'. This variable will be used to store the checkpoints during the training process.\n2. **os.makedirs(checkpoint_dir, exist_ok=True)**: This line uses the `os.makedirs()` function from the Python `os` module to create a directory named 'tmp/checkpoints'. 
The `exist_ok=True` argument specifies that if the directory already exists, it should not raise an error.","metadata":{"id":"AG84x4SkriQC"}},{"cell_type":"code","source":"checkpoint_dir = 'tmp/checkpoints'\nos.makedirs(checkpoint_dir, exist_ok=True)","metadata":{"id":"uMCPsahRrMlx","execution":{"iopub.status.busy":"2024-04-27T15:56:12.377317Z","iopub.execute_input":"2024-04-27T15:56:12.377663Z","iopub.status.idle":"2024-04-27T15:56:12.382619Z","shell.execute_reply.started":"2024-04-27T15:56:12.377630Z","shell.execute_reply":"2024-04-27T15:56:12.381751Z"},"trusted":true},"execution_count":63,"outputs":[]},{"cell_type":"markdown","source":"### Configuring callbacks","metadata":{"id":"kNMuGh6OkYgO"}},{"cell_type":"markdown","source":"This code creates an instance of the `EarlyStopping` callback from the `tf.keras.callbacks` module in TensorFlow. This callback stops the training of a neural network model when a certain condition is met, in this case when the training `loss` stops improving for a specified number of epochs.\n1. `patience=5`: This argument specifies the number of epochs to wait before stopping the training if the `loss` metric does not improve. In this case, the training will stop if the `loss` metric does not improve for 5 consecutive epochs.\n2. `monitor='loss'`: This argument specifies the metric that the callback will monitor to determine whether to stop the training. In this case, the callback will monitor the `loss` metric.\n3. `restore_best_weights=True`: This argument specifies whether to restore the best weights found during training if the training is stopped early. In this case, the best weights will be restored if the training is stopped early.\n4. `verbose=1`: This argument specifies the verbosity level of the callback. 
In this case, the callback will print a message to the console when it stops the training early.","metadata":{"id":"4WTh8VIdqfmq"}},{"cell_type":"code","source":"early_stopping_callback = tf.keras.callbacks.EarlyStopping(\n patience=5,\n monitor='loss',\n restore_best_weights=True,\n verbose=1\n)","metadata":{"id":"2C0WQ2t2pDes","execution":{"iopub.status.busy":"2024-04-27T15:56:12.383854Z","iopub.execute_input":"2024-04-27T15:56:12.384203Z","iopub.status.idle":"2024-04-27T15:56:12.393258Z","shell.execute_reply.started":"2024-04-27T15:56:12.384177Z","shell.execute_reply":"2024-04-27T15:56:12.392483Z"},"trusted":true},"execution_count":64,"outputs":[]},{"cell_type":"markdown","source":"The first line of code defines the prefix for the checkpoint files. The prefix is the path to the directory where the checkpoint files will be saved, followed by the string `ckpt_` and a placeholder for the epoch number. In this case, the checkpoint files will be saved in the directory `tmp/checkpoints/` with names that begin with `ckpt_` followed by the epoch at which the checkpoint was saved. The path does not include an `.h5` extension.\nThe second line of code creates a `ModelCheckpoint` callback. The `ModelCheckpoint` callback is used to save the model's weights at the end of each epoch. 
The `filepath` argument specifies the prefix for the checkpoint files, and the `save_weights_only` argument specifies that only the model's weights should be saved, not the entire model.","metadata":{"id":"rJ7-dnGuulVR"}},{"cell_type":"code","source":"checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt_{epoch}')\ncheckpoint_callback = tf.keras.callbacks.ModelCheckpoint(\n filepath=checkpoint_prefix,\n save_weights_only=True\n)","metadata":{"id":"-f0QNGU2ruCY","execution":{"iopub.status.busy":"2024-04-27T15:56:12.394523Z","iopub.execute_input":"2024-04-27T15:56:12.395291Z","iopub.status.idle":"2024-04-27T15:56:12.403628Z","shell.execute_reply.started":"2024-04-27T15:56:12.395257Z","shell.execute_reply":"2024-04-27T15:56:12.402724Z"},"trusted":true},"execution_count":65,"outputs":[]},{"cell_type":"markdown","source":"### Execute the training","metadata":{"id":"2un8oTRj89Nu"}},{"cell_type":"markdown","source":"**EPOCHS = 60**: This line sets the epoch at which training ends. An epoch normally represents one complete pass through the training dataset; here its length is fixed by `STEPS_PER_EPOCH`. In this case, training runs until epoch 60.\n\n**INITIAL_EPOCH = 1**: This line sets the epoch at which training starts. Keras counts epochs from 0, so a value of 1 makes the training log begin at \"Epoch 2/60\" and the run covers 59 epochs. This parameter is mainly useful when resuming a previous training run.\n\n**STEPS_PER_EPOCH = 700**: This line sets the number of steps or iterations per epoch. Each step involves updating the model's weights based on a batch of training data. 
In this case, there will be 700 steps per epoch.","metadata":{"id":"XgOClpl4NJEg"}},{"cell_type":"code","source":"EPOCHS = 60\nINITIAL_EPOCH = 1\nSTEPS_PER_EPOCH = 700","metadata":{"id":"WTT0ACjj9CJ0","execution":{"iopub.status.busy":"2024-04-27T15:56:12.404875Z","iopub.execute_input":"2024-04-27T15:56:12.405180Z","iopub.status.idle":"2024-04-27T15:56:12.413427Z","shell.execute_reply.started":"2024-04-27T15:56:12.405157Z","shell.execute_reply":"2024-04-27T15:56:12.412580Z"},"trusted":true},"execution_count":66,"outputs":[]},{"cell_type":"markdown","source":"The `fit` method is used to train the model on a given dataset.\n1. `x=dataset_train`: This line specifies the training dataset that will be used to train the model. The `dataset_train` variable contains the training data as batches of input-target pairs.\n2. `epochs=EPOCHS`: This line specifies the epoch at which training ends. When `initial_epoch` is set, it is the index of the final epoch rather than the number of additional epochs to run.\n3. `steps_per_epoch=STEPS_PER_EPOCH`: This line specifies the number of steps per epoch. Each step refers to one batch of data that is passed through the model during training.\n4. `initial_epoch=INITIAL_EPOCH`: This line specifies the initial epoch from which the training should start. This is useful if you want to resume training from a previously saved checkpoint.\n5. `callbacks=[checkpoint_callback, early_stopping_callback]`: This line specifies a list of callback functions that will be invoked during the training process. 
Callback functions are used to monitor the training progress and perform various actions, such as saving checkpoints and early stopping.","metadata":{"id":"lE1gpR4WM68B"}},{"cell_type":"code","source":"\nhistory = model.fit(\n x=dataset_train,\n epochs=EPOCHS,\n steps_per_epoch=STEPS_PER_EPOCH,\n initial_epoch=INITIAL_EPOCH,\n callbacks=[\n checkpoint_callback,\n early_stopping_callback\n ]\n)","metadata":{"id":"8Iu26cqF9Ii4","outputId":"26ae08cf-0da9-4c17-85a8-d38d666a7aaa","execution":{"iopub.status.busy":"2024-04-27T15:56:12.414504Z","iopub.execute_input":"2024-04-27T15:56:12.414764Z","iopub.status.idle":"2024-04-27T19:27:26.185549Z","shell.execute_reply.started":"2024-04-27T15:56:12.414741Z","shell.execute_reply":"2024-04-27T19:27:26.184608Z"},"trusted":true},"execution_count":67,"outputs":[{"name":"stdout","text":"Epoch 2/60\n700/700 [==============================] - 218s 306ms/step - loss: 1.0226 - accuracy: 0.7188\nEpoch 3/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.4705 - accuracy: 0.8586\nEpoch 4/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.4008 - accuracy: 0.8769\nEpoch 5/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.3697 - accuracy: 0.8853\nEpoch 6/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.3503 - accuracy: 0.8908\nEpoch 7/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.3352 - accuracy: 0.8950\nEpoch 8/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.3252 - accuracy: 0.8978\nEpoch 9/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.3160 - accuracy: 0.9004\nEpoch 10/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.3086 - accuracy: 0.9025\nEpoch 11/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.3020 - accuracy: 0.9044\nEpoch 12/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2968 - accuracy: 
0.9058\nEpoch 13/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2914 - accuracy: 0.9074\nEpoch 14/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2866 - accuracy: 0.9088\nEpoch 15/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2828 - accuracy: 0.9099\nEpoch 16/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2793 - accuracy: 0.9109\nEpoch 17/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2751 - accuracy: 0.9121\nEpoch 18/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2727 - accuracy: 0.9128\nEpoch 19/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2699 - accuracy: 0.9136\nEpoch 20/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2673 - accuracy: 0.9144\nEpoch 21/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2647 - accuracy: 0.9151\nEpoch 22/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2628 - accuracy: 0.9156\nEpoch 23/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2607 - accuracy: 0.9162\nEpoch 24/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2582 - accuracy: 0.9170\nEpoch 25/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2571 - accuracy: 0.9173\nEpoch 26/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2555 - accuracy: 0.9177\nEpoch 27/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2537 - accuracy: 0.9183\nEpoch 28/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2525 - accuracy: 0.9186\nEpoch 29/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2512 - accuracy: 0.9189\nEpoch 30/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2503 - accuracy: 0.9192\nEpoch 31/60\n700/700 [==============================] - 215s 
307ms/step - loss: 0.2490 - accuracy: 0.9196\nEpoch 32/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2483 - accuracy: 0.9198\nEpoch 33/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2475 - accuracy: 0.9200\nEpoch 34/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2460 - accuracy: 0.9204\nEpoch 35/60\n700/700 [==============================] - 215s 306ms/step - loss: 0.2458 - accuracy: 0.9204\nEpoch 36/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2450 - accuracy: 0.9206\nEpoch 37/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2444 - accuracy: 0.9209\nEpoch 38/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2437 - accuracy: 0.9211\nEpoch 39/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2436 - accuracy: 0.9211\nEpoch 40/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2429 - accuracy: 0.9212\nEpoch 41/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2422 - accuracy: 0.9214\nEpoch 42/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2421 - accuracy: 0.9215\nEpoch 43/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2421 - accuracy: 0.9214\nEpoch 44/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2408 - accuracy: 0.9218\nEpoch 45/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2414 - accuracy: 0.9217\nEpoch 46/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2410 - accuracy: 0.9217\nEpoch 47/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2406 - accuracy: 0.9219\nEpoch 48/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2405 - accuracy: 0.9219\nEpoch 49/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2404 - accuracy: 0.9219\nEpoch 50/60\n700/700 
[==============================] - 215s 307ms/step - loss: 0.2402 - accuracy: 0.9219\nEpoch 51/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2397 - accuracy: 0.9221\nEpoch 52/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2400 - accuracy: 0.9220\nEpoch 53/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2401 - accuracy: 0.9219\nEpoch 54/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2392 - accuracy: 0.9222\nEpoch 55/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2394 - accuracy: 0.9221\nEpoch 56/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2395 - accuracy: 0.9220\nEpoch 57/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2395 - accuracy: 0.9220\nEpoch 58/60\n700/700 [==============================] - 215s 306ms/step - loss: 0.2390 - accuracy: 0.9223\nEpoch 59/60\n700/700 [==============================] - 215s 307ms/step - loss: 0.2394 - accuracy: 0.9220\nEpoch 60/60\n700/700 [==============================] - 214s 306ms/step - loss: 0.2395 - accuracy: 0.9220\n","output_type":"stream"}]},{"cell_type":"markdown","source":"### Saving the trained model to file","metadata":{"id":"3MLHcmTkhvm5"}},{"cell_type":"markdown","source":"This code saves the trained recurrent neural network (RNN) model to a file.\n1. **model_name_h5**: This variable defines the name of the model file that will be saved. It combines the string 'recipe_generation_rnn_raw_' with the value of the variable `INITIAL_EPOCH` (the integer 1 in this notebook) and the '.h5' extension, which is commonly used for saving Keras models in the HDF5 format. The resulting filename is 'recipe_generation_rnn_raw_1.h5'.\n2. **model.save(model_name_h5, save_format='h5')**: This line of code saves `model` to the file specified by `model_name_h5`. 
The save_format argument is set to 'h5', indicating that the model should be saved in the HDF5 format.\n","metadata":{"id":"1wLOIuJHhULE"}},{"cell_type":"code","source":"model_name_h5 = 'recipe_generation_rnn_raw_' + str(INITIAL_EPOCH) + '.h5'\nmodel.save(model_name_h5, save_format='h5')","metadata":{"id":"F_Lgm0a1f3-X","outputId":"c4e4a5df-3597-4e60-fa87-9fa8fc53e058","execution":{"iopub.status.busy":"2024-04-27T19:47:53.942665Z","iopub.execute_input":"2024-04-27T19:47:53.943376Z","iopub.status.idle":"2024-04-27T19:47:54.093473Z","shell.execute_reply.started":"2024-04-27T19:47:53.943342Z","shell.execute_reply":"2024-04-27T19:47:54.092441Z"},"trusted":true},"execution_count":86,"outputs":[]},{"cell_type":"markdown","source":"1. **import pickle**: This line imports the pickle module, which provides functions for serializing and deserializing Python objects.\n\n2. **model_name_pkl = 'recipe_generation_rnn_raw_' + str(INITIAL_EPOCH) + '.pkl'**: This line creates a variable model_name_pkl that holds the filename for the saved model. It concatenates the string 'recipe_generation_rnn_raw_', the value of the INITIAL_EPOCH variable (converted to a string using str()), and the .pkl extension. The resulting filename includes the value of INITIAL_EPOCH and ends with .pkl, indicating that it is a pickle file.\n\n3. **with open(model_name_pkl, 'wb') as f:**: This line opens the file named by model_name_pkl in write-binary mode ('wb'). The with statement ensures that the file is properly closed after the block of code inside it has run.\n\n4. **pickle.dump(model, f)**: This line uses the pickle.dump() function to serialize the model object and save it to the file f. The pickle.dump() function takes two arguments: the object to be serialized (model in this case) and the file object (f). 
It writes the serialized representation of the object to the file.","metadata":{}},{"cell_type":"code","source":"import pickle\n\nmodel_name_pkl = 'recipe_generation_rnn_raw_' + str(INITIAL_EPOCH) + '.pkl'\nwith open(model_name_pkl, 'wb') as f:\n pickle.dump(model, f)","metadata":{"execution":{"iopub.status.busy":"2024-04-27T19:48:06.096506Z","iopub.execute_input":"2024-04-27T19:48:06.096868Z","iopub.status.idle":"2024-04-27T19:48:06.472361Z","shell.execute_reply.started":"2024-04-27T19:48:06.096839Z","shell.execute_reply":"2024-04-27T19:48:06.470980Z"},"trusted":true},"execution_count":87,"outputs":[]},{"cell_type":"markdown","source":"### Visualizing training progress","metadata":{"id":"XazsPU5Fim5R"}},{"cell_type":"markdown","source":"1. **def render_training_history_loss(training_history):**: This line defines a Python function named render_training_history_loss(training_history) that takes a training_history object as its input. This object typically contains information about the training process of a machine learning model, such as the loss values at different epochs.\n2. **loss = training_history.history['loss']**: This line extracts the loss values from the training_history object and stores them in a variable named loss. The 'loss' key is commonly used to store the loss values during training.\n3. **plt.title('Loss')**: This line sets the title of the plot to 'Loss'.\n4. **plt.xlabel('Epoch')**: This line sets the label for the x-axis of the plot to 'Epoch'.\n5. **plt.ylabel('Loss')**: This line sets the label for the y-axis of the plot to 'Loss'.\n6. **plt.plot(loss, label='Training set')**: This line plots the loss values stored in the loss variable. The label 'Training set' is used to identify the plotted line.\n7. **plt.legend()**: This line adds a legend to the plot, which shows the labels associated with each line.\n8. 
**plt.grid(linestyle='--', linewidth=1, alpha=0.5)**: This line adds a grid to the plot with dashed lines (linestyle='--'), a line width of 1 point (linewidth=1), and a transparency of 50% (alpha=0.5).\n9. **plt.show()**: This line displays the plot on the screen.","metadata":{"id":"UQdxiTsXtupj"}},{"cell_type":"code","source":"def render_training_history_loss(training_history):\n    loss = training_history.history['loss']\n\n    plt.title('Loss')\n    plt.xlabel('Epoch')\n    plt.ylabel('Loss')\n    plt.plot(loss, label='Training set')\n    plt.legend()\n    plt.grid(linestyle='--', linewidth=1, alpha=0.5)\n    plt.show()","metadata":{"id":"kS3hINGxiLei","execution":{"iopub.status.busy":"2024-04-27T19:27:26.625651Z","iopub.execute_input":"2024-04-27T19:27:26.625973Z","iopub.status.idle":"2024-04-27T19:27:26.631876Z","shell.execute_reply.started":"2024-04-27T19:27:26.625944Z","shell.execute_reply":"2024-04-27T19:27:26.630748Z"},"trusted":true},"execution_count":70,"outputs":[]},{"cell_type":"code","source":"render_training_history_loss(history)","metadata":{"id":"yoe2DL4bxfnm","outputId":"1ad6adf6-7e52-4675-9f10-6135f6e03c64","execution":{"iopub.status.busy":"2024-04-27T19:27:26.633124Z","iopub.execute_input":"2024-04-27T19:27:26.633501Z","iopub.status.idle":"2024-04-27T19:27:26.949156Z","shell.execute_reply.started":"2024-04-27T19:27:26.633457Z","shell.execute_reply":"2024-04-27T19:27:26.948372Z"},"trusted":true},"execution_count":71,"outputs":[{"output_type":"display_data","data":{"text/plain":"
","image/png":""},"metadata":{}}]},{"cell_type":"markdown","source":"1. **def render_training_history_accuracy(training_history):**: This line defines a Python function named render_training_history_accuracy(training_history) that takes a training_history object as its input. This object typically contains information about the training process of a machine learning model, such as the accuracy values at different epochs.\n2. **accuracy = training_history.history['accuracy']**: This line extracts the accuracy values from the training_history object and stores them in a variable named accuracy. The 'accuracy' key is commonly used to store the accuracy values during training.\n3. **plt.title('Accuracy')**: This line sets the title of the plot to 'Accuracy'.\n4. **plt.xlabel('Epoch')**: This line sets the label for the x-axis of the plot to 'Epoch'.\n5. **plt.ylabel('Accuracy')**: This line sets the label for the y-axis of the plot to 'Accuracy'.\n6. **plt.plot(accuracy, label='Training set')**: This line plots the accuracy values stored in the accuracy variable. The label 'Training set' is used to identify the plotted line.\n7. **plt.legend()**: This line adds a legend to the plot, which shows the labels associated with each line.\n8. **plt.grid(linestyle='--', linewidth=1, alpha=0.5)**: This line adds a grid to the plot with dashed lines (linestyle='--'), a line width of 1 point (linewidth=1), and a transparency of 50% (alpha=0.5).\n9. 
**plt.show()**: This line displays the plot on the screen.","metadata":{}},{"cell_type":"code","source":"def render_training_history_accuracy(training_history):\n    accuracy = training_history.history['accuracy']\n\n    plt.title('Accuracy')\n    plt.xlabel('Epoch')\n    plt.ylabel('Accuracy')\n    plt.plot(accuracy, label='Training set')\n    plt.legend()\n    plt.grid(linestyle='--', linewidth=1, alpha=0.5)\n    plt.show()","metadata":{"execution":{"iopub.status.busy":"2024-04-27T19:27:26.950239Z","iopub.execute_input":"2024-04-27T19:27:26.950498Z","iopub.status.idle":"2024-04-27T19:27:26.956461Z","shell.execute_reply.started":"2024-04-27T19:27:26.950475Z","shell.execute_reply":"2024-04-27T19:27:26.955446Z"},"trusted":true},"execution_count":72,"outputs":[]},{"cell_type":"code","source":"render_training_history_accuracy(history)","metadata":{"execution":{"iopub.status.busy":"2024-04-27T19:27:26.957885Z","iopub.execute_input":"2024-04-27T19:27:26.958294Z","iopub.status.idle":"2024-04-27T19:27:27.213140Z","shell.execute_reply.started":"2024-04-27T19:27:26.958261Z","shell.execute_reply":"2024-04-27T19:27:27.212198Z"},"trusted":true},"execution_count":73,"outputs":[{"output_type":"display_data","data":{"text/plain":"
","image/png":""},"metadata":{}}]},{"cell_type":"markdown","source":"# Test the model","metadata":{}},{"cell_type":"markdown","source":"1. A test dataset is created from the `data_test` array using `tf.data.Dataset.from_tensor_slices`. This returns a `tf.data.Dataset` object that can be processed efficiently during training and evaluation.\n\n2. The `transform_element` function is applied to each element of the dataset using `map`. It transforms the raw input data into a format that the model can process.\n\n3. The dataset is shuffled and batched using the `shuffle` and `batch` methods. The `SHUFFLE_BUFFER_SIZE` and `BATCH_SIZE` variables set the size of the shuffle buffer and the batch size, respectively. The `drop_remainder` argument is set to True to drop any trailing examples that do not fill a complete batch, so the number of batches equals the number of examples divided by the batch size, rounded down.\n\n4. The `model.evaluate` method is called to evaluate the model on the test dataset. By default this method returns the loss followed by the metrics specified in the model's compile call as a list of scalars, here `[loss, accuracy]`, rather than a dictionary (a dictionary is returned only when `return_dict=True` is passed). 
The evaluate method returns the average metrics over all batches in the test dataset.","metadata":{}},{"cell_type":"code","source":"# Build the test dataset from data_test\n\ntest_data = tf.data.Dataset.from_tensor_slices(data_test)\n\n# Apply the transformation function to each element in the dataset\ntest_data = test_data.map(transform_element)\n\n# Shuffle and batch the dataset\ntest_data = test_data.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)\n\nmodel.evaluate(test_data)","metadata":{"execution":{"iopub.status.busy":"2024-04-27T19:27:27.214510Z","iopub.execute_input":"2024-04-27T19:27:27.214881Z","iopub.status.idle":"2024-04-27T19:29:35.021772Z","shell.execute_reply.started":"2024-04-27T19:27:27.214847Z","shell.execute_reply":"2024-04-27T19:29:35.020809Z"},"trusted":true},"execution_count":74,"outputs":[{"name":"stdout","text":"298/298 [==============================] - 38s 126ms/step - loss: 0.2457 - accuracy: 0.9201\n","output_type":"stream"},{"execution_count":74,"output_type":"execute_result","data":{"text/plain":"[0.24566957354545593, 0.920109212398529]"},"metadata":{}}]},{"cell_type":"markdown","source":"## Generating text","metadata":{"id":"nQYiKiKfn5C9"}},{"cell_type":"markdown","source":"### Restore the latest checkpoint","metadata":{"id":"Mau8c_NMn6vO"}},{"cell_type":"markdown","source":"The `tf.train.latest_checkpoint(checkpoint_dir)` function in TensorFlow is used to find the most recently saved checkpoint in a given directory. A checkpoint contains the saved state of a TensorFlow model, including the values of its trainable variables.\n1. It takes `checkpoint_dir`, the path to the directory where the checkpoint files are stored.\n2. Inside the directory, it reads the state file named `checkpoint`, which TensorFlow writes when saving checkpoints and which records the path prefix of the most recent one. The `.index` and `.data` files stored next to it hold the saved variables themselves.\n3. It selects the checkpoint that this state file lists as the latest, rather than comparing file names. 
This is the most recent checkpoint that has been saved.\n4. The function returns the path to the latest checkpoint file.","metadata":{"id":"2oYdB5WDo_Yv"}},{"cell_type":"code","source":"tf.train.latest_checkpoint(checkpoint_dir)","metadata":{"id":"VFtTGIUwn_wY","outputId":"83e4de7c-420f-481a-98fa-589db87f3e4a","execution":{"iopub.status.busy":"2024-04-27T19:29:35.023043Z","iopub.execute_input":"2024-04-27T19:29:35.023400Z","iopub.status.idle":"2024-04-27T19:29:35.030770Z","shell.execute_reply.started":"2024-04-27T19:29:35.023374Z","shell.execute_reply":"2024-04-27T19:29:35.029801Z"},"trusted":true},"execution_count":75,"outputs":[{"execution_count":75,"output_type":"execute_result","data":{"text/plain":"'tmp/checkpoints/ckpt_60'"},"metadata":{}}]},{"cell_type":"markdown","source":"#### Creating simplified version of the model","metadata":{"id":"3g6c9hFp0ltH"}},{"cell_type":"markdown","source":"This code sets the `simplified_batch_size` to 1","metadata":{"id":"JuSc81J018ge"}},{"cell_type":"code","source":"simplified_batch_size = 1","metadata":{"id":"NDnlcAYe0OHJ","execution":{"iopub.status.busy":"2024-04-27T19:29:35.031996Z","iopub.execute_input":"2024-04-27T19:29:35.032281Z","iopub.status.idle":"2024-04-27T19:29:35.040731Z","shell.execute_reply.started":"2024-04-27T19:29:35.032257Z","shell.execute_reply":"2024-04-27T19:29:35.039809Z"},"trusted":true},"execution_count":76,"outputs":[]},{"cell_type":"markdown","source":"This code builds a simplified version of the model with the specified hyperparameters. It then loads the weights from the latest checkpoint in the \"tmp/checkpoints\" directory and builds the model with a batch size of 1.\n1. It defines a function called `build_model` that takes several arguments, including the vocabulary size, embedding dimension, RNN units, and batch size. This function is used to build the model architecture.\n2. It sets the `simplified_batch_size` to 1. 
This means that the simplified model will only process one sample at a time.\n3. It calls the `build_model` function to build the simplified model with the specified hyperparameters and a batch size of 1.\n4. It loads the weights from the latest checkpoint in the \"tmp/checkpoints\" directory into the simplified model. This gives the simplified model the parameters learned during training, so it can generate text without being retrained.\n5. It calls `build` with the input shape `[simplified_batch_size, None]`, that is, a batch of one sequence of unspecified length. The model used for training was built for the training batch size; the weights do not depend on the batch size, so they can be reused in a model built for a batch size of 1.\n","metadata":{"id":"_atBzxJs10yO"}},{"cell_type":"code","source":"model_simplified = build_model(vocab_size=VOCABULARY_SIZE,\n embedding_dim=256,\n rnn_units=1024,\n batch_size=simplified_batch_size)\nmodel_simplified.load_weights(tf.train.latest_checkpoint(\"tmp/checkpoints\"))\nmodel_simplified.build(tf.TensorShape([simplified_batch_size, None]))","metadata":{"id":"F0vDHYTi01DM","execution":{"iopub.status.busy":"2024-04-27T19:29:35.041873Z","iopub.execute_input":"2024-04-27T19:29:35.042158Z","iopub.status.idle":"2024-04-27T19:29:35.363013Z","shell.execute_reply.started":"2024-04-27T19:29:35.042135Z","shell.execute_reply":"2024-04-27T19:29:35.361973Z"},"trusted":true},"execution_count":77,"outputs":[]},{"cell_type":"markdown","source":"This code prints a summary of the simplified model's architecture. 
This provides information about the model's layers, their shapes, and the number of trainable parameters.","metadata":{"id":"jDyaQp_02P3t"}},{"cell_type":"code","source":"model_simplified.summary()","metadata":{"id":"8IsCMv4l1KHW","outputId":"faa61f2d-7449-4f0b-fe5d-6ffdf41ede9b","execution":{"iopub.status.busy":"2024-04-27T19:29:35.364257Z","iopub.execute_input":"2024-04-27T19:29:35.364536Z","iopub.status.idle":"2024-04-27T19:29:35.382644Z","shell.execute_reply.started":"2024-04-27T19:29:35.364512Z","shell.execute_reply":"2024-04-27T19:29:35.381771Z"},"trusted":true},"execution_count":78,"outputs":[{"name":"stdout","text":"Model: \"sequential_1\"\n_________________________________________________________________\n Layer (type) Output Shape Param # \n=================================================================\n embedding_1 (Embedding) (1, None, 256) 25856 \n \n lstm_1 (LSTM) (1, None, 1024) 5246976 \n \n dense_1 (Dense) (1, None, 101) 103525 \n \n=================================================================\nTotal params: 5376357 (20.51 MB)\nTrainable params: 5376357 (20.51 MB)\nNon-trainable params: 0 (0.00 Byte)\n_________________________________________________________________\n","output_type":"stream"}]},{"cell_type":"markdown","source":"The `input_shape` attribute reports the shape of the input the model expects. Here it is `(1, None)`: a batch of one sequence whose length is not fixed, so the model accepts start strings of any length.","metadata":{"id":"3BG9npNr3dfl"}},{"cell_type":"code","source":"model_simplified.input_shape","metadata":{"id":"Ub4KLTr52b5x","outputId":"e2ca1a05-5732-4a4c-c946-032b50ca8bb6","execution":{"iopub.status.busy":"2024-04-27T19:29:35.383763Z","iopub.execute_input":"2024-04-27T19:29:35.384051Z","iopub.status.idle":"2024-04-27T19:29:35.390022Z","shell.execute_reply.started":"2024-04-27T19:29:35.384026Z","shell.execute_reply":"2024-04-27T19:29:35.388955Z"},"trusted":true},"execution_count":79,"outputs":[{"execution_count":79,"output_type":"execute_result","data":{"text/plain":"(1, 
None)"},"metadata":{}}]},{"cell_type":"markdown","source":"This code defines a function generate_text that generates text character by character using the trained model.\n\n**padded_start_string = STOP_WORD_TITLE + start_string**: This line builds the start string by prepending the STOP_WORD_TITLE constant to the start_string argument. STOP_WORD_TITLE is the marker that opens a recipe title (it appears as 📕 in the generated output), so the prefix tells the model that start_string is the beginning of a new recipe title.\n\n**input_indices = np.array(tokenizer.texts_to_sequences([padded_start_string]))**: This line converts the start string to a sequence of integers using the tokenizer object. The tokenizer is a Keras utility that maps text to integer indices that can be fed to the model.\n\n**text_generated = []**: This line initializes an empty list to store the generated characters.\n\n**model.reset_states()**: This line resets the internal state of the model. This is necessary because the model is a recurrent neural network that carries its state from one call to the next, and each generation run should start from a clean state.\n\n**for char_index in range(num_generate):**: This loop generates num_generate characters of text.\n\n**predictions = model(input_indices)**: This line feeds the input sequence to the model, which returns one vector of scores over the vocabulary for every position in the sequence.\n\n**predictions = tf.squeeze(predictions, 0)**: This line removes the batch dimension from the predictions. The batch dimension is not needed because the batch size is 1; the result has shape (sequence_length, vocabulary_size).\n\n**predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()**: This line samples the next character instead of always taking the most likely one. The tf.random.categorical function treats each row of predictions as unnormalized log-probabilities (logits) and draws one index per row; the [-1,0] indexing keeps the index drawn for the last position, which is the ID of the next character.\n\n**input_indices = tf.expand_dims([predicted_id], 0)**: This line wraps the predicted character index in a batch containing one sequence of length 1. 
This is necessary because the model expects a 2-D input of shape (batch_size, sequence_length); the recurrent state kept from the previous steps supplies the earlier context, so only the newest character has to be passed in.\n\n**next_character = tokenizer.sequences_to_texts(input_indices.numpy())[0]**: This line converts the predicted character index back into a character using the tokenizer object.\n\n**text_generated.append(next_character)**: This line appends the generated character to the text_generated list.\n\n**return (padded_start_string + ''.join(text_generated))**: This line returns the generated text as a string. The padded_start_string value is prepended so that the result reads as a complete recipe, beginning with the title marker and the start string.","metadata":{"id":"bsZrvj3Pk-sf"}},{"cell_type":"code","source":"def generate_text(model, start_string, num_generate = 1000):\n    # Evaluation step (generating text using the learned model)\n\n    padded_start_string = STOP_WORD_TITLE + start_string\n\n    # Converting our start string to numbers (vectorizing).\n    input_indices = np.array(tokenizer.texts_to_sequences([padded_start_string]))\n\n    # Empty list to store our results.\n    text_generated = []\n\n    # Here batch size == 1.\n    model.reset_states()\n    for char_index in range(num_generate):\n        predictions = model(input_indices)\n        # remove the batch dimension\n        predictions = tf.squeeze(predictions, 0)\n\n        # Using a categorical distribution to predict the character returned by the model.\n        predicted_id = tf.random.categorical(\n            predictions,\n            num_samples=1\n        )[-1,0].numpy()\n\n        # We pass the predicted character as the next input to the model\n        # along with the previous hidden state.\n        input_indices = tf.expand_dims([predicted_id], 0)\n\n        next_character = tokenizer.sequences_to_texts(input_indices.numpy())[0]\n\n        text_generated.append(next_character)\n\n    return (padded_start_string + 
''.join(text_generated))","metadata":{"id":"u5YfR0N84kLi","execution":{"iopub.status.busy":"2024-04-27T19:29:35.391191Z","iopub.execute_input":"2024-04-27T19:29:35.391491Z","iopub.status.idle":"2024-04-27T19:29:35.399965Z","shell.execute_reply.started":"2024-04-27T19:29:35.391467Z","shell.execute_reply":"2024-04-27T19:29:35.399053Z"},"trusted":true},"execution_count":80,"outputs":[]},{"cell_type":"markdown","source":"This code defines a function generate_combinations that generates one recipe for each of several start strings. Here's a breakdown of what it does:\n\n**recipe_length = 751**: This line sets the length of the generated text to 751 characters.\n\n**ingredients = ['rice ', 'potato ', ...]**: This line defines the list of ingredient names used as start strings; the function loops over the list and generates one recipe per ingredient. The second version of the function, defined further below, replaces the list with **ingredients = input(\"Enter your ingredients (One or two ingredients): \")**, which prompts the user for one or two ingredients and generates a single recipe from that input.\n\n**generated_text = generate_text(model, start_string=ingredient, num_generate = recipe_length)**: This line generates text using the generate_text function. 
The model argument is the trained machine learning model, start_string is the ingredients string, and num_generate is the length of the generated text.\n\n**print(f'Attempt: \"{ingredient}\"')**: This line prints a message indicating the current input string.\n\n**print('-----------------------------------')**: This line prints a separator line.\n\n**print(generated_text)**: This line prints the generated text.\n\n**print('\\n\\n')**: This line prints blank lines to separate the generated text from the next iteration.\n","metadata":{"id":"R2vHWq2olutu"}},{"cell_type":"code","source":"def generate_combinations(model):\n    recipe_length = 751\n    ingredients = ['rice ','potato ','pasta ','onion ','chicken ','meat ','salt ',\n                   'sugar ','fish ','cheese ','toamato ','chocolate ','vanilla ']\n    for ingredient in ingredients:\n        generated_text = generate_text(\n            model,\n            start_string=ingredient,\n            num_generate = recipe_length,\n        )\n        print(f'Attempt: \"{ingredient}\"')\n        print('-----------------------------------')\n        print(generated_text)\n        print('\\n\\n')","metadata":{"id":"2lcLuauRDMMS","execution":{"iopub.status.busy":"2024-04-27T19:29:35.401083Z","iopub.execute_input":"2024-04-27T19:29:35.401470Z","iopub.status.idle":"2024-04-27T19:29:35.413527Z","shell.execute_reply.started":"2024-04-27T19:29:35.401446Z","shell.execute_reply":"2024-04-27T19:29:35.412622Z"},"trusted":true},"execution_count":81,"outputs":[]},{"cell_type":"code","source":"generate_combinations(model_simplified)","metadata":{"id":"xhvbqSiEbZeU","outputId":"1c2eefaa-380e-4f83-b317-12e30ff737c3","execution":{"iopub.status.busy":"2024-04-27T19:29:35.423962Z","iopub.execute_input":"2024-04-27T19:29:35.424925Z","iopub.status.idle":"2024-04-27T19:31:03.327183Z","shell.execute_reply.started":"2024-04-27T19:29:35.424873Z","shell.execute_reply":"2024-04-27T19:31:03.326200Z"},"trusted":true},"execution_count":82,"outputs":[{"name":"stdout","text":"Attempt: \"rice \"\n-----------------------------------\n📕 rice Pickled 
Spaghetti And Very Loves And Pretzelle) \n\n🥩\n\n• \"6 oz. nonfat yogurt\"\n• \"16 oz. Kraft Caramels\"\n• \"16 to 20 salmon fillets\"\n\n✍️\n\n▪︎ \"Spread peanut butter on bread slice.\"\n▪︎ \"Cover with a tito small piece of banana slices and top with cherries\n▪︎ reserving sugar to top.\"\n▪︎ \"For best results\n▪︎ use 3 1/2 tablespoons chopped pecans into cheesecloth bag.\"\n▪︎ \"Top with remaining bread cube and bake as directed on directions.\"\n▪︎ \"Stir except using 3 cups bit slices (youblet).\"\n␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣\n\n\n\nAttempt: \"potato \"\n-----------------------------------\n📕 potato Shoulder\"\n\n🥩\n\n• \"1 pkg. frozen hash brown potatoes\"\n• \"1 large onion\n• chopped\"\n• \"2 eggs\"\n• \"1 1/2 sleeves American cheese\"\n• \"salt and freshly ground pepper to taste\"\n• \"lemon slices or can Bleuffles\"\n\n✍️\n\n▪︎ \"Mix all ingredients.\"\n▪︎ \"Bake in baking dish for 30 minutes at 350\\u00b0.\"\n␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣\n\n\n\nAttempt: \"pasta \"\n-----------------------------------\n📕 pasta or Viva Turtles\n\n🥩\n\n• \"2 c. Fruit\n• crushed\"\n• \"1 tsp. almond extract\"\n• \"3 c. flour\"\n• \"1 c. packed brown sugar\"\n• \"1 c. whole wheat flour\"\n• \"1/2 tsp. salt\"\n• \"1/2 tsp. baking powder\"\n• \"1/2 tsp. soda\"\n• \"1 tsp. ground cinnamon\"\n• \"1/2 tsp. 
baking soda\"\n\n✍️\n\n▪︎ \"Preheat oven to 375\\u00b0.\"\n▪︎ \"Combine flour\n▪︎ sugar\n▪︎ baking powder\n▪︎ baking soda and cloves.\"\n▪︎ \"Add Crisco\n▪︎ egg and milk.\"\n▪︎ \"Beat until smooth.\"\n▪︎ \"Add flour\n▪︎ baking soda and chocolate chunks.\"\n▪︎ \"Stir in first mixture.\"\n▪︎ \"Fill greased 1 3/4 x 8 1/2-inch loaf pan (greased pans) and bake 35 to 40 minutes or until brown.\"\n▪︎ \"Cool on rack.\"\n▪︎ \"Sprinkle pecan pieces with warm Crisco.\"\n▪︎ \"Serve when cool.\"\n␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣\n\n\n\nAttempt: \"onion \"\n-----------------------------------\n📕 onion Countraunt\n\n🥩\n\n• \"3 large potatoes\n• peeled and thinly sliced\"\n• \"2 Tbsp. butter or margarine\"\n• \"1 1/4 tsp. salt\"\n• \"1/8 tsp. pepper\"\n• \"1 c. flour\"\n• \"1/4 tsp. pepper\"\n• \"Tony Cake\"\n\n✍️\n\n▪︎ \"Combine first five ingredients in this oil.\"\n▪︎ \"Mix well.\"\n▪︎ \"Pare and press into bottom of greased pan or 1 1/2-quart shallow casserole.\"\n▪︎ \"Dot with butter.\"\n▪︎ \"Sprinkle 1 cup of tarragon with flaked corn flakes.\"\n▪︎ \"Bake in 375\\u00b0 oven until brown.\"\n␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣\n\n\n\nAttempt: \"chicken \"\n-----------------------------------\n📕 chicken Schnop Cooking Au Gratin\n\n🥩\n\n• \"1 1/2 c. flour\"\n• \"2 2/3 c. quick-cooking oats (I use 2 c. or more)\"\n• \"1 tsp. salt\"\n• \"1/4 tsp. soda\"\n• \"1 tsp. baking powder\"\n• \"1/4 tsp. salt\"\n• \"1 egg\"\n• \"1 Tbsp. oil\"\n• \"peel of 3 large bunkers -Lax golden bread loaf pastry\"\n• \"1/2 c. butter or margarine\n• melted\"\n• \"1/2 c. firmly packed brown sugar\"\n• \"1/4 c. 
water\"\n\n✍️\n\n▪︎ \"In medium bowl\n▪︎ sift the flour\n▪︎ baking powder\n▪︎ cinnamon\n▪︎ nutmeg and sugar.\"\n▪︎ \"Mix sugar\n▪︎ molasses\n▪︎ shortening and 1/4 cup Buttermilk.\"\n▪︎ \"Add to the egg mixture\n▪︎ beating constantly to boiling.\"\n▪︎ \"Add the diced banana and stir thoroughly.\"\n▪︎ \"Mixture will be thick.\"\n▪︎ \"Pour into a greased 9-inch square pan.\"\n▪︎ \"Cover with apple mixture.\"\n▪︎ \"Bake at 350\\u00b0 for\n\n\n\nAttempt: \"meat \"\n-----------------------------------\n📕 meat Brothoust Tramler) \n\n🥩\n\n• \"1 c. flour\"\n• \"1/2 tsp. cinnamon\"\n• \"1 1/2 tsp. baking powder\"\n• \"1/2 c. persimmon pulp\"\n• \"3/4 c. white sugar\"\n• \"3/4 c. brown sugar\"\n• \"1 1/4 c. whole wheat flour\"\n• \"3/4 c. rhubarb\n• chopped *\"\n• \"1/2 tsp. baking soda\"\n• \"1/4 tsp. salt\"\n• \"1/2 c. raisins\"\n\n✍️\n\n▪︎ \"Mix dry ingredients; sift together.\"\n▪︎ \"Add eggs\n▪︎ salt\n▪︎ sugar and margarine; beat well. Place a thin layer in 11 x 7 x 2-inch baking pan; coat well.\"\n▪︎ \"Bake at 350\\u00b0 for 30 to 45 minutes\n▪︎ or until toothpick comes out clean.\"\n␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣\n\n\n\nAttempt: \"salt \"\n-----------------------------------\n📕 salt Red Au Sugar Ludge) \n\n🥩\n\n• \"1 lb. marshmallows or 3 Tbsp. instant coffee (strawberries\n• None-Lay Piero shell\"\n• \"1/2 to 1 tsp. caramolis liquid\"\n• \"1 Tbsp. creme de cacao\"\n• \"1 tsp. 
caref nuts (optional)\"\n\n✍️\n\n▪︎ \"In medium bowl\n▪︎ mix chocolate whip in bowl.\"\n▪︎ \"Add egg white\n▪︎ vanilla and walnuts to which carefully to mash.\"\n▪︎ \"Gently heat sweet real gooden over medium heat until sugar melts.\"\n▪︎ \"Pour hot liquid into mold.\"\n▪︎ \"\\tid\"\n▪︎ \"to tray.\"\n▪︎ \"Use white paper fitted Ly 2 bothes Cake.\"\n▪︎ \"Spread soda wafers over cherry.\"\n␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣\n\n\n\nAttempt: \"sugar \"\n-----------------------------------\n📕 sugar Co boiled Campfires\n\n🥩\n\n• \"1 1/4 c. all-purpose flour\"\n• \"2 tsp. baking powder\"\n• \"1/2 tsp. salt\"\n• \"1/2 c. soft shortening (margarine)\"\n• \"2 eggs\"\n• \"1 c. sugar\"\n• \"1 Tbsp. grated orange peel\"\n• \"1 tsp. vanilla\"\n• \"1 c. oat bread crumbs (about 3 c.)\"\n• \"1/2 c. chopped pecans or walnuts or calories or A Blender instant coffee powder\"\n• \"1/4 tsp. ground cinnamon\"\n\n✍️\n\n▪︎ \"Heat oven to 375\\u00b0. Spoon from dough into a 9 x 12-inch pan; set aside. Beat egg whites until stiff; fold into the batter. Pour into pan. Gradually add unbeaten egg and blend.\"\n␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣\n\n\n\nAttempt: \"fish \"\n-----------------------------------\n📕 fish Do No EEg Holeyamati\n\n🥩\n\n• \"1 3/4 c. dry bread crumbs or curry or 6 (8 oz.) pkg. frozen chopped spinach\n• thawed and drained\"\n• \"1 tsp. salt\"\n• \"1/2 tsp. oregano\"\n• \"1/2 tsp. paprika\"\n• \"1/4 tsp. crushed basil\"\n• \"1/2 tsp. grated orange peel\"\n• \"1 can (115 oz.) 
Monte Mexican-style shinack (fat-free) and cut up an eighttore baked\n• 6 to 8 sheets of a mest.\"\n\n✍️\n\n▪︎ \"In a 1 1/2-quart baking dish\n▪︎ blend soup\n▪︎ milk\n▪︎ dash of paprika and pepper sauce.\"\n▪︎ \"Add the onion mixture with pork chops and spice mixture.\"\n▪︎ \"Sprinkle enough margarine in a pan on all sides of bag to brown; cover and bake at 350\\u00b0 for 45 minutes.\"\n␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣\n\n\n\nAttempt: \"cheese \"\n-----------------------------------\n📕 cheese square Cookie Hhill shrimp and Jimmed Eggplant Garkins Or Bouil crust for 24 barbecues eating 6 minutes in 13 quarts)\"\n• \"8 Tbsp. honey\"\n\n✍️\n\n▪︎ \"Microwave:\"\n▪︎ \"Combine all ingredients in a bowl; add salt supreme and mix.\"\n▪︎ \"Refrigerate three darl.\"\n▪︎ \"Makes about 2 cups.\"\n␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣\n\n\n\nAttempt: \"toamato \"\n-----------------------------------\n📕 toamato Chip)(Serves 10) \n\n🥩\n\n• \"1/2 c. butter or margarine\"\n• \"6 thin slices Swith picks\"\n• \"1 tsp. onion salt\"\n• \"pepper or sesame oil\"\n• \"salt and pepper to taste\"\n• \"Hot of whole cinnamon hearts\"\n• \"maple flour\"\n• \"3 lb. 
wieners\"\n• \"cauliflower (optional)\"\n\n✍️\n\n▪︎ \"Heat half cinnamon and whet blender (extra mounds).\"\n▪︎ \"Roll 3 times\n▪︎ debone fryer 4 to 5 inches from heat and drain well (or add about 1 cup of chopped nuts if you use an ovenca or low heat\n▪︎ while can thicken pouring all the way through consistency over lid or fish filling to the touch together; leave them out) halfway and strain through away.\"\n▪︎ \"Remove the seeds on both sides.\"\n▪︎ \"Place the skin on bottom rack of bearing together in 10-inch skillet.\"\n▪︎ Salt to taste.\"\n▪︎ \n\n\n\nAttempt: \"chocolate \"\n-----------------------------------\n📕 chocolate Squares\n\n🥩\n\n• \"3 c. flour\"\n• \"1/4 c. sugar\"\n• \"2 1/2 tsp. baking powder\"\n• \"1/2 tsp. salt\"\n• \"1 c. apple butter\"\n• \"1 egg\"\n• \"1/2 tsp. cinnamon\"\n• \"pastry for 1-cup\"\n\n✍️\n\n▪︎ \"Sift flour\n▪︎ baking powder and salt.\"\n▪︎ \"Beat cats egg.\"\n▪︎ \"Add sugar and shortening.\"\n▪︎ \"Add buttermilk and egg and mix thoroughly.\"\n▪︎ \"Fold in apples.\"\n▪︎ \"Bake in oven for 15 to 20 minutes at 375\\u00b0.\"\n␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣\n\n\n\nAttempt: \"vanilla \"\n-----------------------------------\n📕 vanilla Fread\n\n🥩\n\n• \"2 to 4 c. frozen cooking apricot halves\"\n• \"1 1/4 c. sugar\"\n• \"3 Tbsp. flour\"\n• \"1/2 tsp. cloves\"\n• \"1 tsp. ground cinnamon\"\n• \"1 tsp. ground nutmeg\"\n• \"1 tsp. nutmeg\"\n• \"3/4 c. Fruits of chocolate candies\"\n• \"1/4 c. butter\"\n• \"1 1/2 tsp. 
almond flavoring\"\n• \"fruit (thawed)\"\n\n✍️\n\n▪︎ \"Melt the butter\n▪︎ butter and chocolate over low heat\n▪︎ stirring until smooth; add to boiling water a sugar and stir until mixture is almost melted.\"\n▪︎ \"Remove from heat. Stir in pecans\n▪︎ honey\n▪︎ vanilla and baking soda until the mixture is light and fluffy.\"\n▪︎ \"Pour thoroughly mix into this oil in saucepan\n▪︎ stirring with fork.\"\n▪︎ \"Stir in nuts and coconut and combine with remaining ingredients.\"\n▪︎ \"Pour into pastry lined 9-inch square pan.\n\n\n\n","output_type":"stream"}]},{"cell_type":"code","source":"def generate_combinations(model):\n recipe_length = 751\n ingredients = input(\"Enter your ingredients (One or two ingredients): \")\n generated_text = generate_text(\n model,\n start_string=ingredients,\n num_generate = recipe_length\n )\n print(f'Attempt: \"{ingredients}\"')\n print('-----------------------------------')\n print(generated_text)\n print('\\n\\n')","metadata":{"execution":{"iopub.status.busy":"2024-04-27T19:31:03.328638Z","iopub.execute_input":"2024-04-27T19:31:03.329323Z","iopub.status.idle":"2024-04-27T19:31:03.334735Z","shell.execute_reply.started":"2024-04-27T19:31:03.329286Z","shell.execute_reply":"2024-04-27T19:31:03.333763Z"},"trusted":true},"execution_count":83,"outputs":[]},{"cell_type":"code","source":"generate_combinations(model_simplified)","metadata":{"execution":{"iopub.status.busy":"2024-04-27T19:31:03.336080Z","iopub.execute_input":"2024-04-27T19:31:03.337028Z","iopub.status.idle":"2024-04-27T19:47:49.172475Z","shell.execute_reply.started":"2024-04-27T19:31:03.336991Z","shell.execute_reply":"2024-04-27T19:47:49.170951Z"},"trusted":true},"execution_count":84,"outputs":[{"traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)","Cell \u001b[0;32mIn[84], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m 
\u001b[43mgenerate_combinations\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel_simplified\u001b[49m\u001b[43m)\u001b[49m\n","Cell \u001b[0;32mIn[83], line 3\u001b[0m, in \u001b[0;36mgenerate_combinations\u001b[0;34m(model)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mgenerate_combinations\u001b[39m(model):\n\u001b[1;32m 2\u001b[0m recipe_length \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m751\u001b[39m\n\u001b[0;32m----> 3\u001b[0m ingredients \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43minput\u001b[39;49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mEnter your ingredients (One or two ingredients): \u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m 4\u001b[0m generated_text \u001b[38;5;241m=\u001b[39m generate_text(\n\u001b[1;32m 5\u001b[0m model,\n\u001b[1;32m 6\u001b[0m start_string\u001b[38;5;241m=\u001b[39mingredients,\n\u001b[1;32m 7\u001b[0m num_generate \u001b[38;5;241m=\u001b[39m recipe_length\n\u001b[1;32m 8\u001b[0m )\n\u001b[1;32m 9\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mAttempt: \u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mingredients\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m'\u001b[39m)\n","File \u001b[0;32m/opt/conda/lib/python3.10/site-packages/ipykernel/kernelbase.py:1202\u001b[0m, in \u001b[0;36mKernel.raw_input\u001b[0;34m(self, prompt)\u001b[0m\n\u001b[1;32m 1200\u001b[0m msg \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mraw_input was called, but this frontend does not support input requests.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1201\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m StdinNotImplementedError(msg)\n\u001b[0;32m-> 1202\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m 
\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_input_request\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1203\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mprompt\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1204\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_parent_ident\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mshell\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1205\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_parent\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mshell\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1206\u001b[0m \u001b[43m \u001b[49m\u001b[43mpassword\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 1207\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n","File \u001b[0;32m/opt/conda/lib/python3.10/site-packages/ipykernel/kernelbase.py:1245\u001b[0m, in \u001b[0;36mKernel._input_request\u001b[0;34m(self, prompt, ident, parent, password)\u001b[0m\n\u001b[1;32m 1242\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyboardInterrupt\u001b[39;00m:\n\u001b[1;32m 1243\u001b[0m \u001b[38;5;66;03m# re-raise KeyboardInterrupt, to truncate traceback\u001b[39;00m\n\u001b[1;32m 1244\u001b[0m msg \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mInterrupted by user\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m-> 1245\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyboardInterrupt\u001b[39;00m(msg) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 1246\u001b[0m 
\u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m:\n\u001b[1;32m 1247\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mlog\u001b[38;5;241m.\u001b[39mwarning(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mInvalid Message:\u001b[39m\u001b[38;5;124m\"\u001b[39m, exc_info\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n","\u001b[0;31mKeyboardInterrupt\u001b[0m: Interrupted by user"],"ename":"KeyboardInterrupt","evalue":"Interrupted by user","output_type":"error"}]},{"cell_type":"code","source":"","metadata":{},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"","metadata":{"scrolled":true},"execution_count":null,"outputs":[]}]}