{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# **MSc BDS Module - Data Engineering and Machine Learning Operations in Business (MLOps)** - Part 02: Feature Pipeline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 🗒️ The notebook is divided into the following sections:\n", "1. Parsing new data.\n", "2. Inserting the new data into the Feature Store." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ⚙️ Import of libraries and packages\n", "\n", "We start by moving up one directory to reach the `features` folder, which holds the functions (including live API calls and data preprocessing) we need for electricity prices and weather measures. We then import the libraries needed for this notebook and suppress warnings to keep the output clean." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/camillahannesbo/Documents/AAU/Master - BDS/2. semester/Data Engineering and Machine learning operations in Business/MLOPs-Assignment-\n", "/Users/camillahannesbo/Documents/AAU/Master - BDS/2. 
semester/Data Engineering and Machine learning operations in Business/MLOPs-Assignment-/notebooks\n" ] } ], "source": [ "# First we go up one directory to access the folder with our functions\n", "%cd ..\n", "\n", "# Now we import the functions from the features folder\n", "# These are the functions we created to generate features for electricity prices and weather measures\n", "from features import electricity_prices, weather_measures\n", "\n", "# We go back into the notebooks folder\n", "%cd notebooks" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Importing pandas for data handling\n", "import pandas as pd\n", "\n", "# Ignore warnings\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 🪄 Parsing New Data\n", "To fetch the latest (non-historical) electricity prices, we set `historical` to `False`.\n", "\n", "To provide up-to-date weather measures, we fetch a weather forecast for the next 5 days.\n", "\n", "The calendar data does not change, so no new data is retrieved for it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 💸 Electricity Prices per day from Energinet" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Fetching non-historical electricity prices for area DK1\n", "electricity_df = electricity_prices.electricity_prices(\n", "    historical=False,\n", "    area=[\"DK1\"]\n", ")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " timestamp datetime date hour dk1_spotpricedkk_kwh\n", "0 1714953600000 2024-05-06 00:00:00 2024-05-06 0 0.61803\n", "1 1714957200000 2024-05-06 01:00:00 2024-05-06 1 0.59364\n", "2 1714960800000 2024-05-06 02:00:00 2024-05-06 2 0.59975\n", "3 1714964400000 2024-05-06 03:00:00 2024-05-06 3 0.59632\n", "4 1714968000000 2024-05-06 04:00:00 2024-05-06 4 0.60930\n", "5 1714971600000 2024-05-06 05:00:00 2024-05-06 5 0.65271\n", "6 1714975200000 2024-05-06 06:00:00 2024-05-06 6 0.79875\n", "7 1714978800000 2024-05-06 07:00:00 2024-05-06 7 0.97157\n", "8 1714982400000 2024-05-06 08:00:00 2024-05-06 8 0.74930\n", "9 1714986000000 2024-05-06 09:00:00 2024-05-06 9 0.66383\n", "10 1714989600000 2024-05-06 10:00:00 2024-05-06 10 0.60416\n", "11 1714993200000 2024-05-06 11:00:00 2024-05-06 11 0.47937\n", "12 1714996800000 2024-05-06 12:00:00 2024-05-06 12 0.56880\n", "13 1715000400000 2024-05-06 13:00:00 2024-05-06 13 0.55724\n", "14 1715004000000 2024-05-06 14:00:00 2024-05-06 14 0.55858\n", "15 1715007600000 2024-05-06 15:00:00 2024-05-06 15 0.60498\n", "16 1715011200000 2024-05-06 16:00:00 2024-05-06 16 0.60878\n", "17 1715014800000 2024-05-06 17:00:00 2024-05-06 17 0.69635\n", "18 1715018400000 2024-05-06 18:00:00 2024-05-06 18 0.80830\n", "19 1715022000000 2024-05-06 19:00:00 2024-05-06 19 1.01923\n", "20 1715025600000 2024-05-06 20:00:00 2024-05-06 20 1.07398\n", "21 1715029200000 2024-05-06 21:00:00 2024-05-06 21 0.80539\n", "22 1715032800000 2024-05-06 22:00:00 2024-05-06 22 0.70544\n", "23 1715036400000 2024-05-06 23:00:00 2024-05-06 23 0.62966" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the electricity dataframe\n", "electricity_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 🌈 Forecast Weather Measures from Open Meteo" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Fetching weather forecast measures for the next 5 
days\n", "weather_forecast_df = weather_measures.forecast_weather_measures(\n", " forecast_length=5\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " timestamp datetime date hour temperature_2m \\\n", "0 1714953600000 2024-05-06 00:00:00 2024-05-06 0 9.6 \n", "1 1714957200000 2024-05-06 01:00:00 2024-05-06 1 9.7 \n", "2 1714960800000 2024-05-06 02:00:00 2024-05-06 2 9.5 \n", "3 1714964400000 2024-05-06 03:00:00 2024-05-06 3 9.5 \n", "4 1714968000000 2024-05-06 04:00:00 2024-05-06 4 9.6 \n", ".. ... ... ... ... ... \n", "115 1715367600000 2024-05-10 19:00:00 2024-05-10 19 11.5 \n", "116 1715371200000 2024-05-10 20:00:00 2024-05-10 20 10.5 \n", "117 1715374800000 2024-05-10 21:00:00 2024-05-10 21 9.5 \n", "118 1715378400000 2024-05-10 22:00:00 2024-05-10 22 8.6 \n", "119 1715382000000 2024-05-10 23:00:00 2024-05-10 23 7.8 \n", "\n", " relative_humidity_2m precipitation rain snowfall weather_code \\\n", "0 93.0 0.2 0.2 0.0 51.0 \n", "1 93.0 0.0 0.0 0.0 3.0 \n", "2 91.0 0.0 0.0 0.0 3.0 \n", "3 91.0 0.0 0.0 0.0 3.0 \n", "4 92.0 0.0 0.0 0.0 3.0 \n", ".. ... ... ... ... ... \n", "115 68.0 0.0 0.0 0.0 3.0 \n", "116 71.0 0.0 0.0 0.0 3.0 \n", "117 74.0 0.0 0.0 0.0 3.0 \n", "118 78.0 0.0 0.0 0.0 3.0 \n", "119 81.0 0.0 0.0 0.0 3.0 \n", "\n", " cloud_cover wind_speed_10m wind_gusts_10m \n", "0 100.0 14.4 24.8 \n", "1 100.0 14.0 24.8 \n", "2 100.0 14.0 24.8 \n", "3 100.0 13.0 23.4 \n", "4 100.0 14.0 24.1 \n", ".. ... ... ... \n", "115 89.0 5.2 13.0 \n", "116 88.0 3.4 8.6 \n", "117 87.0 2.5 4.3 \n", "118 91.0 2.6 4.3 \n", "119 96.0 2.5 4.3 \n", "\n", "[120 rows x 13 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the weather forecast dataframe\n", "weather_forecast_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 📡 Connecting to Hopsworks Feature Store\n", "\n", "We connect to Hopsworks Feature Store so we can access the Feature Groups and upload the new data into the Feature Groups." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connected. Call `.close()` to terminate connection gracefully.\n", "\n", "Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/550040\n", "Connected. Call `.close()` to terminate connection gracefully.\n" ] } ], "source": [ "# Importing the hopsworks module for interacting with the Hopsworks platform\n", "import hopsworks\n", "\n", "# Logging into the Hopsworks project\n", "project = hopsworks.login()\n", "\n", "# Getting the feature store from the project\n", "fs = project.get_feature_store()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Retrieve the feature groups\n", "electricity_fg = fs.get_feature_group(\n", "    name=\"electricity_prices\",\n", "    version=1,\n", ")\n", "\n", "weather_fg = fs.get_feature_group(\n", "    name=\"weather_measurements\",\n", "    version=1,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ⬆️ Uploading new data to the Feature Store\n", "Here we upload the new data to the retrieved Feature Groups using the `insert` method. Setting `wait_for_job` to `False` in `write_options` means the call does not wait for the materialization job to finish." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Uploading Dataframe: 100.00% |██████████| Rows 24/24 | Elapsed Time: 00:06 | Remaining Time: 00:00\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Launching job: electricity_prices_1_offline_fg_materialization\n", "Job started successfully, you can follow the progress at \n", "https://c.app.hopsworks.ai/p/550040/jobs/named/electricity_prices_1_offline_fg_materialization/executions\n" ] }, { "data": { "text/plain": [ "(, None)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Insert electricity_df into the electricity_fg feature group\n", "electricity_fg.insert(\n", "    electricity_df,\n", "    write_options={\"wait_for_job\": False},\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Uploading Dataframe: 100.00% |██████████| Rows 120/120 | Elapsed Time: 00:06 | Remaining Time: 00:00\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Launching job: weather_measurements_1_offline_fg_materialization\n", "Job started successfully, you can follow the progress at \n", "https://c.app.hopsworks.ai/p/550040/jobs/named/weather_measurements_1_offline_fg_materialization/executions\n" ] }, { "data": { "text/plain": [ "(, None)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Insert weather_forecast_df into the weather_fg feature group\n", "weather_fg.insert(\n", "    weather_forecast_df,\n", "    write_options={\"wait_for_job\": False},\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## ⏭️ **Next:** Part 03: Training\n", "\n", "Next, we will create a feature view and a training dataset. We will then train a model and save it in the model registry." 
] } ], "metadata": { "kernelspec": { "display_name": "bds-mlops", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }