{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# **MSc BDS Module - Data Engineering and Machine Learning Operations in Business (MLOps)** - Part 02: Feature Pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🗒️ The notebook is divided into the following sections:\n",
"1. Parsing new data.\n",
"2. Inserting the new data into the Feature Store."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ⚙️ Import of libraries and packages\n",
"\n",
"We start by importing our own `features` package, which holds the functions (including the live API calls and data preprocessing) for electricity prices and weather measures. We then import the remaining libraries this notebook needs and silence warnings to keep the output clean."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/camillahannesbo/Documents/AAU/Master - BDS/2. semester/Data Engineering and Machine learning operations in Business/MLOPs-Assignment-\n",
"/Users/camillahannesbo/Documents/AAU/Master - BDS/2. semester/Data Engineering and Machine learning operations in Business/MLOPs-Assignment-/notebooks\n"
]
}
],
"source": [
"# Move up one directory (the repository root) so the features package can be imported\n",
"%cd ..\n",
"\n",
"# Import our own modules from the features package\n",
"# These contain the functions we created to generate electricity price and weather features\n",
"from features import electricity_prices, weather_measures\n",
"\n",
"# Return to the notebooks folder\n",
"%cd notebooks"
]
},
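{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `%cd ..` / `%cd notebooks` pair relies on IPython magics, and if the import fails the second `%cd` never runs, leaving the kernel in the repository root. A minimal alternative sketch, assuming the notebook is run from the `notebooks` folder one level below the repository root (as the printed paths suggest), adds the root to `sys.path` instead of changing directory:\n",
"\n",
"```python\n",
"import sys\n",
"from pathlib import Path\n",
"\n",
"# Assumption: the current working directory is <repo>/notebooks, so the repo root is its parent\n",
"repo_root = Path.cwd().parent\n",
"\n",
"# Make the `features` package importable without changing the working directory\n",
"if str(repo_root) not in sys.path:\n",
"    sys.path.append(str(repo_root))\n",
"\n",
"# The import used above would then work unchanged:\n",
"# from features import electricity_prices, weather_measures\n",
"```\n",
"\n",
"Appending (rather than prepending) the path means installed packages still take precedence over any local module with the same name."
]
},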
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Importing pandas for data handling\n",
"import pandas as pd\n",
"\n",
"# Ignore warnings\n",
"import warnings \n",
"warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🪄 Parsing New Data\n",
"To fetch only the latest (non-historical) electricity prices, we set `historical` to `False`.\n",
"\n",
"To provide up-to-date weather measures, we fetch the hourly weather forecast for the next 5 days.\n",
"\n",
"The calendar data does not change, so no new calendar data is retrieved."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 💸 Electricity Prices per day from Energinet"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Fetching non-historical electricity prices for area DK1\n",
"electricity_df = electricity_prices.electricity_prices(\n",
" historical=False,\n",
" area=[\"DK1\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" timestamp datetime date hour dk1_spotpricedkk_kwh\n",
"0 1714953600000 2024-05-06 00:00:00 2024-05-06 0 0.61803\n",
"1 1714957200000 2024-05-06 01:00:00 2024-05-06 1 0.59364\n",
"2 1714960800000 2024-05-06 02:00:00 2024-05-06 2 0.59975\n",
"3 1714964400000 2024-05-06 03:00:00 2024-05-06 3 0.59632\n",
"4 1714968000000 2024-05-06 04:00:00 2024-05-06 4 0.60930\n",
"5 1714971600000 2024-05-06 05:00:00 2024-05-06 5 0.65271\n",
"6 1714975200000 2024-05-06 06:00:00 2024-05-06 6 0.79875\n",
"7 1714978800000 2024-05-06 07:00:00 2024-05-06 7 0.97157\n",
"8 1714982400000 2024-05-06 08:00:00 2024-05-06 8 0.74930\n",
"9 1714986000000 2024-05-06 09:00:00 2024-05-06 9 0.66383\n",
"10 1714989600000 2024-05-06 10:00:00 2024-05-06 10 0.60416\n",
"11 1714993200000 2024-05-06 11:00:00 2024-05-06 11 0.47937\n",
"12 1714996800000 2024-05-06 12:00:00 2024-05-06 12 0.56880\n",
"13 1715000400000 2024-05-06 13:00:00 2024-05-06 13 0.55724\n",
"14 1715004000000 2024-05-06 14:00:00 2024-05-06 14 0.55858\n",
"15 1715007600000 2024-05-06 15:00:00 2024-05-06 15 0.60498\n",
"16 1715011200000 2024-05-06 16:00:00 2024-05-06 16 0.60878\n",
"17 1715014800000 2024-05-06 17:00:00 2024-05-06 17 0.69635\n",
"18 1715018400000 2024-05-06 18:00:00 2024-05-06 18 0.80830\n",
"19 1715022000000 2024-05-06 19:00:00 2024-05-06 19 1.01923\n",
"20 1715025600000 2024-05-06 20:00:00 2024-05-06 20 1.07398\n",
"21 1715029200000 2024-05-06 21:00:00 2024-05-06 21 0.80539\n",
"22 1715032800000 2024-05-06 22:00:00 2024-05-06 22 0.70544\n",
"23 1715036400000 2024-05-06 23:00:00 2024-05-06 23 0.62966"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Display the electricity dataframe\n",
"electricity_df"
]
},
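{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the output above, `timestamp` appears to be `datetime` expressed as Unix epoch milliseconds (1714953600000 ms is 2024-05-06 00:00:00 when read as UTC), with one row per hour. A minimal pre-insert sanity check, sketched under that assumption, could look like this; `check_hourly_frame` is a hypothetical helper and not part of the `features` package:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"\n",
"def check_hourly_frame(df: pd.DataFrame) -> None:\n",
"    # Recompute epoch milliseconds from `datetime`, treating the naive values as UTC\n",
"    expected_ms = (pd.to_datetime(df['datetime']) - pd.Timestamp('1970-01-01')) // pd.Timedelta(milliseconds=1)\n",
"    assert (df['timestamp'].astype('int64') == expected_ms).all(), 'timestamp does not match datetime'\n",
"    # Timestamps must be unique and exactly one hour (3,600,000 ms) apart\n",
"    assert df['timestamp'].is_unique, 'duplicate timestamps'\n",
"    assert (df['timestamp'].diff().dropna() == 3_600_000).all(), 'rows are not consecutive hours'\n",
"\n",
"\n",
"# Usage on the first three rows shown above\n",
"sample = pd.DataFrame({\n",
"    'timestamp': [1714953600000, 1714957200000, 1714960800000],\n",
"    'datetime': pd.to_datetime(['2024-05-06 00:00:00', '2024-05-06 01:00:00', '2024-05-06 02:00:00']),\n",
"})\n",
"check_hourly_frame(sample)  # raises AssertionError if any check fails\n",
"\n",
"# In the notebook: check_hourly_frame(electricity_df)\n",
"```\n",
"\n",
"Plain `assert` statements keep the sketch short; in a scheduled pipeline, raising a `ValueError` instead would keep the checks active even when Python runs with the `-O` flag."
]
},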
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🌈 Forecast Weather Measures from Open Meteo"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Fetching weather forecast measures for the next 5 days\n",
"weather_forecast_df = weather_measures.forecast_weather_measures(\n",
" forecast_length=5\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" timestamp datetime date hour temperature_2m \\\n",
"0 1714953600000 2024-05-06 00:00:00 2024-05-06 0 9.6 \n",
"1 1714957200000 2024-05-06 01:00:00 2024-05-06 1 9.7 \n",
"2 1714960800000 2024-05-06 02:00:00 2024-05-06 2 9.5 \n",
"3 1714964400000 2024-05-06 03:00:00 2024-05-06 3 9.5 \n",
"4 1714968000000 2024-05-06 04:00:00 2024-05-06 4 9.6 \n",
".. ... ... ... ... ... \n",
"115 1715367600000 2024-05-10 19:00:00 2024-05-10 19 11.5 \n",
"116 1715371200000 2024-05-10 20:00:00 2024-05-10 20 10.5 \n",
"117 1715374800000 2024-05-10 21:00:00 2024-05-10 21 9.5 \n",
"118 1715378400000 2024-05-10 22:00:00 2024-05-10 22 8.6 \n",
"119 1715382000000 2024-05-10 23:00:00 2024-05-10 23 7.8 \n",
"\n",
" relative_humidity_2m precipitation rain snowfall weather_code \\\n",
"0 93.0 0.2 0.2 0.0 51.0 \n",
"1 93.0 0.0 0.0 0.0 3.0 \n",
"2 91.0 0.0 0.0 0.0 3.0 \n",
"3 91.0 0.0 0.0 0.0 3.0 \n",
"4 92.0 0.0 0.0 0.0 3.0 \n",
".. ... ... ... ... ... \n",
"115 68.0 0.0 0.0 0.0 3.0 \n",
"116 71.0 0.0 0.0 0.0 3.0 \n",
"117 74.0 0.0 0.0 0.0 3.0 \n",
"118 78.0 0.0 0.0 0.0 3.0 \n",
"119 81.0 0.0 0.0 0.0 3.0 \n",
"\n",
" cloud_cover wind_speed_10m wind_gusts_10m \n",
"0 100.0 14.4 24.8 \n",
"1 100.0 14.0 24.8 \n",
"2 100.0 14.0 24.8 \n",
"3 100.0 13.0 23.4 \n",
"4 100.0 14.0 24.1 \n",
".. ... ... ... \n",
"115 89.0 5.2 13.0 \n",
"116 88.0 3.4 8.6 \n",
"117 87.0 2.5 4.3 \n",
"118 91.0 2.6 4.3 \n",
"119 96.0 2.5 4.3 \n",
"\n",
"[120 rows x 13 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Display the weather forecast dataframe\n",
"weather_forecast_df"
]
},
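{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `forecast_length=5`, the forecast has 120 rows, i.e. 5 days × 24 hourly values, spanning 2024-05-06 to 2024-05-10 in this run. A hedged sketch of a coverage check, assuming the forecast always starts at midnight and contains whole days as it does here (`check_forecast_coverage` is a hypothetical helper):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"\n",
"def check_forecast_coverage(df: pd.DataFrame, forecast_length: int) -> None:\n",
"    # One hourly row per forecast day and hour\n",
"    expected_rows = forecast_length * 24\n",
"    assert len(df) == expected_rows, f'expected {expected_rows} rows, got {len(df)}'\n",
"    # The rows should fall on exactly `forecast_length` calendar dates\n",
"    assert df['date'].nunique() == forecast_length, 'unexpected number of forecast dates'\n",
"\n",
"\n",
"# Usage on a synthetic frame shaped like the output above\n",
"hours = pd.date_range('2024-05-06 00:00', periods=120, freq=pd.offsets.Hour())\n",
"sample = pd.DataFrame({'datetime': hours, 'date': hours.date})\n",
"check_forecast_coverage(sample, forecast_length=5)\n",
"\n",
"# In the notebook: check_forecast_coverage(weather_forecast_df, forecast_length=5)\n",
"```"
]
},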
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 📡 Connecting to Hopsworks Feature Store\n",
"\n",
"We connect to the Hopsworks Feature Store to retrieve the existing Feature Groups and insert the new data into them."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connected. Call `.close()` to terminate connection gracefully.\n",
"\n",
"Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/550040\n",
"Connected. Call `.close()` to terminate connection gracefully.\n"
]
}
],
"source": [
"# Importing the hopsworks module for interacting with the Hopsworks platform\n",
"import hopsworks\n",
"\n",
"# Logging into the Hopsworks project\n",
"project = hopsworks.login()\n",
"\n",
"# Getting the feature store from the project\n",
"fs = project.get_feature_store()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve the feature groups\n",
"electricity_fg = fs.get_feature_group(\n",
" name=\"electricity_prices\",\n",
" version=1,\n",
")\n",
"\n",
"weather_fg = fs.get_feature_group(\n",
" name=\"weather_measurements\",\n",
" version=1,\n",
")"
]
},
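{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before inserting, it can help to confirm that the DataFrame columns match the feature group's features, since a schema mismatch is likely to make `insert` fail. A minimal sketch, assuming the hsfs feature group object exposes its schema through a `features` list whose items have a `name` attribute; `compare_columns` is a hypothetical helper:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"\n",
"def compare_columns(df: pd.DataFrame, expected: list) -> tuple:\n",
"    # Return (missing, extra): features absent from df, and df columns not in the feature group\n",
"    columns = set(df.columns)\n",
"    missing = sorted(set(expected) - columns)\n",
"    extra = sorted(columns - set(expected))\n",
"    return missing, extra\n",
"\n",
"\n",
"# Usage with hand-written names; in the notebook, `expected` could instead be\n",
"# expected = [feature.name for feature in electricity_fg.features]\n",
"df = pd.DataFrame(columns=['timestamp', 'datetime', 'dk1_spotpricedkk_kwh'])\n",
"print(compare_columns(df, ['timestamp', 'datetime', 'date', 'hour', 'dk1_spotpricedkk_kwh']))\n",
"# (['date', 'hour'], [])\n",
"```"
]
},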
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ⬆️ Uploading new data to the Feature Store\n",
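"We upload the new data to the retrieved Feature Groups with the `insert` method. With `write_options={\"wait_for_job\": False}`, the call returns once the DataFrame has been uploaded, while the offline materialization job it launches keeps running in the background (its progress can be followed via the job link printed in the output)."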
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Uploading Dataframe: 100.00% |██████████| Rows 24/24 | Elapsed Time: 00:06 | Remaining Time: 00:00\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Launching job: electricity_prices_1_offline_fg_materialization\n",
"Job started successfully, you can follow the progress at \n",
"https://c.app.hopsworks.ai/p/550040/jobs/named/electricity_prices_1_offline_fg_materialization/executions\n"
]
},
{
"data": {
"text/plain": [
"(, None)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Insert electricity_df into the electricity_prices feature group without waiting for the offline materialization job\n",
"electricity_fg.insert(\n",
"    electricity_df,\n",
"    write_options={\"wait_for_job\": False},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Uploading Dataframe: 100.00% |██████████| Rows 120/120 | Elapsed Time: 00:06 | Remaining Time: 00:00\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Launching job: weather_measurements_1_offline_fg_materialization\n",
"Job started successfully, you can follow the progress at \n",
"https://c.app.hopsworks.ai/p/550040/jobs/named/weather_measurements_1_offline_fg_materialization/executions\n"
]
},
{
"data": {
"text/plain": [
"(, None)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Insert weather_forecast_df into the weather_measurements feature group without waiting for the offline materialization job\n",
"weather_fg.insert(\n",
"    weather_forecast_df,\n",
"    write_options={\"wait_for_job\": False},\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## ⏭️ **Next:** Part 03: Training\n",
"\n",
"Next, we will create a feature view and a training dataset, then train a model and save it in the model registry."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "bds-mlops",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}