{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## PhysicsNeMo External Aerodynamics DLI\n", "\n", "## Notebook 1 - Preprocessing the Ahmed body *surface* dataset\n", "\n", "### Introduction\n", "\n", "For educational purposes, it's important to use lightweight datasets that are easy to store and manage, especially for users who may not have access to high-performance computing resources. One such dataset is the **Ahmed body surface data**, which includes 3D surface geometry, pressure, and wall shear stress data for variations in the Ahmed body geometry and inlet Reynolds number. This dataset is a great choice because it is relatively small, yet provides valuable information about aerodynamic simulations. It’s ideal for teaching and experimentation, as it won’t demand excessive storage or computational power. *Note that this dataset was created by the NVIDIA PhysicsNeMo development team and differs from other similar datasets hosted on cloud platforms like AWS.*\n", "\n", "In this notebook, we will walk through the preprocessing steps required to prepare the **Ahmed body surface dataset** for training the **DoMINO model** to predict surface quantities like pressure and wall shear stress. The DoMINO model requires 3D surface geometry in **STL format**. The **STL (Stereolithography)** format is a widely used file format for representing 3D surface geometry in computer-aided design (CAD) applications. It describes the surface of a 3D object using a collection of triangular facets, making it a common format for 3D printing and computational geometry. So, as the first step, we’ll extract the 3D surface geometry from the **VTP files**. These files use the **VTK (Visualization Toolkit)** format, which stores surface data as **PolyData**: a structure that represents points, lines, and polygons on the surface.\n", "\n", "To make the dataset more suitable for machine learning, we will convert the **VTP (VTK PolyData)** format into **NPY (NumPy)** format. This conversion makes the data easier to work with in machine learning workflows, as NumPy arrays are optimized for numerical operations, making computations faster and more efficient.\n",
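"\n", "For a concrete picture of the target format, here is a minimal, self-contained sketch of writing a dictionary of NumPy arrays to a single `.npy` file and reading it back (the arrays and file name are made up; the real keys are produced in Step 4):\n", "\n", "```python\n", "import numpy as np\n", "\n", "# Hypothetical sample mirroring the structure produced in Step 4\n", "sample = {\n", "    \"surface_mesh_centers\": np.random.rand(100, 3).astype(np.float32),\n", "    \"surface_fields\": np.random.rand(100, 4).astype(np.float32),  # p + 3 wall-shear components\n", "}\n", "np.save(\"/tmp/example_case.npy\", sample)  # np.save pickles non-array objects such as dicts\n", "\n", "# Because the dict is pickled, loading it back requires allow_pickle=True\n", "loaded = np.load(\"/tmp/example_case.npy\", allow_pickle=True).item()\n", "print(loaded[\"surface_fields\"].shape)\n", "```\n",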
"\n", "After converting the data into NPY format, it can be stored on disk, where it will be readily accessible for training the model and further analysis.\n", "\n", "Key aspects of this notebook:\n", "- Understanding the Ahmed body geometry and its aerodynamic characteristics\n", "- Processing CFD mesh data for deep learning applications\n", "\n", "## Table of Contents\n", "- [Step 1: Define Experiment Parameters and Dependencies](#step-1-define-experiment-parameters-and-dependencies)\n", "  - [Loading Required Libraries](#loading-required-libraries)\n", "  - [Experiment Parameters and Variables](#experiment-parameters-and-variables)\n", "- [Step 2: Convert VTK to STL Files](#step-2-convert-vtk-to-stl-files)\n", "  - [Understanding the Conversion Process](#understanding-the-conversion-process)\n", "  - [Key Components and Libraries](#key-components-and-libraries)\n", "  - [Important Considerations](#important-considerations)\n", "  - [Implementation Overview](#implementation-overview)\n", "- [Step 3: Visualizing STL Meshes](#step-3-visualizing-stl-meshes)\n", "- [Step 4: Convert CFD Results to NPY Format](#step-4-convert-cfd-results-to-npy-format)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Step 1: Define Experiment Parameters and Dependencies**\n", "\n", "The first step in preparing the Ahmed body surface dataset for DoMINO training is to set up our experiment environment and define the necessary parameters. This includes specifying paths to our data, configuring preprocessing settings, and ensuring all required libraries are available.\n", "\n", "Key components we need to set up:\n", "- Data paths for the training, validation, and test sets\n", "- Physical constants and variable lists (air density, surface and volume variables)\n", "- Visualization settings for results\n", "- Required Python libraries for mesh processing and deep learning\n", "\n", "### Loading Required Libraries\n", "\n", "Before we proceed with the experiment setup, let's first import all the necessary libraries. These libraries will be used for:\n", "- Mesh processing and visualization (`vtk`, `pyvista`)\n", "- Data handling and file operations (`pathlib`, `concurrent.futures`)\n", "- Progress tracking (`tqdm`)\n", "- Dataset handling (`torch.utils.data.Dataset`): a PyTorch data primitive that stores samples and their corresponding labels, and lets us wrap our own data just like a pre-loaded dataset\n", "- Utilities for preprocessing, training, and testing DoMINO (`physicsnemo.utils.domino.utils`)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import random\n", "from concurrent.futures import ProcessPoolExecutor\n", "from pathlib import Path\n", "from typing import Union\n", "\n", "import numpy as np\n", "import pyvista as pv\n", "import vtk\n", "from tqdm import tqdm\n", "\n", "from physicsnemo.utils.domino.utils import *\n", "from torch.utils.data import Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Experiment Parameters and Variables\n", "\n", "In this section, we set up all the essential parameters and variables required for the Ahmed body experiment.\n", "**Before proceeding, navigate to the data directory and extract the `ahmed_body_dataset.zip` archive.**\n",
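"\n", "If you prefer to do this from Python, here is a minimal sketch (the archive location below is an assumption; adjust it to wherever your download landed):\n", "\n", "```python\n", "from pathlib import Path\n", "from zipfile import ZipFile\n", "\n", "# Assumed archive location; adjust to your setup\n", "archive = Path(\"/data/ahmed_body_dataset.zip\")\n", "with ZipFile(archive) as zf:\n", "    zf.extractall(archive.parent)  # unpacks the dataset next to the archive\n", "```\n",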
"**The archive contains several sample `.vtp` files needed to run the scripts in this and the following notebooks.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Directory and path configuration\n", "DATA_DIR = Path(\"/data/physicsnemo_ahmed_body_dataset_vv1/dataset\")  # Root directory for the dataset\n", "\n", "# Physical variables\n", "VOLUME_VARS = [\"p\"]  # Volume variables (not used by this surface-only dataset)\n", "SURFACE_VARS = [\"p\", \"wallShearStress\"]  # Surface variables to predict\n", "AIR_DENSITY = 1.205  # Air density in kg/m³" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Step 2: Convert VTK to STL Files**\n", "\n", "The second step in our workflow involves converting the CFD simulation data from VTK format to STL format.\n", "\n", "#### Understanding the Conversion Process\n", "\n", "The conversion from VTK to STL involves several key steps:\n", "1. Reading the VTK PolyData file using specialized readers\n", "2. Extracting the surface geometry and mesh data\n", "3. Converting the data while preserving topology and surface properties\n", "4. Saving the result in binary STL format\n", "\n", "#### Key Components and Libraries\n", "\n", "We'll use the following libraries for this conversion:\n", "\n", "1. **VTK (Visualization Toolkit)**\n", "   - `vtkXMLPolyDataReader`: Reads VTK PolyData files (.vtp)\n", "   - `vtkSTLWriter`: Writes data in STL format\n", "   - `vtkPolyData`: Manages surface mesh data structures\n", "\n", "2. **File System Operations**\n", "   - `os.path`: Handles file paths and directory operations\n", "   - `pathlib.Path`: Provides modern path handling capabilities\n", "\n", "#### Important Considerations\n", "\n", "During the conversion process, we need to ensure:\n", "- Surface normal vectors are preserved correctly\n", "- Mesh quality and topology are maintained\n", "- The output is compatible with the DoMINO model's requirements\n", "- Memory is managed efficiently for large datasets\n", "\n", "#### Implementation Overview\n", "\n", "The conversion is implemented through two main functions:\n", "\n", "1. **Environment Setup**\n", "```python\n", "def setup_environment(data_dir: str):\n", "    \"\"\"Sets up the working directory and returns relevant paths.\"\"\"\n", "    # Returns paths for dataset, info files, STL files, and surface data\n", "```\n", "\n", "2. 
**VTK to STL Conversion**\n", "```python\n", "def convert_vtk_to_stl(vtk_filename: str, stl_filename: str):\n", "    \"\"\"Converts a single .vtp file to .stl format.\"\"\"\n", "    # Uses vtkXMLPolyDataReader and vtkSTLWriter for the conversion\n", "```\n", "\n", "Let's proceed with implementing these functions and performing the conversion:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def setup_environment(data_dir: str):\n", "    \"\"\"Sets up the working directory and returns relevant paths.\"\"\"\n", "    print(\"=== Environment Setup ===\")\n", "    print(f\"Current data directory: {data_dir}\")\n", "\n", "    dataset_paths = {split: os.path.join(data_dir, split) for split in [\"train\", \"validation\", \"test\"]}\n", "    info_paths = {k: os.path.join(data_dir, f\"{k}_info\") for k in dataset_paths}\n", "    stl_paths = {k: os.path.join(data_dir, f\"{k}_stl_files\") for k in dataset_paths}\n", "    surface_paths = {k: os.path.join(data_dir, f\"{k}_prepared_surface_data\") for k in dataset_paths}\n", "\n", "    return dataset_paths, info_paths, stl_paths, surface_paths\n", "\n", "\n", "def convert_vtk_to_stl(vtk_filename: str, stl_filename: str):\n", "    \"\"\"Converts a single .vtp file to .stl format.\"\"\"\n", "    reader = vtk.vtkXMLPolyDataReader()\n", "    reader.SetFileName(vtk_filename)\n", "    reader.Update()\n", "\n", "    # An empty output means the file could not be read or contains no geometry\n", "    if reader.GetOutput() is None or reader.GetOutput().GetNumberOfPoints() == 0:\n", "        print(f\"[ERROR] Failed to read {vtk_filename}\")\n", "        return\n", "\n", "    writer = vtk.vtkSTLWriter()\n", "    writer.SetFileName(stl_filename)\n", "    writer.SetFileTypeToBinary()  # Save binary STL, as described above\n", "    writer.SetInputConnection(reader.GetOutputPort())\n", "    writer.Write()\n", "\n", "    del reader, writer  # Free memory\n", "\n", "\n", "def process_file(vtp_file: str, output_path: str):\n", "    \"\"\"Processes a single .vtp file and saves it as .stl.\"\"\"\n", "    output_file = os.path.join(output_path, os.path.basename(vtp_file).replace(\".vtp\", \".stl\"))\n", "    convert_vtk_to_stl(vtp_file, output_file)\n", "\n", "\n", "def convert_vtp_to_stl_batch(dataset_paths: dict, stl_paths: dict):\n", "    \"\"\"Converts all .vtp files in dataset_paths and saves them in stl_paths.\"\"\"\n", "    print(\"\\n=== Starting Conversion Process ===\")\n", "\n", "    for path in stl_paths.values():\n", "        os.makedirs(path, exist_ok=True)\n", "\n", "    for key, dataset_path in dataset_paths.items():\n", "        stl_path = stl_paths[key]\n", "        vtp_files = [os.path.join(dataset_path, f) for f in os.listdir(dataset_path) if f.endswith('.vtp')]\n", "\n", "        if not vtp_files:\n", "            print(f\"[WARNING] No .vtp files found in {dataset_path}\")\n", "            continue\n", "\n", "        print(f\"\\nProcessing {len(vtp_files)} files from {dataset_path} → {stl_path}...\")\n", "\n", "        with ProcessPoolExecutor() as executor:\n", "            list(tqdm(executor.map(process_file, vtp_files, [stl_path] * len(vtp_files)),\n", "                      total=len(vtp_files), desc=f\"Converting {key}\", dynamic_ncols=True))\n", "\n", "    print(\"=== All Conversions Completed Successfully ===\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's convert the files:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset_paths, info_paths, stl_paths, surface_paths = setup_environment(DATA_DIR)\n", "convert_vtp_to_stl_batch(dataset_paths, stl_paths)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Step 3: Visualizing STL Meshes**\n", "\n", "The third step in our workflow focuses on visualizing the converted STL meshes to verify the success of our conversion process.\n",
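"\n", "Before building the full comparison plots, here is a minimal sanity check that one converted file loads and is non-empty (the path assumes the default `DATA_DIR`; `case102` is one of the training cases used later in this notebook):\n", "\n", "```python\n", "import pyvista as pv\n", "\n", "check = pv.read(\"/data/physicsnemo_ahmed_body_dataset_vv1/dataset/train_stl_files/case102.stl\")\n", "assert check.n_points > 0 and check.n_cells > 0, \"Conversion produced an empty mesh\"\n", "print(check)  # summary: bounds, number of points and cells\n", "```\n",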
"\n", "This step is crucial because a faulty conversion (an empty mesh or corrupted geometry) would silently degrade the training data.\n", "\n", "#### Understanding the Visualization Process\n", "\n", "The visualization of STL meshes involves several key aspects:\n", "1. Loading the STL files using appropriate visualization libraries\n", "2. Setting up the visualization environment with proper parameters\n", "3. Rendering the mesh with appropriate colors and properties\n", "4. Rendering multiple standard views for inspection\n", "\n", "#### Key Components and Libraries\n", "\n", "We'll use **PyVista** for visualization:\n", "- `pv.read()`: Loads STL files\n", "- `pv.Plotter()`: Creates plotting windows (used off-screen here)\n", "\n", "#### Important Visualization Parameters\n", "\n", "During the visualization process, we need to consider:\n", "- Mesh surface properties (color, opacity)\n", "- Camera position and orientation\n", "- Lighting conditions\n", "- The set of views (top, front, side, isometric) used for inspection\n", "- Quality of the rendered output\n", "\n", "#### Implementation Overview\n", "\n", "The visualization is implemented through two key functions:\n", "\n", "1. **Mesh Loading**\n", "```python\n", "def load_stl(stl_path: str):\n", "    \"\"\"Loads an STL file and returns a PyVista mesh object.\"\"\"\n", "    # Uses PyVista to read and process the STL file\n", "```\n", "\n", "2. **Multi-View Visualization**\n", "```python\n", "def plot_stl_comparison(mesh1, mesh2, title1=\"Case 3\", title2=\"Case 9\"):\n", "    \"\"\"Create a multi-view comparison visualization of two STL meshes.\"\"\"\n", "    # Sets up the plotter and displays the meshes\n", "```\n", "\n", "Let's proceed with implementing these functions and visualizing our converted meshes:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def load_stl(stl_path: str):\n", "    \"\"\"Load an STL file and return its PyVista mesh.\"\"\"\n", "    stl_file = Path(stl_path)\n", "    if not stl_file.exists():\n", "        print(f\"[ERROR] STL file not found: {stl_path}\")\n", "        return None\n", "\n", "    try:\n", "        return pv.read(str(stl_file))\n", "    except Exception as e:\n", "        print(f\"[ERROR] Failed to load STL file: {e}\")\n", "        return None\n", "\n", "\n", "def plot_stl_comparison(mesh1, mesh2, title1=\"Case 3\", title2=\"Case 9\"):\n", "    \"\"\"Create a multi-view comparison visualization of two STL meshes.\"\"\"\n", "    # Start a virtual frame buffer for headless rendering\n", "    pv.start_xvfb()\n", "\n", "    # Create plotter with off-screen rendering\n", "    pl = pv.Plotter(shape=(2, 4), window_size=[1600, 800], off_screen=True)\n", "\n", "    # Display parameters for each case with brighter colors\n", "    params_case1 = {\n", "        \"show_edges\": True,\n", "        \"opacity\": 1.0,\n", "        \"edge_color\": 'black',\n", "        \"line_width\": 0.5,\n", "        \"color\": [0.6, 0.8, 1.0],\n", "        \"smooth_shading\": True,\n", "        \"specular\": 0.4,\n", "        \"specular_power\": 10,\n", "        \"diffuse\": 0.9,\n", "        \"ambient\": 
1.0,\n", "    }\n", "\n", "    params_case2 = {\n", "        \"show_edges\": True,\n", "        \"opacity\": 1.0,\n", "        \"edge_color\": 'black',\n", "        \"line_width\": 0.5,\n", "        \"color\": [0.7, 1.0, 0.7],\n", "        \"smooth_shading\": True,\n", "        \"specular\": 0.4,\n", "        \"specular_power\": 10,\n", "        \"diffuse\": 0.9,\n", "        \"ambient\": 1.0,\n", "    }\n", "\n", "    def setup_view(pl, mesh, title, view_type, params):\n", "        \"\"\"Helper function to set up consistent views.\"\"\"\n", "        pl.add_mesh(mesh, **params)\n", "        pl.add_text(title, position=\"upper_edge\", font_size=10, color='black')\n", "        pl.add_axes(line_width=2)\n", "\n", "        if view_type == \"xy\":\n", "            pl.view_xy()\n", "        elif view_type == \"xz\":\n", "            pl.view_xz()\n", "        elif view_type == \"yz\":\n", "            pl.view_yz()\n", "        else:  # isometric\n", "            pl.view_isometric()\n", "\n", "        pl.reset_camera()\n", "        pl.camera.zoom(0.85)\n", "\n", "    # Plot views for both meshes\n", "    views = [(\"xy\", \"Top\"), (\"xz\", \"Front\"), (\"yz\", \"Side\"), (\"isometric\", \"Isometric\")]\n", "\n", "    # First mesh (top row)\n", "    for col, (view_type, view_name) in enumerate(views):\n", "        pl.subplot(0, col)\n", "        setup_view(pl, mesh1, f\"{title1} - {view_name}\", view_type, params_case1)\n", "\n", "    # Second mesh (bottom row)\n", "    for col, (view_type, view_name) in enumerate(views):\n", "        pl.subplot(1, col)\n", "        setup_view(pl, mesh2, f\"{title2} - {view_name}\", view_type, params_case2)\n", "\n", "    pl.background_color = '#f0f0f0'\n", "\n", "    # Display in notebook\n", "    return pl.show(jupyter_backend='static')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's plot two geometries:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define STL file paths\n", "STL_FILE_1 = DATA_DIR / \"train_stl_files/case102.stl\"\n", "STL_FILE_2 = DATA_DIR / \"train_stl_files/case116.stl\"\n", "\n", "# Load STL files\n", "mesh1 = load_stl(STL_FILE_1)\n", "mesh2 = load_stl(STL_FILE_2)\n", "\n", "# Print comparison\n", "print(\"\\n=== STL File Comparison ===\\n\")\n", "print(f\"File: {STL_FILE_1.name} vs {STL_FILE_2.name}\")\n", "print(f\"Number of Faces: {mesh1.n_cells} vs {mesh2.n_cells}\")\n", "print(f\"Surface Area: {mesh1.area:.2f} vs {mesh2.area:.2f} m²\")\n", "print(f\"Volume: {mesh1.volume:.3f} vs {mesh2.volume:.3f} m³\")\n", "\n", "# Create the multi-view comparison visualization\n", "plot_stl_comparison(mesh1, mesh2, title1=\"Case 102\", title2=\"Case 116\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Step 4: Convert CFD Results to NPY Format**\n", "\n", "The fourth step in our workflow focuses on converting CFD simulation results into NumPy (.npy) format for efficient training. As previously mentioned, the advantages of the NPY format include:\n", "- Faster data loading during training\n", "- Efficient memory management for large datasets\n", "- Parallel processing of simulation data\n", "- Optimized data access patterns for deep learning frameworks\n", "\n", "#### Understanding the Conversion Process\n", "\n", "The conversion process involves several key aspects:\n", "1. Reading CFD simulation data from VTK files\n", "2. Extracting surface and volume variables\n", "3. Processing mesh geometry and physical quantities\n", "4. Normalizing data using appropriate scaling factors (see the sketch after this list)\n", "5. Saving the processed data in NPY format\n",
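"\n", "To make the normalization in item 4 concrete, here is a minimal sketch (with made-up numbers) of the nondimensionalization applied below, which divides the surface fields by the dynamic-pressure scale ρU²:\n", "\n", "```python\n", "import numpy as np\n", "\n", "AIR_DENSITY = 1.205  # kg/m³, as defined in Step 1\n", "velocity = 40.0      # m/s, hypothetical inlet velocity read from an info file\n", "\n", "# Raw surface fields: pressure (Pa) and wall shear stress components (Pa)\n", "p = np.array([120.5, -80.2])\n", "tau = np.array([[0.8, 0.1, -0.2], [0.5, 0.0, 0.3]])\n", "\n", "# Divide by rho * U**2 so the fields are O(1) across Reynolds numbers\n", "scale = AIR_DENSITY * velocity**2\n", "p_norm, tau_norm = p / scale, tau / scale\n", "print(p_norm, tau_norm, sep=\"\\n\")\n", "```\n",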
"\n", "#### Key Components and Libraries\n", "\n", "We'll use the following libraries for the conversion:\n", "\n", "1. **VTK and PyVista**\n", "   - For reading CFD simulation data\n", "   - For processing mesh geometry and surface properties\n", "   - For computing cell sizes and normals\n", "\n", "2. **NumPy**\n", "   - For efficient array operations\n", "   - For saving data in NPY format\n", "\n", "3. **Concurrent Processing**\n", "   - For parallel processing of multiple files\n", "   - For improved conversion speed\n", "\n", "#### Important Data Processing Parameters\n", "\n", "During the conversion process, we need to consider:\n", "- Surface variables (pressure, wall shear stress)\n", "- Volume variables (mean velocity `UMean`, mean pressure `pMean`), which this surface-only dataset does not use\n", "- Data normalization using inlet velocity and density. **The inlet velocity will be read from the info files:**\n", "  ```python\n", "  with open(info_path, \"r\") as file:\n", "      velocity = next(float(line.split(\":\")[1].strip()) for line in file if \"Velocity\" in line)\n", "  ```\n", "- Mesh geometry preservation\n", "- Memory efficiency for large datasets\n", "\n", "#### Implementation Overview\n", "\n", "The conversion is implemented through several key components:\n", "\n", "1. **Dataset Class**\n", "```python\n", "class OpenFoamAhmedBodySurfaceDataset(Dataset):\n", "    \"\"\"Datapipe for converting the OpenFOAM dataset to npy.\"\"\"\n", "    # Handles data loading and processing\n", "```\n", "\n", "2. **File Processing Functions**\n", "```python\n", "def process_file(fname: str, fm_data, output_path: str):\n", "    \"\"\"Processes a single surface data file.\"\"\"\n", "    # Converts individual files to NPY format\n", "```\n", "\n", "3. **Batch Processing**\n", "```python\n", "def process_surface_data_batch(dataset_paths, info_paths, stl_paths, surface_paths):\n", "    \"\"\"Processes all surface data files in parallel.\"\"\"\n", "    # Handles parallel processing of multiple files\n", "```\n", "\n", "Let's proceed with implementing these components and converting our CFD results:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "class OpenFoamAhmedBodySurfaceDataset(Dataset):\n", "    \"\"\"Datapipe for converting the OpenFOAM Ahmed body dataset to npy.\"\"\"\n", "\n", "    def __init__(self, data_path: Union[str, Path], info_path: Union[str, Path], stl_path: Union[str, Path], surface_variables=None, volume_variables=None, device: int = 0):\n", "        self.data_path = Path(data_path).expanduser()\n", "        self.stl_path = Path(stl_path).expanduser()\n", "        self.info_path = Path(info_path).expanduser()\n", "        assert self.data_path.exists(), f\"Path {self.data_path} does not exist\"\n", "\n", "        self.filenames = get_filenames(self.data_path)\n", "        random.shuffle(self.filenames)\n", "        self.surface_variables = surface_variables or [\"p\", \"wallShearStress\"]\n", "        self.volume_variables = volume_variables or [\"UMean\", \"pMean\"]\n", "        self.device = device\n", "\n", "    def __len__(self):\n", "        return len(self.filenames)\n", "\n", "    def __getitem__(self, idx):\n", "        cfd_filename = self.filenames[idx]\n", "        car_dir = self.data_path / cfd_filename\n", "\n", "        stl_path = self.stl_path / f\"{car_dir.stem}.stl\"\n", "        info_path = self.info_path / f\"{car_dir.stem}_info.txt\"\n", "\n", "        with open(info_path, \"r\") as file:\n", "            velocity = next(float(line.split(\":\")[1].strip()) for line in file if \"Velocity\" in line)\n", "\n", "        mesh_stl = pv.get_reader(stl_path).read()\n", "        stl_faces = mesh_stl.faces.reshape(-1, 4)[:, 
1:]  # Drop the leading vertex-count column of each triangular face\n", "        stl_sizes = np.array(mesh_stl.compute_cell_sizes(length=False, area=True, volume=False).cell_data[\"Area\"])\n", "\n", "        reader = vtk.vtkXMLPolyDataReader()\n", "        reader.SetFileName(str(car_dir))\n", "        reader.Update()\n", "        polydata = reader.GetOutput()\n", "\n", "        celldata = get_node_to_elem(polydata).GetCellData()\n", "        # Nondimensionalize the surface fields by the dynamic-pressure scale rho * U**2\n", "        surface_fields = np.concatenate(get_fields(celldata, self.surface_variables), axis=-1) / (AIR_DENSITY * velocity**2)\n", "\n", "        mesh = pv.PolyData(polydata)\n", "        surface_sizes = np.array(mesh.compute_cell_sizes(length=False, area=True, volume=False).cell_data[\"Area\"])\n", "        surface_normals = mesh.cell_normals / np.linalg.norm(mesh.cell_normals, axis=1)[:, np.newaxis]\n", "\n", "        return {\n", "            \"stl_coordinates\": mesh_stl.points.astype(np.float32),\n", "            \"stl_centers\": mesh_stl.cell_centers().points.astype(np.float32),\n", "            \"stl_faces\": stl_faces.flatten().astype(np.float32),\n", "            \"stl_areas\": stl_sizes.astype(np.float32),\n", "            \"surface_mesh_centers\": mesh.cell_centers().points.astype(np.float32),\n", "            \"surface_normals\": surface_normals.astype(np.float32),\n", "            \"surface_areas\": surface_sizes.astype(np.float32),\n", "            \"volume_fields\": None,\n", "            \"volume_mesh_centers\": None,\n", "            \"surface_fields\": surface_fields.astype(np.float32),\n", "            \"filename\": cfd_filename,\n", "            \"stream_velocity\": velocity,\n", "            \"air_density\": AIR_DENSITY,\n", "        }\n", "\n", "\n", "def process_file(fname: str, fm_data, output_path: str):\n", "    \"\"\"Processes a single surface data file.\"\"\"\n", "    full_path, output_file = os.path.join(fm_data.data_path, fname), os.path.join(output_path, f\"{fname}.npy\")\n", "    if os.path.exists(output_file) or not os.path.exists(full_path) or os.path.getsize(full_path) == 0:\n", "        return\n", "    np.save(output_file, fm_data[fm_data.filenames.index(fname)])\n", "\n", "\n", "def process_surface_data_batch(dataset_paths: dict, info_paths: dict, stl_paths: dict, surface_paths: dict):\n", "    \"\"\"Processes all surface data files in dataset_paths and saves them in surface_paths.\"\"\"\n", "\n", "    for path in surface_paths.values():\n", "        os.makedirs(path, exist_ok=True)\n", "\n", "    print(\"=== Starting Processing ===\")\n", "    for key, dataset_path in dataset_paths.items():\n", "        surface_path = surface_paths[key]\n", "        # Pass the variable lists by keyword: surface_variables comes before volume_variables in the signature\n", "        fm_data = OpenFoamAhmedBodySurfaceDataset(dataset_path, info_paths[key], stl_paths[key], surface_variables=SURFACE_VARS, volume_variables=VOLUME_VARS)\n", "        file_list = [fname for fname in fm_data.filenames if fname.endswith(\".vtp\")]\n", "\n", "        print(f\"\\nProcessing {len(file_list)} files from {dataset_path} → {surface_path}...\")\n", "\n", "        with ProcessPoolExecutor() as executor:\n", "            list(tqdm(executor.map(process_file, file_list, [fm_data]*len(file_list), [surface_path]*len(file_list)),\n", "                      total=len(file_list), desc=f\"Processing {key}\", dynamic_ncols=True))\n", "\n", "    print(\"=== All Processing Completed Successfully ===\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's convert the files:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "process_surface_data_batch(dataset_paths, info_paths, stl_paths, surface_paths)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The files are converted to NPY format and saved in the `surface_paths` directories. 
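One of the saved samples can be loaded back as a quick sanity check (this assumes at least one training case was converted):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Inspect the first .npy sample produced for the training split\n", "sample_files = sorted(Path(surface_paths[\"train\"]).glob(\"*.npy\"))\n", "if sample_files:\n", "    sample = np.load(sample_files[0], allow_pickle=True).item()  # dict samples are pickled by np.save\n", "    print(sample[\"filename\"], sample[\"surface_fields\"].shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "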
We can now use this data to train a model, which is covered in the next notebook: [domino-training-test.ipynb](domino-training-test.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Shut down the kernel to free memory before moving on to the next notebook\n", "os._exit(0)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 4 }