hetfit

Table of Contents

main

PINN

PINN.pinns

PINNd_p Objects

class PINNd_p(nn.Module)

$d \mapsto P$

forward

def forward(x)

$P,U$ input, $d$ output

Arguments:

  • x tensor - input tensor containing $P, U$ values

Returns:

  • tensor - predicted $d$
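
A minimal usage sketch. The import path is assumed from the table of contents above, and the constructor arguments and input layout are assumptions, not confirmed by the source:

```python
import torch
from PINN.pinns import PINNd_p  # import path assumed from the ToC above

model = PINNd_p()                   # constructor arguments assumed to have defaults
x = torch.tensor([[300.0, 250.0]])  # hypothetical (P, U) input sample
d = model(x)                        # predicted d
```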

PINNhd_ma Objects

class PINNhd_ma(nn.Module)

$h,d \mapsto m_a$

PINNT_ma Objects

class PINNT_ma(nn.Module)

$m_a, U \mapsto T$

utils

utils.test

utils.dataset_loader

get_dataset

def get_dataset(raw: bool = False,
                sample_size: int = 1000,
                name: str = 'dataset.pkl',
                source: str = 'dataset.csv',
                boundary_conditions: list = None) -> _pickle

Gets augmented dataset

Arguments:

  • raw bool, optional - whether to use the raw source data instead of the augmented dataset. Defaults to False.
  • sample_size int, optional - sample size. Defaults to 1000.
  • name str, optional - name of the desired dataset. Defaults to 'dataset.pkl'.
  • source str, optional - source CSV file. Defaults to 'dataset.csv'.
  • boundary_conditions list, optional - [y1, y2, x1, x2].

Returns:

  • _pickle - pickle buffer
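
A usage sketch; the import path is assumed from the section heading and the boundary values are hypothetical:

```python
from utils.dataset_loader import get_dataset  # import path assumed

# loads 'dataset.pkl' if it exists, otherwise augments 'dataset.csv'
buf = get_dataset(sample_size=1000,
                  name='dataset.pkl',
                  source='dataset.csv',
                  boundary_conditions=[0.0, 1.0, 0.0, 1.0])  # hypothetical [y1, y2, x1, x2]
```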

utils.ndgan

DCGAN Objects

class DCGAN()

__init__

def __init__(latent, data)

Takes two arguments: the latent-space dimension and the dataframe. Stores the latent-space dimension, the dataframe, and the number of inputs and outputs, then builds the generator and discriminator models.

Arguments:

  • latent: The number of dimensions in the latent space
  • data: This is the dataframe that contains the data that we want to generate

define_discriminator

def define_discriminator(inputs=8)

The discriminator is a neural network that takes in a vector of length 8 and outputs a single value between 0 and 1.

Arguments:

  • inputs: number of features in the dataset, defaults to 8 (optional)

Returns:

The model is being returned.

define_generator

def define_generator(latent_dim, outputs=8)

Takes a latent dimension and a number of outputs, and returns a model with two hidden layers and an output layer.

Arguments:

  • latent_dim: The dimension of the latent space that the generator maps from
  • outputs: the number of outputs of the generator, defaults to 8 (optional)

Returns:

The model is being returned.

build_models

def build_models()

The function returns the generator and discriminator models

Returns:

The generator and discriminator models are being returned.

generate_latent_points

def generate_latent_points(latent_dim, n)

Generate random points in latent space as input for the generator

Arguments:

  • latent_dim: the dimension of the latent space, which is the input to the generator
  • n: number of latent points (samples) to generate

Returns:

A numpy array of random numbers.
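
The canonical implementation of this helper in GAN examples looks like the sketch below; the library's version may differ in details:

```python
import numpy as np

def generate_latent_points(latent_dim, n):
    # draw n * latent_dim standard-normal values and
    # reshape them into a batch of n latent vectors
    x_input = np.random.randn(latent_dim * n)
    return x_input.reshape(n, latent_dim)
```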

generate_fake_samples

def generate_fake_samples(generator, latent_dim, n)

It generates a batch of fake samples with class labels

Arguments:

  • generator: The generator model that we will train
  • latent_dim: The dimension of the latent space, e.g. 100
  • n: The number of samples to generate

Returns:

x is the generated samples and y is their class labels (zeros, i.e. fake).
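
Again, a sketch of the canonical pattern, assuming a Keras-style generator with a .predict() method:

```python
import numpy as np

def generate_fake_samples(generator, latent_dim, n):
    # map latent points through the generator and
    # label the outputs 0 ("fake") for discriminator training
    x_input = generate_latent_points(latent_dim, n)
    X = generator.predict(x_input)
    y = np.zeros((n, 1))
    return X, y
```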

define_gan

def define_gan(generator, discriminator)

Takes a generator and a discriminator, sets the discriminator to be untrainable, and stacks the two in a sequential model. The sequential model is compiled with the Adam optimizer (a gradient-descent variant) and binary cross-entropy loss, the standard loss for binary classification problems. Returns the GAN.

Arguments:

  • generator: The generator model
  • discriminator: The discriminator model that takes in a dataset and outputs a single value representing fake/real

Returns:

The model is being returned.
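
A Keras-style sketch of the pattern described above; the library's exact code may differ:

```python
from keras.models import Sequential

def define_gan(generator, discriminator):
    # freeze the discriminator so only the generator
    # is updated when training the combined model
    discriminator.trainable = False
    model = Sequential([generator, discriminator])
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model
```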

summarize_performance

def summarize_performance(epoch, generator, discriminator, latent_dim, n=200)

This function evaluates the discriminator on real and fake data, and plots the real and fake data.

Arguments:

  • epoch: the current epoch number (used when reporting performance)
  • generator: the generator model
  • discriminator: the discriminator model
  • latent_dim: The dimension of the latent space
  • n: number of samples to generate, defaults to 200 (optional)

train_gan

def train_gan(g_model,
              d_model,
              gan_model,
              latent_dim,
              num_epochs=2500,
              num_eval=2500,
              batch_size=2)

Arguments:

  • g_model: the generator model
  • d_model: The discriminator model
  • gan_model: The GAN model, which is the generator model combined with the discriminator model
  • latent_dim: The dimension of the latent space. This is the number of random numbers that the generator model will take as input
  • num_epochs: The number of epochs to train for, defaults to 2500 (optional)
  • num_eval: number of epochs to run before evaluating the model, defaults to 2500 (optional)
  • batch_size: The number of samples to use for each gradient update, defaults to 2 (optional)

start_training

def start_training()

Calls train_gan with the stored generator, discriminator, and GAN models and the latent dimension.

predict

def predict(n)

Generates a batch of fake samples from the trained generator and the latent space.

Arguments:

  • n: the number of samples to generate

Returns:

the generated fake samples.
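
Putting the DCGAN pieces together, a hedged end-to-end sketch; the import path is assumed from the section heading, and df stands for a pandas DataFrame with 8 feature columns:

```python
from utils.ndgan import DCGAN  # import path assumed

gan = DCGAN(latent=100, data=df)  # df: your 8-column DataFrame
gan.start_training()              # trains with the defaults of train_gan
samples = gan.predict(200)        # 200 synthetic rows
```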

utils.data_augmentation

dataset Objects

class dataset()

Creates dataset from input source

__init__

def __init__(number_samples: int,
             name: str,
             source: str,
             boundary_conditions: list = None)

Arguments:

  • number_samples int - number of samples to be generated
  • name str - name of dataset
  • source str - source file
  • boundary_conditions list - [y1, y2, x1, x2]

generate

def generate()

Normalizes the input dataframe and trains a DCGAN on it. The DCGAN is a generative adversarial network (GAN) used to generate new data. The trained DCGAN generates new samples, which are concatenated with the original dataframe; the result is saved as a pickle file and returned.

Returns:

The augmented dataframe.
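
A usage sketch; the import path is assumed from the section heading:

```python
from utils.data_augmentation import dataset  # import path assumed

ds = dataset(number_samples=1000, name='dataset.pkl', source='dataset.csv')
df = ds.generate()  # augmented DataFrame, also written to a pickle
```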

nets

nets.envs

SCI Objects

class SCI()

Scaled computing interface.

Arguments:

  • hidden_dim int, optional - Max dimension of the hidden linear layer. Defaults to 200. Should be >80 in the non-1D case.
  • dropout bool, optional - LEGACY, don't use. Defaults to True.
  • epochs int, optional - Epochs can optionally be set here, but better in train. Defaults to 10.
  • dataset str, optional - Dataset to be selected from ./data. Defaults to 'test.pkl'. If the name does not exist, a new dataset is generated with the parameters below.
  • sample_size int, optional - Samples to be generated (note: BEFORE applying boundary conditions). Defaults to 1000.
  • source str, optional - Source from which data will be generated. Better not to change. Defaults to 'dataset.csv'.
  • boundary_conditions list, optional - If specified, the whole dataset is cut to the rectangle [ymin, ymax, xmin, xmax]. Defaults to None.

__init__

def __init__(hidden_dim: int = 200,
             dropout: bool = True,
             epochs: int = 10,
             dataset: str = 'test.pkl',
             sample_size: int = 1000,
             source: str = 'dataset.csv',
             boundary_conditions: list = None,
             batch_size: int = 20)

Arguments:

  • hidden_dim int, optional - Max dimension of the hidden linear layer. Defaults to 200. Should be >80 in the non-1D case.
  • dropout bool, optional - LEGACY, don't use. Defaults to True.
  • epochs int, optional - Epochs can optionally be set here, but better in train. Defaults to 10.
  • dataset str, optional - Dataset to be selected from ./data. Defaults to 'test.pkl'. If the name does not exist, a new dataset is generated with the parameters below.
  • sample_size int, optional - Samples to be generated (note: BEFORE applying boundary conditions). Defaults to 1000.
  • source str, optional - Source from which data will be generated. Better not to change. Defaults to 'dataset.csv'.
  • boundary_conditions list, optional - If specified, the whole dataset is cut to the rectangle [ymin, ymax, xmin, xmax]. Defaults to None.
  • batch_size int, optional - Batch size for training. Defaults to 20.

feature_gen

def feature_gen(base: bool = True,
                fname: str = None,
                index: int = None,
                func=None) -> None

Generates new features. If base is True, generates the most obvious ones. You can customize this by adding a new feature: pass the name of the new column (fname), the index of the parent column, and a lambda function to be applied elementwise.

Arguments:

  • base bool, optional - Defaults to True.
  • fname str, optional - Name of new column. Defaults to None.
  • index int, optional - Index of parent column. Defaults to None.
  • func type, optional - lambda function. Defaults to None.

feature_importance

def feature_importance(X: pd.DataFrame, Y: pd.Series, verbose: int = 1)

Gets feature importance via SGD regression and score-based selection. The default threshold is 1.25 * mean. Pass X as self.df.iloc[:,(columns of choice)] and Y as self.df.iloc[:,(column of choice)].

Arguments:

  • X pd.DataFrame - Builtin DataFrame
  • Y pd.Series - Builtin Series
  • verbose int, optional - whether to print the report. Defaults to 1.

Returns:

Report (str)

data_flow

def data_flow(columns_idx: tuple = (1, 3, 3, 5),
              idx: tuple = None,
              split_idx: int = 800) -> torch.utils.data.DataLoader

Data preparation pipeline. It is called automatically; don't call it in your code.

Arguments:

  • columns_idx tuple, optional - Columns to be selected (sliced 1:2, 3:4) for feature fitting. Defaults to (1,3,3,5).
  • idx tuple, optional - 2 or 3 indexes to be selected for feature fitting. Defaults to None. Use either idx or columns_idx (idx for F:R->R, columns_idx for F:R->R2).
  • split_idx int - index at which to split the data for training.

Returns:

  • torch.utils.data.DataLoader - Torch native dataloader

init_seed

def init_seed(seed)

Initializes seed for torch - optional

train_epoch

def train_epoch(X, model, loss_function, optim)

Inner function of the class - don't use. Iterates through the data, calculates the loss, backpropagates, and updates the weights.

Arguments:

  • X: the training data
  • model: the model we're training
  • loss_function: the loss function to use
  • optim: the optimizer, which is the algorithm that will update the weights of the model

compile

def compile(columns: tuple = None,
            idx: tuple = None,
            optim: torch.optim = torch.optim.AdamW,
            loss: nn = nn.L1Loss,
            model: nn.Module = dmodel,
            custom: bool = False,
            lr: float = 0.0001) -> None

Builds the model, loss, and optimizer. Has defaults.

Arguments:

  • columns tuple, optional - Columns to be selected for feature fitting. Defaults to (1,3,3,5).
  • idx tuple, optional - indexes to be selected for feature fitting. Defaults to None.
  • optim - torch optimizer. Defaults to AdamW.
  • loss - torch loss function (nn). Defaults to L1Loss.
  • model nn.Module, optional - model class. Defaults to dmodel.
  • lr float, optional - learning rate. Defaults to 0.0001.

train

def train(epochs: int = 10) -> None

Trains the model. If the model is an sklearn instance, .fit() is used.

Arguments:

  • epochs int, optional - number of training epochs. Defaults to 10.

save

def save(name: str = 'model.pt') -> None

This function saves the model to a file

Arguments:

  • name (str (optional)): The name of the file to save the model to, defaults to model.pt

onnx_export

def onnx_export(path: str = './models/model.onnx')

We are exporting the model to the ONNX format, using the input data and the model itself

Arguments:

  • path (str (optional)): The path to save the model to, defaults to ./models/model.onnx

jit_export

def jit_export(path: str = './models/model.pt')

Exports a properly defined model to TorchScript (JIT).

Arguments:

  • path str, optional - path to models. Defaults to './models/model.pt'.

inference

def inference(X: tensor, model_name: str = None) -> np.ndarray

Inference with the (pre-)trained model

Arguments:

  • X tensor - input data within the training domain
  • model_name str, optional - name of a saved model to use. Defaults to None.

Returns:

  • np.ndarray - predictions
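
A hedged end-to-end SCI workflow; the import path is assumed from the section heading and the column indexes and input values are hypothetical:

```python
import torch
from nets.envs import SCI  # import path assumed

run = SCI(hidden_dim=200, dataset='test.pkl', sample_size=1000)
run.feature_gen()            # generate the default derived features
run.compile(idx=(1, 3, 7))   # hypothetical indexes; the last one is the target
run.train(epochs=50)
preds = run.inference(torch.tensor([[0.5, 0.2]]))  # data within the training domain
```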

plot

def plot()

If the input and output dimensions are the same, plots the input against the output as a scatter plot; otherwise, plots the first dimension of the input and the output as a scatter plot.

plot3d

def plot3d(colX=0, colY=1)

Plot of inputs and predicted data in mesh format

Returns:

plotly plot

performance

def performance(c=0.4) -> dict

Automatic APE-based performance, if applicable; otherwise returns NaN.

Arguments:

  • c float, optional - constant to mitigate zero-division errors (ZDE). Defaults to 0.4.

Returns:

  • dict - {'Generator_Accuracy, %':np.mean(a),'APE_abs, %':abs_ape,'Model_APE, %': ape}
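
A plausible reading of the APE metric with the ZDE-mitigation constant $c$; the library's exact formula may differ:

$$\mathrm{APE} = 100 \cdot \frac{|y_{\mathrm{pred}} - y_{\mathrm{true}}|}{|y_{\mathrm{true}}| + c}$$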

performance_super

def performance_super(c=0.4,
                      real_data_column_index: tuple = (1, 8),
                      real_data_samples: int = 23,
                      generated_length: int = 1000) -> dict

Performance by custom parameters. APE loss

Arguments:

  • c float, optional - ZDE mitigation constant. Defaults to 0.4.
  • real_data_column_index tuple, optional - Defaults to (1,8).
  • real_data_samples int, optional - Defaults to 23.
  • generated_length int, optional - Defaults to 1000.

Returns:

  • dict - {'Generator_Accuracy, %':np.mean(a),'APE_abs, %':abs_ape,'Model_APE, %': ape}

RCI Objects

class RCI(SCI)

Real-values interface: uses different types of NNs, with NO scaling. Parent: SCI()

data_flow

def data_flow(columns_idx: tuple = (1, 3, 3, 5),
              idx: tuple = None,
              split_idx: int = 800) -> torch.utils.data.DataLoader

Data prep pipeline

Arguments:

  • columns_idx tuple, optional - Columns to be selected (sliced 1:2, 3:4) for feature fitting. Defaults to (1,3,3,5).
  • idx tuple, optional - 2 or 3 indexes to be selected for feature fitting. Defaults to None. Use either idx or columns_idx (idx for F:R->R, columns_idx for F:R->R2).
  • split_idx int - index at which to split the data for training.

Returns:

  • torch.utils.data.DataLoader - Torch native dataloader

compile

def compile(columns: tuple = None,
            idx: tuple = (3, 1),
            optim: torch.optim = torch.optim.AdamW,
            loss: nn = nn.L1Loss,
            model: nn.Module = PINNd_p,
            lr: float = 0.001) -> None

Builds model, loss, optimizer. Has defaults

Arguments:

  • columns tuple, optional - Columns to be selected for feature fitting. Defaults to None.
  • idx tuple, optional - indexes to be selected. Defaults to (3,1).
  • optim - torch optimizer. Defaults to AdamW.
  • loss - torch loss function (nn). Defaults to L1Loss.

plot

def plot()

Plots 2d plot of prediction vs real values

performance

def performance(c=0.4) -> dict

RCI performance. APE errors.

Arguments:

  • c float, optional - correction constant to mitigate division by 0 error. Defaults to 0.4.

Returns:

  • dict - {'Generator_Accuracy, %':np.mean(a),'APE_abs, %':abs_ape,'Model_APE, %': ape}

nets.dense

Net Objects

class Net(nn.Module)

The Net class inherits from the nn.Module class, which provides a number of attributes and methods (such as .parameters() and .zero_grad()) that we will be using. You can read more about the nn.Module class in the PyTorch documentation.

__init__

def __init__(input_dim: int = 2, hidden_dim: int = 200)

We create a neural network with two hidden layers, each with hidden_dim neurons and a ReLU activation function. The output layer has one neuron and no activation function.

Arguments:

  • input_dim (int (optional)): The dimension of the input, defaults to 2
  • hidden_dim (int (optional)): The number of neurons in the hidden layer, defaults to 200
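
A sketch consistent with the description above; the exact layer wiring is an assumption, and the library's implementation may differ:

```python
import torch.nn as nn

class Net(nn.Module):
    # two hidden ReLU layers, single linear output neuron
    def __init__(self, input_dim: int = 2, hidden_dim: int = 200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # no output activation
        )

    def forward(self, x):
        return self.net(x)
```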

nets.design

B_field_norm

def B_field_norm(Bmax: float, L: float, k: int = 16, plot=True) -> np.array

Returns the vector $B_z$ for the MS configuration.

Arguments:

  • Bmax any - maximum B in the thruster
  • L - channel length
  • k - magnetic field profile number
  • plot bool, optional - whether to plot the profile. Defaults to True.
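
A usage sketch; the import path is assumed from the section heading and the numbers are hypothetical:

```python
from nets.design import B_field_norm  # import path assumed

# hypothetical values: 20 mT peak field, 25 mm channel length
Bz = B_field_norm(Bmax=0.02, L=0.025, k=16, plot=False)
```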

PUdesign

def PUdesign(P: float, U: float) -> pd.DataFrame

Computes design via numerical model, uses fits from PINNs

Arguments:

  • P float - power input $P$
  • U float - voltage $U$

Returns:

  • pd.DataFrame - computed design parameters

nets.deep_dense

dmodel Objects

class dmodel(nn.Module)

__init__

def __init__(in_features=1, hidden_features=200, out_features=1)

We create a neural network with 4 layers, each with 200 neurons, where the output of each layer is fed as input to the next.

Arguments:

  • in_features: The number of input features, defaults to 1 (optional)
  • hidden_features: the number of neurons in the hidden layers, defaults to 200 (optional)
  • out_features: The number of classes for classification (1 for regression), defaults to 1 (optional)
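
A sketch matching the four-layer description; the exact layer wiring is an assumption:

```python
import torch.nn as nn

class dmodel(nn.Module):
    # four stacked linear layers; each feeds the next
    def __init__(self, in_features=1, hidden_features=200, out_features=1):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.fc2 = nn.Linear(hidden_features, hidden_features)
        self.fc3 = nn.Linear(hidden_features, hidden_features)
        self.fc4 = nn.Linear(hidden_features, out_features)
        self.act = nn.ReLU()

    def forward(self, x):
        x = self.act(self.fc1(x))
        x = self.act(self.fc2(x))
        x = self.act(self.fc3(x))
        return self.fc4(x)
```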

nets.opti

nets.opti.blackbox

Hyper Objects

class Hyper(SCI)

Hyperparameter tuning class. Generates the best NN architecture for the task. Inputs are column indexes; idx[-1] is the target value. Based on Optuna algorithms, it is very fast and reliable. Outputs are NN parameters in JSON. Optionally, a full report for every trial is available at neptune.ai.

__init__

def __init__(idx: tuple = (1, 3, 7), *args, **kwargs)

A constructor that initializes the Hyper class.

Arguments:

  • idx (tuple): tuple of integers, the indices of the data to be loaded

define_model

def define_model(trial)

We define a function that takes in a trial object and returns a neural network with the number of layers, hidden units, and activation functions defined by the trial object.

Arguments:

  • trial: This is an object that contains the information about the current trial

Returns:

A sequential model with the number of layers, hidden units and activation functions defined by the trial.

objective

def objective(trial)

We define a model, an optimizer, and a loss function, then train the model for a number of epochs, reporting the loss at the end of each epoch.

Search space: "optimizer" $\in$ {Adam, RMSprop, SGD, AdamW, Adamax, Adagrad}; "lr" $\in$ [1e-7, 1e-3], log scale.

Arguments:

  • trial: The trial object that is passed to the objective function

Returns:

The accuracy of the model.

start_study

def start_study(n_trials: int = 100,
                neptune_project: str = None,
                neptune_api: str = None)

Takes a number of trials, a Neptune project name, and a Neptune API token, and runs the objective function for the specified number of trials. If the Neptune project and API token are provided, results are logged to Neptune.

Arguments:

  • n_trials (int (optional)): The number of trials to run, defaults to 100
  • neptune_project (str): the name of the neptune project you want to log to
  • neptune_api (str): your neptune api key
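
A hedged usage sketch for the tuner; the import path is assumed from the section heading and the column indexes are hypothetical:

```python
from nets.opti.blackbox import Hyper  # import path assumed

h = Hyper(idx=(1, 3, 7))     # hypothetical: features are columns 1 and 3, target is 7
h.start_study(n_trials=100)  # pass neptune_project / neptune_api to log to neptune.ai
```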