license: gpl-3.0
title: DataFlowPro
sdk: streamlit
emoji: 🚀
colorFrom: purple
colorTo: pink
short_description: Automating ML Workflows with Ease

DataFlow Pro

Automating ML Workflows with Ease

Introduction

DataFlow Pro is a Python application that automates building, tuning, and evaluating machine learning models from a JSON configuration supplied as an RTF, JSON, or TXT file.
The application follows a fixed sequence: it reads the JSON file, extracts the dataset information, transforms the features, splits the data, builds and tunes the models, and evaluates their performance.

Installation

To use the Automated ML Pipeline, follow these steps:

  1. Clone this repository to your local machine and change into its directory:
    git clone https://github.com/Rupanshu-Kapoor/AutomateML.git
    cd AutomateML

  2. Install the required dependencies:
    pip install -r requirements.txt

  3. Run the application:
    streamlit run app.py

Steps to Use the Application:

You can use the application in either of the following two ways:

(A). Create JSON and Train Model

  1. Upload the dataset on which you want to train the models.
  2. Once the data is uploaded, preview the dataset.
  3. Select prediction parameters (prediction type, target variable, k-fold, etc.).
  4. Select the features to be used for prediction.
  5. For each selected feature, choose how to handle it (rescaling, encoding, etc.).
  6. Select the models to be used for prediction.
  7. For each selected model, choose the hyperparameters to tune.
  8. Once all the parameters are selected, click the Generate Json and Train Model button.
  9. The application generates the JSON file, trains the models, and displays the results.
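
As an illustration of steps 3 to 9, the selections could be collected into a dictionary and serialized with the standard json module. This is only a minimal sketch, not the application's actual schema: the build_config helper, every key name, and the example values (iris.csv, species, and so on) are assumptions made for illustration.

```python
import json


def build_config(dataset_path, prediction_type, target, k_fold, features, models):
    """Assemble the user's selections into one JSON-serializable dict.

    Hypothetical helper: the key names below are illustrative and are not
    taken from the application's real JSON format.
    """
    if prediction_type not in ("regression", "classification"):
        raise ValueError(f"unknown prediction type: {prediction_type!r}")
    if target in features:
        raise ValueError("the target variable cannot also be a feature")
    return {
        "dataset": dataset_path,
        "prediction_type": prediction_type,
        "target": target,
        "k_fold": k_fold,
        "feature_handling": features,  # feature name -> handling options
        "models": models,              # model name -> hyperparameter grid
    }


# Example selections (all values are made up for the sketch).
config = build_config(
    dataset_path="iris.csv",
    prediction_type="classification",
    target="species",
    k_fold=5,
    features={
        "petal_length": {"rescaling": "standard"},
        "petal_width": {"rescaling": "none"},
    },
    models={
        "RandomForestClassifier": {"n_estimators": [50, 100], "max_depth": [3, 5]},
    },
)

config_json = json.dumps(config, indent=2)
print(config_json)
```

Validating the selections before serializing keeps obviously inconsistent configurations (such as a target that is also listed as a feature) from reaching the training step.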

(B). Upload JSON and Train Model

  1. Upload the JSON file that contains all the dataset information.
  2. Click Train Models.
  3. The application trains the models and displays the results.
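
Because the uploaded file may carry text around the JSON content (for example in a TXT upload), one tolerant way to read it is to scan for the first decodable JSON object. The sketch below assumes the file has already been converted to plain text; RTF input would need that conversion first. The extract_json helper and the sample text are hypothetical and are not taken from the application's code.

```python
import json


def extract_json(text):
    """Return the first JSON object embedded in `text`, ignoring surrounding text.

    Hypothetical helper: tries to decode a JSON object at every '{' until
    one succeeds.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing characters after it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in the uploaded text")


# Made-up sample: a JSON object with extra lines before and after it.
sample = 'Exported settings\n{"target": "species", "k_fold": 5}\nEnd of file'
parsed = extract_json(sample)
print(parsed)
```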

Working of the Application:

The application performs the following tasks in sequence:

  1. Read and Parse the JSON File: The RTF/JSON file is read and converted to plain text, and the JSON content is extracted.
  2. Extract Dataset Information: Extract dataset information such as feature names, target variable, problem type (regression/classification), feature handling, etc.
  3. Transform Features: Features are transformed based on the specified feature handling methods.
  4. Sample Data and Train-Test Split: Data is sampled and split into training and testing sets.
  5. Model Building: Models are built based on the problem type (regression/classification).
  6. Hyperparameter Tuning: Hyperparameters of the models are tuned using grid search.
  7. Model Evaluation: Trained models are evaluated using specified evaluation metrics.
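
Steps 3 to 7 can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the application's implementation: the config keys, the column names, the choice of LogisticRegression, and the synthetic dataset are all made up for the example. Only the use of grid search and a train-test split comes from the description above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical parsed configuration (key names are assumptions).
config = {
    "target": "label",
    "k_fold": 3,
    "rescale": ["f0", "f1"],                      # features to standardize
    "param_grid": {"model__C": [0.1, 1.0, 10.0]}, # hyperparameters to tune
}

# Synthetic data stands in for the uploaded dataset.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
df = pd.DataFrame(X, columns=["f0", "f1", "f2", "f3"])
df["label"] = y

features = df.drop(columns=[config["target"]])
target = df[config["target"]]

# Train-test split (step 4).
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0, stratify=target
)

# Feature handling (step 3): rescale the listed columns, pass the rest through.
preprocess = ColumnTransformer(
    [("rescale", StandardScaler(), config["rescale"])],
    remainder="passthrough",
)

# Model building (step 5): preprocessing and model in one pipeline.
pipeline = Pipeline(
    [("preprocess", preprocess), ("model", LogisticRegression(max_iter=1000))]
)

# Hyperparameter tuning (step 6): grid search with k-fold cross-validation.
search = GridSearchCV(pipeline, config["param_grid"], cv=config["k_fold"])
search.fit(X_train, y_train)

# Model evaluation (step 7) on the held-out test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("test accuracy:", round(test_accuracy, 3))
```

In this sketch the feature transformation lives inside the pipeline, so it is fitted only on the training folds during grid search. That is a design choice of the example to avoid leaking test data into the scaler; the application itself may order these steps differently.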

Use Cases

This application can be used for various use cases, including but not limited to:

  • Automated machine learning (AutoML) pipelines.
  • Data preprocessing and feature engineering tasks.
  • Model training and evaluation for regression or classification problems.
  • Hyperparameter tuning and model selection.
  • Experimentation with different datasets and configurations.

Future Work

Possible future enhancements for the application include:

  • Adding support for additional data formats (e.g., CSV, Excel).
  • Implementing more advanced feature engineering techniques.
  • Incorporating more sophisticated model selection and evaluation methods.
  • Enhancing the user interface for easier interaction.
  • Integrating with external APIs or databases for data retrieval.