Atharva Thakur

metadata
title: Insights
emoji: 📈
colorFrom: gray
colorTo: yellow
sdk: streamlit
sdk_version: 1.33.0
app_file: app.py
pinned: false

Insights

Deployment

HuggingFace

Insights: Gen-AI Based Data Analysis Tool

Overview

Insights is a data analysis tool that uses the Gemini-Pro large language model (LLM) to automate and enhance the data analysis process. It aims to perform end-to-end data analysis tasks, saving substantial cost and time while matching or exceeding the performance of a junior data analyst.

Table of Contents

  1. Introduction
  2. Features
  3. System Architecture
  4. Modules Overview
  5. Usage
  6. Installation
  7. License

Introduction

In today's data-driven world, robust data analysis tools are crucial for informed decision-making and strategic planning. Traditional data analysis methods often face challenges such as time-consuming processes, potential for errors, and the need for specialized expertise. Insights addresses these issues by utilizing AI to streamline and enhance the data analysis process.

Features

  • Automated Data Analysis: Perform data collection, visualization, and analysis with minimal human intervention.
  • Advanced Summarization: Generate detailed summaries and potential questions for datasets.
  • Exploratory Data Analysis (EDA): Tools for statistical summaries, distribution plots, and correlation matrices.
  • Data Cleaning and Transformation: Functions for handling missing values, outlier detection, normalization, and feature engineering.
  • Machine Learning Toolkit: Automates model selection, training, hyperparameter tuning, and evaluation.
  • Query Answering Module: Generate Python code to answer user queries and produce visualizations.

System Architecture

The Insights tool is built on the Gemini platform and consists of three main components:

  1. Summary Module
  2. QA Module
  3. Code Execution and Analysis Generation

Summary Module

Extracts essential details about the dataset and generates a comprehensive summary along with potential questions for further exploration.

QA Module

Handles user queries related to the dataset, generating Python code to answer the queries and produce visualizations.

Code Execution and Analysis Generation

Executes the generated Python code offline to ensure data security, producing detailed responses and visualizations.
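
A minimal sketch of what local execution could look like, assuming the generated code arrives as a plain string and reports its answer with `print`. The helper `run_generated_code` and the `rows` variable are hypothetical names chosen for illustration, not taken from the Insights code base. Note that plain `exec` keeps the data on the local machine but is not a security sandbox.

```python
import contextlib
import io
import traceback


def run_generated_code(code: str, rows: list) -> dict:
    """Run a code string locally and capture what it prints.

    Hypothetical helper: the real tool may execute code differently.
    """
    namespace = {"rows": rows}  # the dataset is exposed to the code as `rows`
    stdout = io.StringIO()
    try:
        with contextlib.redirect_stdout(stdout):
            exec(code, namespace)  # NOTE: exec does not sandbox the code
        return {"ok": True, "output": stdout.getvalue(), "error": None}
    except Exception:
        # Report the traceback instead of crashing the app
        return {"ok": False, "output": stdout.getvalue(), "error": traceback.format_exc()}


rows = [{"city": "Pune", "sales": 120}, {"city": "Mumbai", "sales": 200}]
result = run_generated_code("print(sum(r['sales'] for r in rows))", rows)
print(result["output"])
```

Returning the error text rather than raising lets a caller show the failure to the user or retry with a revised prompt.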

Modules Overview

Summary Generation

  1. Information Extraction: Extracts critical information from the dataset.
  2. Prompting Gemini: Constructs a detailed prompt for Gemini to generate summaries and questions.
  3. Summary and Question Generation: Generates a summary and potential questions for user review.
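
The first two steps above could be sketched as follows. This is an illustration under assumptions: the function names, the choice of extracted details (column names, row count, a few sample rows), and the prompt wording are all hypothetical, and the actual call to Gemini is left out.

```python
import csv
import io


def extract_info(csv_text: str, sample_size: int = 3) -> dict:
    """Collect column names, the row count, and a few sample rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        "columns": list(reader.fieldnames or []),
        "row_count": len(rows),
        "sample": rows[:sample_size],
    }


def build_summary_prompt(info: dict) -> str:
    """Turn the extracted details into a prompt string for the LLM."""
    lines = [
        "You are a data analyst. Summarize the dataset described below",
        "and propose five questions worth exploring.",
        f"Columns: {', '.join(info['columns'])}",
        f"Number of rows: {info['row_count']}",
        "Sample rows:",
    ]
    lines += [str(row) for row in info["sample"]]
    return "\n".join(lines)


csv_text = "city,sales\nPune,120\nMumbai,200\n"
prompt = build_summary_prompt(extract_info(csv_text))
print(prompt)
```

Sending only metadata and a small sample, rather than the full dataset, keeps the prompt short; whether Insights does the same is an assumption.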

Data Exploration

Includes tools for EDA, data cleaning, and data transformation.
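
As a rough illustration of these three kinds of operation on a single numeric column, the sketch below uses only the Python standard library so it runs without extra dependencies. The function names are hypothetical, `None` is assumed to mark a missing value, and the tool itself may rely on other libraries and other strategies.

```python
import statistics


def describe(values: list) -> dict:
    """Basic statistical summary of one numeric column."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
    }


def fill_missing(values: list) -> list:
    """Cleaning: replace missing values with the column median."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def iqr_outliers(values: list, k: float = 1.5) -> list:
    """Outlier detection: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


def min_max_normalize(values: list) -> list:
    """Transformation: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero for a constant column
    return [(v - lo) / span for v in values]


sales = [120, 200, None, 150, 130, 900]
filled = fill_missing(sales)
print(describe(sales))
print(iqr_outliers(filled))
print(min_max_normalize(filled))
```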

ML Toolkit

Facilitates the creation and evaluation of machine learning models on the dataset.
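
To make the idea of automated model selection concrete, here is a deliberately small sketch: two candidate models are trained on one split of the data and the one with the lower held-out error is kept. Everything in it is an assumption for illustration (the candidate models, the mean squared error metric, the 75/25 split, and all names); the model families and libraries the toolkit actually uses are not described in this README, and hyperparameter tuning is not shown.

```python
import random
import statistics


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean = statistics.fmean(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Ordinary least squares with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_model(xs, ys, candidates, test_fraction=0.25, seed=0):
    """Fit every candidate on a training split and rank by held-out error."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    scores = {}
    for name, fit in candidates.items():
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores[name] = mse(model, [xs[i] for i in test], [ys[i] for i in test])
    best = min(scores, key=scores.get)
    return best, scores


xs = list(range(20))
ys = [3 * x + 2 for x in xs]  # synthetic, perfectly linear data
best, scores = select_model(xs, ys, {"mean": fit_mean, "line": fit_line})
print(best, scores)
```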

QA Module

Allows users to query the dataset and receive answers along with visualizations. The process involves:

  1. Accepting user queries.
  2. Combining queries with dataset information.
  3. Generating and executing Python code offline.
  4. Producing visualizations and textual data.
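
Steps 1 to 3, up to the point where code is executed, might look like the sketch below. The prompt wording, the function names, and `fake_model` (a stand-in for the real Gemini call) are hypothetical. The sketch also assumes the model may wrap its code in a Markdown fence, which then has to be stripped before execution.

```python
import re

FENCE_MARK = "`" * 3  # a Markdown code fence, built here to keep this example readable
FENCE = re.compile(FENCE_MARK + r"(?:python)?\s*\n(.*?)" + FENCE_MARK, re.DOTALL)


def build_qa_prompt(query: str, columns: list) -> str:
    """Combine the user's question with dataset information (step 2)."""
    return (
        "Write Python code that answers the question below.\n"
        "The data is available as a list of dicts named `rows`.\n"
        f"Columns: {', '.join(columns)}\n"
        f"Question: {query}\n"
        "Print the answer."
    )


def extract_code(response: str) -> str:
    """Return the code from a model reply, with or without a Markdown fence."""
    match = FENCE.search(response)
    return (match.group(1) if match else response).strip()


def fake_model(prompt: str) -> str:
    """Stand-in for the LLM call; always returns the same canned reply."""
    return f"Here is the code:\n{FENCE_MARK}python\nprint(len(rows))\n{FENCE_MARK}"


prompt = build_qa_prompt("How many rows are there?", ["city", "sales"])
code = extract_code(fake_model(prompt))
print(code)
```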

Analysis Generation

Processes the output from code execution to create concise and insightful responses.

Usage

  1. Start the Tool: streamlit run app.py
  2. Load Dataset: Upload your dataset when prompted.
  3. Generate Summary: The tool will automatically generate a summary and potential questions.
  4. Exploratory Data Analysis: Use the EDA tools to explore your dataset.
  5. Query the Dataset: Enter your queries to receive answers and visualizations.
  6. Analyze Results: Review the detailed analysis generated by the tool.

Installation

  1. Install the required packages: The project's dependencies are listed in the 'requirements.txt' file. You can install all of them using pip:
    pip install -r requirements.txt
    
  2. Run the application: Now, you're ready to run the application. Use the following command to start the Streamlit server:
    streamlit run app.py
    

Running with Docker

  1. Build the Docker image with
    docker build -t insights .
    
  2. Run the Docker container with
    docker run -p 8501:8501 -e GOOGLE_API_KEY=<your-api-key> insights
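
How app.py reads the key is not shown in this README. A minimal sketch, assuming it comes from the GOOGLE_API_KEY environment variable passed with `-e` above; `load_api_key` is a hypothetical helper, not part of the project.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; pass it with `docker run -e {env_var}=...`"
        )
    return key
```

Failing early with a clear message makes a missing key easier to diagnose than an authentication error raised later by the first model call.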