---
title: Insights
emoji: 📈
colorFrom: gray
colorTo: yellow
sdk: streamlit
sdk_version: 1.33.0
app_file: app.py
pinned: false
---
# Insights: Gen-AI Based Data Analysis Tool

**Deployment**: [HuggingFace Spaces](https://huggingface.co/spaces/AtharvaThakur/Insights)
## Overview
**Insights** is a data analysis tool that leverages the Gemini-Pro large language model (LLM) to automate and enhance the data analysis process. It performs end-to-end data analysis tasks, providing substantial cost and time savings while matching or exceeding the performance of a junior data analyst.
## Table of Contents
1. [Introduction](#introduction)
2. [Features](#features)
3. [System Architecture](#system-architecture)
4. [Modules Overview](#modules-overview)
5. [Usage](#usage)
6. [Installation](#installation)
7. [License](#license)
8. [Running using Docker](#running-using-docker)
## Introduction
In today's data-driven world, robust data analysis tools are crucial for informed decision-making and strategic planning. Traditional data analysis methods often face challenges such as time-consuming processes, potential for errors, and the need for specialized expertise. **Insights** addresses these issues by utilizing AI to streamline and enhance the data analysis process.
## Features
- **Automated Data Analysis**: Perform data collection, visualization, and analysis with minimal human intervention.
- **Advanced Summarization**: Generate detailed summaries and potential questions for datasets.
- **Exploratory Data Analysis (EDA)**: Tools for statistical summaries, distribution plots, and correlation matrices.
- **Data Cleaning and Transformation**: Functions for handling missing values, outlier detection, normalization, and feature engineering.
- **Machine Learning Toolkit**: Automates model selection, training, hyperparameter tuning, and evaluation.
- **Query Answering Module**: Generate Python code to answer user queries and produce visualizations.
## System Architecture
The **Insights** tool is built on the Gemini platform and consists of three main components:
1. **Summary Module**
2. **QA Module**
3. **Code Execution and Analysis Generation**
### Summary Module
Extracts essential details about the dataset and generates a comprehensive summary along with potential questions for further exploration.
### QA Module
Handles user queries related to the dataset, generating Python code to answer the queries and produce visualizations.
### Code Execution and Analysis Generation
Executes the generated Python code offline to ensure data security, producing detailed responses and visualizations.
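A minimal sketch of how the offline execution step might look. `run_generated_code` is a hypothetical helper (not the project's actual API): it runs the LLM-generated snippet locally with `exec`, so the dataset itself never leaves the machine, and captures whatever the snippet prints.

```python
import contextlib
import io

def run_generated_code(code: str, df) -> str:
    """Execute LLM-generated analysis code locally and capture its printed output.

    The data never leaves the machine; the snippet sees the dataset only
    through a variable named `df`.
    """
    namespace = {"df": df}          # the only state the snippet can see
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)       # run the generated snippet offline
    return buffer.getvalue()
```

In practice the snippet would come from the QA module; here it could be called as, e.g., `run_generated_code("print(sum(df['x']))", {"x": [1, 2, 3]})`.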
## Modules Overview
### Summary Generation
1. **Information Extraction**: Extracts critical information from the dataset.
2. **Prompting Gemini**: Constructs a detailed prompt for Gemini to generate summaries and questions.
3. **Summary and Question Generation**: Generates a summary and potential questions for user review.
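The three steps above can be sketched as a single prompt-building function. `build_summary_prompt` is an illustrative name, not the project's actual API; it extracts the schema, a few sample rows, and the row count, and packs them into a prompt that would then be sent to Gemini.

```python
import pandas as pd

def build_summary_prompt(df: pd.DataFrame) -> str:
    """Assemble a dataset description for the LLM to summarize and question."""
    schema = ", ".join(f"{col} ({dtype})" for col, dtype in df.dtypes.items())
    sample = df.head(3).to_string(index=False)   # a small, privacy-conscious sample
    return (
        "You are a data analyst. Given this dataset, write a concise summary "
        "and list five questions worth exploring.\n"
        f"Columns: {schema}\n"
        f"Sample rows:\n{sample}\n"
        f"Row count: {len(df)}"
    )
```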
### Data Exploration
Includes tools for EDA, data cleaning, and data transformation.
### ML Toolkit
Facilitates the creation and evaluation of machine learning models on the dataset.
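Automated model selection of this kind can be sketched with scikit-learn: score a few candidate models by cross-validation and keep the best one. `pick_best_model` and the candidate list are illustrative assumptions, not the toolkit's actual implementation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def pick_best_model(X, y):
    """Score candidate classifiers by mean cross-validated accuracy and
    return the best one, refit on the full data."""
    candidates = [LogisticRegression(max_iter=1000), DecisionTreeClassifier()]
    scored = [(cross_val_score(m, X, y, cv=3).mean(), m) for m in candidates]
    best_score, best_model = max(scored, key=lambda pair: pair[0])
    return best_model.fit(X, y), best_score

# Synthetic stand-in for an uploaded dataset.
X, y = make_classification(n_samples=120, n_features=5, random_state=0)
model, score = pick_best_model(X, y)
```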
### QA Module
Allows users to query the dataset and receive answers along with visualizations. The process involves:
1. Accepting user queries.
2. Combining queries with dataset information.
3. Generating and executing Python code offline.
4. Producing visualizations and textual data.
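The four steps above can be glued together end to end. In this sketch, `generate_code` stands in for the Gemini call (stubbed with a fixed answer so the example runs offline); all names are hypothetical.

```python
import contextlib
import io

def answer_query(query: str, df: dict, generate_code) -> str:
    """QA pipeline sketch: combine the query with dataset info, ask the
    model for code, execute it offline, and return the printed answer."""
    columns = ", ".join(df)                            # step 2: dataset info
    prompt = f"Columns: {columns}\nQuestion: {query}"  # steps 1-2: build the prompt
    code = generate_code(prompt)                       # LLM call (stubbed below)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {"df": df})                         # step 3: offline execution
    return buffer.getvalue()                           # step 4: textual output

# Stub standing in for Gemini: always answers with a row count.
fake_model = lambda prompt: "print(len(df['x']))"
```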
### Analysis Generation
Processes the output from code execution to create concise and insightful responses.
## Usage
1. Initialize the tool:
`streamlit run app.py`
2. Load Dataset: Upload your dataset when prompted.
3. Generate Summary: The tool will automatically generate a summary and potential questions.
4. Exploratory Data Analysis: Use the EDA tools to explore your dataset.
5. Query the Dataset: Enter your queries to receive answers and visualizations.
6. Analyze Results: Review the detailed analysis generated by the tool.
## Installation
1. Install the required packages:
The project's dependencies are listed in `requirements.txt`. Install them all with pip:
```
pip install -r requirements.txt
```
2. Run the application:
Now, you're ready to run the application. Use the following command to start the Streamlit server:
```
streamlit run app.py
```
## Running using Docker
1. Build the Docker image:
```
docker build -t insights .
```
2. Run the Docker container, passing your Gemini API key:
```
docker run -p 8501:8501 -e GOOGLE_API_KEY=<your-api-key> insights
```