---
title: Insights
emoji: 📈
colorFrom: gray
colorTo: yellow
sdk: streamlit
sdk_version: 1.33.0
app_file: app.py
pinned: false
---

# Insights

## Deployment

[HuggingFace](https://huggingface.co/spaces/AtharvaThakur/Insights)

# Insights: Gen-AI Based Data Analysis Tool

## Overview

**Insights** is a data analysis tool that leverages the Gemini-Pro large language model (LLM) to automate and enhance the data analysis process. It aims to perform end-to-end data analysis tasks, providing substantial cost and time savings while matching or exceeding the performance of a junior data analyst.

## Table of Contents

1. [Introduction](#introduction)
2. [Features](#features)
3. [System Architecture](#system-architecture)
4. [Modules Overview](#modules-overview)
5. [Usage](#usage)
6. [Installation](#installation)
7. [License](#license)

## Introduction

In today's data-driven world, robust data analysis tools are crucial for informed decision-making and strategic planning. Traditional data analysis methods are often time-consuming, error-prone, and dependent on specialized expertise. **Insights** addresses these issues by using AI to streamline and enhance the data analysis process.

## Features

- **Automated Data Analysis**: Perform data collection, visualization, and analysis with minimal human intervention.
- **Advanced Summarization**: Generate detailed summaries and potential questions for datasets.
- **Exploratory Data Analysis (EDA)**: Tools for statistical summaries, distribution plots, and correlation matrices.
- **Data Cleaning and Transformation**: Functions for handling missing values, outlier detection, normalization, and feature engineering.
- **Machine Learning Toolkit**: Automates model selection, training, hyperparameter tuning, and evaluation.
- **Query Answering Module**: Generates Python code to answer user queries and produce visualizations.

## System Architecture

The **Insights** tool is built on the Gemini platform and consists of three main components:

1. **Summary Module**
2. **QA Module**
3. **Code Execution and Analysis Generation**

### Summary Module

Extracts essential details about the dataset and generates a comprehensive summary along with potential questions for further exploration.

### QA Module

Handles user queries related to the dataset, generating Python code to answer the queries and produce visualizations.

### Code Execution and Analysis Generation

Executes the generated Python code offline to ensure data security, producing detailed responses and visualizations.

## Modules Overview

### Summary Generation

1. **Information Extraction**: Extracts critical information from the dataset.
2. **Prompting Gemini**: Constructs a detailed prompt for Gemini to generate summaries and questions.
3. **Summary and Question Generation**: Generates a summary and potential questions for user review.

### Data Exploration

Includes tools for EDA, data cleaning, and data transformation.

### ML Toolkit

Facilitates the creation and evaluation of machine learning models on the dataset.

### QA Module

Allows users to query the dataset and receive answers along with visualizations. The process involves:

1. Accepting user queries.
2. Combining queries with dataset information.
3. Generating and executing Python code offline.
4. Producing visualizations and textual data.

### Analysis Generation

Processes the output from code execution to create concise and insightful responses. A rough code sketch of this query-answering and analysis flow is shown below.
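As an illustration of the flow described above, the sketch below combines a user question with basic dataset information, asks Gemini-Pro for Python code, executes that code offline, and captures the output for the analysis step. It is a minimal sketch only: `answer_query` and the prompt wording are hypothetical rather than the actual API of this repository, and it assumes the `google-generativeai` package and a `GOOGLE_API_KEY` environment variable.

```python
# Illustrative sketch of the QA flow; names and prompts are hypothetical.
import contextlib
import io
import os

import pandas as pd
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-pro")


def answer_query(df: pd.DataFrame, question: str) -> str:
    """Ask Gemini for Python code answering `question` about `df`,
    run that code offline, and return the captured textual output."""
    # Combine the user query with basic dataset information.
    prompt = (
        "You are a data analyst. A pandas DataFrame named `df` has these columns:\n"
        f"{', '.join(df.columns)}\n\n"
        f"Write Python code that answers the question: {question}\n"
        "Print the result; use matplotlib for any plots."
    )
    code = model.generate_content(prompt).text.strip()
    # In practice the response may be wrapped in markdown code fences,
    # which would need to be stripped before execution.

    # Execute the generated code offline (no data leaves the machine) and
    # capture anything it prints for the analysis-generation step.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {"df": df, "pd": pd})
    return buffer.getvalue()
```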
## Usage

1. Initialize the Tool: start the app with `streamlit run app.py` (see [Installation](#installation)).
2. Load Dataset: Upload your dataset when prompted.
3. Generate Summary: The tool will automatically generate a summary and potential questions.
4. Exploratory Data Analysis: Use the EDA tools to explore your dataset.
5. Query the Dataset: Enter your queries to receive answers and visualizations.
6. Analyze Results: Review the detailed analysis generated by the tool.

## Installation

1. Install the required packages:

   The project's dependencies are listed in the `requirements.txt` file. Install all of them with pip:

   ```
   pip install -r requirements.txt
   ```

2. Run the application:

   Start the Streamlit server with:

   ```
   streamlit run app.py
   ```

## Running with Docker

1. Build the Docker image:

   ```
   docker build -t insights .
   ```

2. Run the Docker container, passing your Gemini API key:

   ```
   docker run -p 8501:8501 -e GOOGLE_API_KEY=<your_api_key> insights
   ```
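Both the local and Docker setups need a valid `GOOGLE_API_KEY` for Gemini-Pro. The snippet below is a minimal sketch of how the key might be picked up inside the app; the actual mechanism in `app.py` may differ (for example, `python-dotenv` or Streamlit secrets).

```python
import os

import google.generativeai as genai

# Read the Gemini API key from the environment and fail fast if it is missing,
# so the app does not start half-configured.
api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise RuntimeError("Set the GOOGLE_API_KEY environment variable before starting the app.")

genai.configure(api_key=api_key)
```

Locally, export the variable in your shell before running `streamlit run app.py`; in Docker it is supplied via the `-e` flag shown above.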