Atharva Thakur committed on
Commit 6b53acf
1 Parent(s): d6e847d

Update README.md

Files changed (1)
  1. README.md +85 -36

README.md CHANGED
@@ -13,25 +13,96 @@ pinned: false
  ## Deployment
  [HuggingFace](https://huggingface.co/spaces/AtharvaThakur/Insights)

- ## Modules
-
- - `DataLoader`: Handles the loading of data either by uploading a CSV file or inputting a URL to a CSV file.
- - `DataAnalyzer`: Provides summary statistics and data types of the loaded dataset.
- - `DataFilter`: Allows users to filter rows based on user-defined conditions.
- - `DataTransformer`: Enables users to perform operations on columns.
- - `DataVisualizer`: Visualizes data with various types of plots (Histogram, Box Plot, Pie Chart, Scatter Plot, Heatmap).
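A minimal sketch of what a `DataLoader` like the one removed above might do — load a CSV either from an uploaded file object or from a URL. The class name comes from the module list; the method names here are assumptions, not the project's actual API:

```python
import io
import pandas as pd

class DataLoader:
    """Loads a dataset from an uploaded CSV file or from a CSV URL (sketch)."""

    def load_from_file(self, uploaded) -> pd.DataFrame:
        # `uploaded` is any file-like object, e.g. a Streamlit UploadedFile
        return pd.read_csv(uploaded)

    def load_from_url(self, url: str) -> pd.DataFrame:
        # pandas accepts URLs directly in read_csv
        return pd.read_csv(url)

loader = DataLoader()
df = loader.load_from_file(io.StringIO("a,b\n1,2\n3,4"))
print(df.shape)  # (2, 2)
```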

  ## Features

- - Upload CSV files or load data from a URL.
- - Display the uploaded dataset.
- - Show summary statistics and data types.
- - Filter rows based on user-defined conditions.
- - Perform operations on columns.
- - Visualize data with various types of plots (Histogram, Box Plot, Pie Chart, Scatter Plot, Heatmap).
- - Transform data.
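Two of the features listed above — summary statistics/data types and user-defined row filters — reduce to one-liners in pandas. A sketch; the condition string stands in for hypothetical user input:

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 32, 47, 19], "city": ["NY", "LA", "NY", "SF"]})

# Summary statistics and data types, as a DataAnalyzer-style module might show them
summary = df.describe()
dtypes = df.dtypes

# Filter rows on a user-defined condition, as a DataFilter-style module might
condition = "age > 24 and city == 'NY'"  # hypothetical user input
filtered = df.query(condition)
print(len(filtered))  # 2
```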
- ## Detailed Installation Instructions

  1. Install the required packages:
  The project's dependencies are listed in the 'requirements.txt' file. You can install all of them using pip:
@@ -44,28 +115,6 @@ pinned: false
  streamlit run app.py
  ```

- ## Web app
- 1. Main page
- Data Exploration
- -> Data Loader
- -> DataQA (LLM with python interpreter/CSV agent)
- -> Data Analyzer
- -> Data Filter
- -> Data Visualizer
-
- 2. Data Transformation
- -> handling null values
- -> creating new columns
- -> removing columns
- -> Changing datatypes
- -> give option to analyse the transformed dataset or save it.
-
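The transformation steps listed above (null handling, new columns, column removal, dtype changes) map directly onto pandas operations. A sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, None, 30.0],
    "qty": ["1", "2", "3"],
    "sku": ["a", "b", "c"],
})

df["price"] = df["price"].fillna(df["price"].mean())  # handle null values
df["qty"] = df["qty"].astype(int)                     # change a datatype
df["total"] = df["price"] * df["qty"]                 # create a new column
df = df.drop(columns=["sku"])                         # remove a column

print(df["total"].tolist())  # [10.0, 40.0, 90.0]
```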
- 3. Natural language dataparty (Pure LLM)
- -> Insights generation
- -> Automating the data analysis/transformation
- -> generating a report
-
-
  # Running using Docker
  1. Build the docker image using
  ```
  ## Deployment
  [HuggingFace](https://huggingface.co/spaces/AtharvaThakur/Insights)

+ # Insights: Gen-AI Based Data Analysis Tool

+ ## Overview
+
+ **Insights** is a state-of-the-art data analysis tool that leverages the Gemini-Pro large language model (LLM) to automate and enhance the data analysis process. This tool aims to perform end-to-end data analysis tasks, providing substantial cost and time savings while matching or exceeding the performance of junior data analysts.
+
+ ## Table of Contents
+
+ 1. [Introduction](#introduction)
+ 2. [Features](#features)
+ 3. [System Architecture](#system-architecture)
+ 4. [Modules Overview](#modules-overview)
+ 5. [Installation](#installation)
+ 6. [Usage](#usage)
+ 7. [Evaluation](#evaluation)
+ 8. [Contributors](#contributors)
+ 9. [License](#license)
+
+ ## Introduction
+
+ In today's data-driven world, robust data analysis tools are crucial for informed decision-making and strategic planning. Traditional data analysis methods often face challenges such as time-consuming processes, potential for errors, and the need for specialized expertise. **Insights** addresses these issues by utilizing AI to streamline and enhance the data analysis process.

  ## Features

+ - **Automated Data Analysis**: Perform data collection, visualization, and analysis with minimal human intervention.
+ - **Advanced Summarization**: Generate detailed summaries and potential questions for datasets.
+ - **Exploratory Data Analysis (EDA)**: Tools for statistical summaries, distribution plots, and correlation matrices.
+ - **Data Cleaning and Transformation**: Functions for handling missing values, outlier detection, normalization, and feature engineering.
+ - **Machine Learning Toolkit**: Automates model selection, training, hyperparameter tuning, and evaluation.
+ - **Query Answering Module**: Generate Python code to answer user queries and produce visualizations.
+
+ ## System Architecture
+
+ The **Insights** tool is built on the Gemini platform and consists of three main components:
+
+ 1. **Summary Module**
+ 2. **QA Module**
+ 3. **Code Execution and Analysis Generation**
+
+ ### Summary Module
+
+ Extracts essential details about the dataset and generates a comprehensive summary along with potential questions for further exploration.
+
+ ### QA Module
+
+ Handles user queries related to the dataset, generating Python code to answer the queries and produce visualizations.
+
+ ### Code Execution and Analysis Generation
+
+ Executes the generated Python code offline to ensure data security, producing detailed responses and visualizations.
+
+ ## Modules Overview

+ ### Summary Generation
+
+ 1. **Information Extraction**: Extracts critical information from the dataset.
+ 2. **Prompting Gemini**: Constructs a detailed prompt for Gemini to generate summaries and questions.
+ 3. **Summary and Question Generation**: Generates a summary and potential questions for user review.
+
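The prompt-construction step above can be sketched without any LLM call: extract basic dataset facts and format them into a prompt string. The wording of the template below is an assumption, not the project's actual prompt:

```python
import pandas as pd

def build_summary_prompt(df: pd.DataFrame) -> str:
    """Assemble dataset facts into an LLM prompt (hypothetical template)."""
    facts = [
        f"Columns: {', '.join(df.columns)}",
        f"Dtypes: {', '.join(str(t) for t in df.dtypes)}",
        f"Rows: {len(df)}",
    ]
    return (
        "Summarize this dataset and suggest questions to explore.\n"
        + "\n".join(facts)
    )

df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.5]})
prompt = build_summary_prompt(df)
print(prompt.splitlines()[1])  # Columns: name, score
```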
+ ### Data Exploration
+
+ Includes tools for EDA, data cleaning, and data transformation.
+
+ ### ML Toolkit
+
+ Facilitates the creation and evaluation of machine learning models on the dataset.
+
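A toolkit like the one above typically wraps a train/evaluate loop. A minimal stand-in using ordinary least squares via NumPy — the real project may use scikit-learn or similar; this is only an illustration of the fit-then-score pattern:

```python
import numpy as np

# Toy dataset: y = 2x + 1 with a little noise
rng = np.random.default_rng(0)
X = np.arange(10, dtype=float)
y = 2 * X + 1 + rng.normal(0, 0.01, size=10)

# "Train": fit a line with least squares; "evaluate": R^2 on the same data
slope, intercept = np.polyfit(X, y, 1)
pred = slope * X + intercept
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(slope, 1), round(intercept, 1))  # ≈ 2.0 1.0
```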
+ ### QA Module
+
+ Allows users to query the dataset and receive answers along with visualizations. The process involves:
+
+ 1. Accepting user queries.
+ 2. Combining queries with dataset information.
+ 3. Generating and executing Python code offline.
+ 4. Producing visualizations and textual data.
+
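Steps 3–4 above hinge on executing model-generated code locally. A heavily simplified sketch — the "generated" snippet is hard-coded here in place of a real LLM response, and a production version would need sandboxing around `exec`:

```python
import pandas as pd

df = pd.DataFrame({"region": ["N", "S", "N"], "sales": [10, 20, 30]})

# Stand-in for code the LLM might return for the query
# "total sales per region" -- hard-coded, not a real model call
generated_code = "result = df.groupby('region')['sales'].sum().to_dict()"

namespace = {"df": df}
exec(generated_code, namespace)  # executed offline, on the local machine
print(namespace["result"])  # {'N': 40, 'S': 20}
```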
+ ### Analysis Generation
+
+ Processes the output from code execution to create concise and insightful responses.
+
+ ## Usage
+
+ 1. Initialize the Tool:
+    `python app.py`
+ 2. Load Dataset: Upload your dataset when prompted.
+ 3. Generate Summary: The tool will automatically generate a summary and potential questions.
+ 4. Exploratory Data Analysis: Use the EDA tools to explore your dataset.
+ 5. Query the Dataset: Enter your queries to receive answers and visualizations.
+ 6. Analyze Results: Review the detailed analysis generated by the tool.
+
+ ## Installation Instructions

  1. Install the required packages:
  The project's dependencies are listed in the 'requirements.txt' file. You can install all of them using pip:

  streamlit run app.py
  ```

  # Running using Docker
  1. Build the docker image using
  ```