Update all EDA, adding Loading Feature for each Plot
eda.py
CHANGED
@@ -295,14 +295,48 @@ def app():
|
|
295 |
plt.title('Amount Balance vs New Balance Origin')
|
296 |
st.pyplot(fig)
|
297 |
st.write('The scatter plot shows the relationship between New Balance Origin and Amount Balance. Similar to the previous plot, the data points highlight how most transactions cluster around lower values for both balances.')
|
298 |
-
st.markdown('- **High Density at Lower Values
|
299 |
-
st.markdown('- **Vertical Distribution
|
300 |
-
st.markdown('- **Horizontal Distribution
|
301 |
|
302 |
st.divider()
|
303 |
|
304 |
# Multivariate analysis
|
305 |
st.header('Multivariate Analysis')
|
306 |
-
|
307 |
if __name__ == '__main__':
|
308 |
app()
|
|
|
295 |
plt.title('Amount Balance vs New Balance Origin')
|
296 |
st.pyplot(fig)
|
297 |
st.write('The scatter plot shows the relationship between New Balance Origin and Amount Balance. Similar to the previous plot, the data points highlight how most transactions cluster around lower values for both balances.')
|
298 |
+
st.markdown('- **High Density at Lower Values**: The majority of the data points are concentrated near the origin (0, 0), indicating that most transactions involve small values for both New Balance Origin and Amount Balance.')
|
299 |
+
st.markdown('- **Vertical Distribution**: There are points with higher New Balance Origin spread vertically, mostly associated with lower Amount Balance values.')
|
300 |
+
st.markdown('- **Horizontal Distribution**: Some points with higher Amount Balance values spread horizontally but are typically associated with low New Balance Origin values.')
|
301 |
|
302 |
st.divider()
|
303 |
|
304 |
# Multivariate analysis
|
305 |
st.header('Multivariate Analysis')
|
306 |
+
|
307 |
+
# heatmap to visualize relationships
|
308 |
+
st.subheader('Heatmap of Correlation between Numeric Variables')
|
309 |
+
with st.spinner('Loading...'):
|
310 |
+
correlation_matrix = data[['amount', 'oldbalanceOrg', 'newbalanceOrig', 'oldbalanceDest', 'newbalanceDest']].corr()
|
311 |
+
fig, ax = plt.subplots(figsize=(10, 8))
|
312 |
+
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5, fmt=".2f", ax=ax)
|
313 |
+
st.pyplot(fig)
|
314 |
+
st.write("""
|
315 |
+
The heatmap provides a visual representation of the correlation matrix for the numeric variables: `amount`, `oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, and `newbalanceDest`.
|
316 |
+
|
317 |
+
1. **Strong Correlations:**
|
318 |
+
- There is a near-perfect correlation (1.00 after rounding to two decimals) between `oldbalanceOrg` and `newbalanceOrig`, indicating that the origin-account balances before and after the transaction are almost perfectly linearly related.
|
319 |
+
- Similarly, `oldbalanceDest` and `newbalanceDest` have a very high correlation (0.98), showing that the destination-account balances before and after the transaction are very closely related.
|
320 |
+
|
321 |
+
2. **Moderate Correlations:**
|
322 |
+
- `amount` shows moderate correlations with `oldbalanceDest` (0.29) and `newbalanceDest` (0.46). This indicates that the transaction amount has a moderate positive relationship with the balances in the destination account.
|
323 |
+
|
324 |
+
3. **Weak or No Correlations:**
|
325 |
+
- `amount` has very weak or no correlation with `oldbalanceOrg` (-0.00) and `newbalanceOrig` (-0.01), suggesting that the transaction amount is not significantly related to the balances in the origin account.
|
326 |
+
- Other correlations, such as between `oldbalanceOrg` and `oldbalanceDest` (0.07), are also weak, indicating minimal linear relationships between these variables.
|
327 |
+
""")
|
328 |
+
|
329 |
+
# Pairplot to visualize relationships
|
330 |
+
st.subheader('Pairplot of Numeric Variables')
|
331 |
+
with st.spinner('Loading...'):
|
332 |
+
fig = sns.pairplot(df[['amount', 'oldbalanceOrg', 'newbalanceOrig', 'oldbalanceDest', 'newbalanceDest']])
|
333 |
+
st.pyplot(fig)
|
334 |
+
st.write('The pair plot provides a detailed view of the relationships between the numeric variables: `amount`, `oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, and `newbalanceDest`.')
|
335 |
+
st.markdown('- **Strong Linear Relationships**: There are clear linear relationships between `oldbalanceOrg` and `newbalanceOrig`, as well as between `oldbalanceDest` and `newbalanceDest`. This indicates that the balances before and after each transaction are highly correlated.')
|
336 |
+
st.markdown('- **Clustered Data Points**: Most data points are clustered near the lower end of the scales, especially for `amount` and the balance variables, suggesting a high frequency of small-value transactions.')
|
337 |
+
st.markdown('- **Diagonal Subplots**: The subplots on the diagonal show a histogram of each variable, reflecting its individual distribution.')
|
338 |
+
st.markdown('- **Scattered Points**: There are noticeable outliers and scattered points in the relationships between `amount` and the balance variables, indicating some transactions involve significantly higher amounts than the majority.')
|
339 |
+
|
340 |
+
|
341 |
if __name__ == '__main__':
|
342 |
app()
|
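A note on reading the heatmap commentary above: a Pearson correlation that displays as 1.00 says the two columns are (almost) perfectly linearly related, which is not the same as the values being equal. The sketch below illustrates this with a hand-written Pearson coefficient in plain Python. The `pearson` helper and the balance figures are hypothetical and are not taken from `eda.py` or the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical origin balances (illustrative values only).
old_balance = [100.0, 2500.0, 40000.0, 7.5, 1200.0]
# Every account pays out 10% of its balance, so no balance stays the same.
new_balance = [0.9 * b for b in old_balance]

r = pearson(old_balance, new_balance)
unchanged = sum(old == new for old, new in zip(old_balance, new_balance))
print(f"r = {r:.2f}, unchanged balances = {unchanged}")
```

Here every balance changes, yet the coefficient still rounds to 1.00, so checking whether balances are actually identical needs a direct comparison of the two columns rather than the correlation alone.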