# anjasafm / Milestone-2-Deployment / eda.py (commit 4f3f880)

import streamlit as st
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# from phik.report import plot_correlation_matrix
from PIL import Image
# Note: the 'deprecation.showPyplotGlobalUse' option has been removed in recent Streamlit
# releases; pass a figure explicitly with st.pyplot(fig) instead of setting it.

# Define a function to be called later from app.py
def run():
    st.title('Welcome to Exploratory Data Analysis')
    # Load the data from CSV
    df = pd.read_csv('customer_churn.csv')

    # Central Tendency
    st.title('Central Tendency Data')
    # Summary statistics of the numeric columns, transposed and rounded to 2 decimals
    df_describe = df.describe().T.round(2)
    # Display explicitly: Streamlit "magic" (a bare variable on its own line) only applies
    # to the main app script, not to a module imported by app.py
    st.dataframe(df_describe)

    # Show the explanation
    with st.expander('Explanation'):
        st.caption("The central tendency summary gives the count, mean, standard deviation, min, max, and quartiles (Q1, Q2/median, Q3) of each numeric column. Several columns have large standard deviations, meaning their values are widely spread, which suggests that some columns may contain outliers. The min and max values of the Price column are also far apart.")

    # Target Visualization
    st.title('Target Exploration: Churn Pie & Bar Chart')
    # Set up one figure for two side-by-side subplots
    fig = plt.figure(figsize=(14, 7))

    # Subplot 1: pie chart of the churn distribution
    plt.subplot(1, 2, 1)  # 1 row, 2 columns, 1st subplot
    # sort_index() fixes the order to 0 (Not Churn), 1 (Churn) so it matches the labels
    churn_share = df['Churn'].value_counts(normalize=True).sort_index()
    plt.pie(churn_share,
            labels=[f'Not Churn ({churn_share[0] * 100:.1f}%)', f'Churn ({churn_share[1] * 100:.1f}%)'],
            autopct='%1.1f%%', startangle=140, colors=['crimson', 'coral'])
    plt.title('Distribution of Customer Churn')

    # Subplot 2: bar chart of Not Churn vs Churn counts
    plt.subplot(1, 2, 2)  # 1 row, 2 columns, 2nd subplot
    barplot = sns.countplot(x='Churn', data=df, palette=['crimson', 'coral'])
    plt.title('Count of Customer Churn')
    plt.xticks([0, 1], ['Not Churn', 'Churn'])
    plt.xlabel('Churn Status')
    plt.ylabel('Count')

    # Add the count label above each bar
    for p in barplot.patches:
        barplot.annotate(format(p.get_height(), '.0f'),
                         (p.get_x() + p.get_width() / 2., p.get_height()),
                         ha='center',
                         va='center',
                         xytext=(0, 10),
                         textcoords='offset points')

    # Render the figure in Streamlit
    plt.tight_layout()
    st.pyplot(fig)
    # Show the explanation
    with st.expander('Explanation'):
        st.caption('From the visualization, 33,881 customers (52.6%) did not churn and 30,493 customers (47.4%) churned. In my opinion, this is a poor result for the company: the gap between customers who stay and customers who churn is small, so improvements are needed on both the technical and non-technical sides.')

    # Age Distribution
    st.title('Age Distribution by Customer Churn')
    # Histogram of the 'Age' column, split by churn status
    fig = plt.figure(figsize=(14, 7))
    sns.histplot(x='Age', hue='Churn', data=df, bins=30, kde=True)
    plt.title('Age Distribution by Customer Churn')
    plt.xlabel('Age')
    plt.ylabel('Frequency')
    st.pyplot(fig)

    # Show the explanation
    with st.expander('Explanation'):
        st.caption('The distribution of ages appears roughly symmetrical, with a slight right skew. The frequency of churn is lower than the frequency of continued service across all age groups.')

    # Gender
    st.title('Customer Churn by Gender')
    # Churn figures by gender, hard-coded from an earlier calculation (not shown in this file)
    churn_rates = {
        'Female': 0.587951,
        'Male': 0.412049
    }

    # Data to plot (converted to lists, since plt.pie expects array-like values)
    labels = list(churn_rates.keys())
    sizes = list(churn_rates.values())
    colors = ['#ff9999', '#66b3ff']  # pink for female, light blue for male
    explode = (0.1, 0)  # pull out the first slice for emphasis

    # Plot the pie chart
    fig = plt.figure(figsize=(8, 6))
    plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%',
            shadow=True, startangle=140)
    plt.axis('equal')  # Equal aspect ratio ensures the pie is drawn as a circle
    plt.title('Churn Rate by Gender')
    st.pyplot(fig)
    # Show the explanation
    with st.expander('Explanation'):
        st.caption('The pie chart above shows that churn tends to be higher among female customers than among male customers, with women at 58.8% and men at 41.2%. Based on this result, the company could focus improvements on retaining female customers.')

    # Subscription Type
    st.title('Customer Churn by Subscription Type')
    # Churn rate per subscription type: the fraction of customers in each group who churned
    churn_rate_by_subscription = df.groupby('Subscription Type')['Churn'].mean()

    # Plot the raw rates; dividing them by their sum would give shares that always add up to 1, not churn rates
    fig, ax = plt.subplots(figsize=(14, 7))
    churn_rate_by_subscription.plot(kind='bar', color='lightsalmon', title='Churn Rate by Subscription Type', ax=ax)
    plt.xlabel('Subscription Type')
    plt.ylabel('Churn Rate')
    plt.xticks(rotation=0)
    # Add the rate above each bar
    for i, rate in enumerate(churn_rate_by_subscription):
        plt.text(i, rate, f'{rate:.4f}', ha='center', va='bottom')

    st.pyplot(fig)
    # Show the explanation
    with st.expander('Explanation'):
        st.caption('For further analysis: since churn rates are similar across subscription types, factors other than subscription type likely have a larger impact on churn. For business strategy: a churn rate approaching 50% warrants a detailed look at customer service practices, product quality, pricing strategy, and competitive pressure, and the company should put strategies in place to improve customer satisfaction and loyalty across all subscription types.')

    # Contract Length
    st.title('Customer Churn by Contract Length')
    # Group by 'Contract Length' and compute the churn rate (mean of Churn) for each contract length
    churn_rate_by_contract_length = df.groupby('Contract Length')['Churn'].mean()

    # Plot the raw churn rates (not rescaled to sum to 1)
    fig = plt.figure(figsize=(10, 6))
    sns.barplot(x=churn_rate_by_contract_length.index, y=churn_rate_by_contract_length.values)
    plt.title('Churn Rate by Contract Length')
    plt.xlabel('Contract Length')
    plt.ylabel('Churn Rate')
    plt.xticks(rotation=0)  # Rotate the x-ticks if there are many contract lengths
    # Add the rate above each bar
    for i, rate in enumerate(churn_rate_by_contract_length):
        plt.text(i, rate, f'{rate:.4f}', ha='center', va='bottom')
    plt.tight_layout()  # Make labels and titles fit inside the figure
    st.pyplot(fig)

    # Show the explanation
    with st.expander('Explanation'):
        st.caption("The company might consider encouraging customers to sign up for longer contracts through incentives, as this could help reduce churn rates. However, since even annual contracts have a relatively high churn rate, it's essential to explore why customers are leaving and address those issues directly.")

    # Total Spend
    st.title('Customer Churn by Total Spend')
    # Histogram of the 'Total Spend' column, split by churn status
    fig = plt.figure(figsize=(14, 7))
    sns.histplot(x='Total Spend', hue='Churn', data=df, bins=30, kde=True)
    plt.title('Total Spend Distribution by Customer Churn')
    plt.xlabel('Total Spend')
    plt.ylabel('Frequency')
    st.pyplot(fig)


    # Show the explanation
    with st.expander('Explanation'):
        st.caption('The company might consider focusing retention efforts on customers in the lower to mid spend ranges, where churn seems more prevalent. Incentivizing higher spend among these customers could be one strategy, if higher spend is indeed associated with lower churn.')
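The central-tendency explanation infers outliers from large standard deviations, but a wide spread alone does not prove that outliers exist. A minimal sketch of a more direct check, assuming the common 1.5 × IQR rule and a hypothetical helper `iqr_outlier_counts` (not part of the original app), counts the values that fall outside the IQR fences in each numeric column:

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Compare every value against its own column's fences (aligned on column names)
    mask = numeric.lt(lower, axis=1) | numeric.gt(upper, axis=1)
    return mask.sum()


# Toy data for illustration only; not the customer_churn.csv values
toy = pd.DataFrame({"Age": [20, 22, 25, 27, 30, 95], "Tenure": [1, 2, 3, 4, 5, 6]})
print(iqr_outlier_counts(toy))
```

In the app, `iqr_outlier_counts(df)` could be shown with `st.dataframe` next to the `describe()` table; the factor `k=1.5` is a convention, not a property of this dataset.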
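The churn pie chart builds its slice labels from `df['Churn'].value_counts(normalize=True)`. By default `value_counts` orders labels by frequency, so sorting by the index keeps label 0 before label 1 no matter which class is larger. A minimal sketch, using a hypothetical helper `churn_shares` and toy labels rather than `customer_churn.csv`:

```python
import pandas as pd


def churn_shares(churn: pd.Series) -> pd.Series:
    """Share of each Churn label, ordered 0 (Not Churn) then 1 (Churn)."""
    # value_counts sorts by frequency; sort_index pins the order to the labels 0, 1
    shares = churn.value_counts(normalize=True).sort_index()
    return shares.rename(index={0: "Not Churn", 1: "Churn"})


# Toy labels for illustration: churners are the majority here,
# so an unsorted value_counts would list label 1 first
toy = pd.Series([1, 1, 1, 0, 0])
shares = churn_shares(toy)
print(shares)
# Slice labels with percentages, in the same style as the app's pie chart
labels = [f"{name} ({share * 100:.1f}%)" for name, share in shares.items()]
print(labels)  # ['Not Churn (40.0%)', 'Churn (60.0%)']
```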
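The gender pie chart uses hard-coded values that sum to 1, so they describe how something splits between the two genders rather than each gender's own churn rate, and those are different quantities. A minimal sketch that computes both from the data, assuming the CSV has a `Gender` column with `Female`/`Male` values (an assumption, since this file never reads that column) and using a hypothetical helper `gender_churn_summary`:

```python
import pandas as pd


def gender_churn_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-gender churn rate next to each gender's share of all churners."""
    # Churn rate: fraction of each gender's customers who churned
    churn_rate = df.groupby("Gender")["Churn"].mean()
    # Share of churners: how churned customers split across genders (sums to 1)
    churner_share = df.loc[df["Churn"] == 1, "Gender"].value_counts(normalize=True)
    return pd.DataFrame({"churn_rate": churn_rate, "share_of_churners": churner_share})


# Toy data for illustration only; assumes a 'Gender' column exists
toy = pd.DataFrame({
    "Gender": ["Female"] * 4 + ["Male"] * 6,
    "Churn": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})
print(gender_churn_summary(toy))
```

On this toy data the two columns disagree (a female churn rate of 0.5 versus a female share of churners of about 0.67), which is why a chart built from hard-coded numbers should state which of the two quantities they represent.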
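The subscription-type and contract-length charts both start from `df.groupby(column)['Churn'].mean()`, which is the churn rate of each group. Dividing those rates by their sum, as a normalization step, turns them into shares that always add up to 1, so three groups that each churn at 50% show up as roughly 0.33 each. A minimal sketch contrasting the two, using a hypothetical helper `churn_rate_by` and toy data:

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Raw churn rate per group alongside the rates rescaled to sum to 1."""
    rate = df.groupby(column)["Churn"].mean()  # fraction churned within each group
    return pd.DataFrame({
        "churn_rate": rate,
        "normalized_share": rate / rate.sum(),  # always sums to 1 across groups
    })


# Toy data: every subscription type churns at exactly 50%
toy = pd.DataFrame({
    "Subscription Type": ["Basic", "Basic", "Standard", "Standard", "Premium", "Premium"],
    "Churn": [1, 0, 1, 0, 1, 0],
})
print(churn_rate_by(toy, "Subscription Type"))
```

The same helper applies to `'Contract Length'`; the `churn_rate` column is the quantity that a statement such as "a churn rate approaching 50%" refers to.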
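The total-spend explanation says churn seems more prevalent in the lower to mid spend ranges, and the age explanation compares churn across age groups; overlaid histograms only support such comparisons by eye. A minimal sketch that quantifies them by binning a numeric column into quantiles with `pd.qcut` and computing the churn rate per bin, using a hypothetical helper `churn_rate_by_bins` and toy values rather than the real spend data:

```python
import pandas as pd


def churn_rate_by_bins(df: pd.DataFrame, column: str, q: int = 4) -> pd.Series:
    """Churn rate within each quantile bin of a numeric column."""
    # Quantile bins hold roughly equal numbers of customers;
    # duplicates="drop" merges bins whose edges coincide
    bins = pd.qcut(df[column], q=q, duplicates="drop")
    # observed=True keeps only the bins that actually occur
    return df.groupby(bins, observed=True)["Churn"].mean()


# Toy data for illustration only
toy = pd.DataFrame({
    "Total Spend": [100, 200, 300, 400, 500, 600, 700, 800],
    "Churn": [1, 1, 1, 0, 0, 0, 0, 0],
})
rates = churn_rate_by_bins(toy, "Total Spend")
print(rates)  # lowest quartile 1.0, then 0.5, 0.0, 0.0
```

Calling `churn_rate_by_bins(df, 'Age', q=5)` would give the analogous table for age groups; quantile bins are one choice among several, and fixed-width bins via `pd.cut` would work as well.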