import streamlit as st
import pandas as pd
import numpy as np
import seaborn as sns
from PIL import Image
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
import warnings
warnings.filterwarnings('ignore')
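

# Illustrative sketch (not called by the app): how groupby + count + unstack
# turns long-format rows into the category-by-size grid that section 4 of
# run() draws as a heatmap. The function name is hypothetical; only the
# column names mirror the dataset used below.
def count_by_category_and_size(frame):
    # One row per (category, size) pair, counting the product IDs in each pair
    counts = frame.groupby(['category', 'size'])['amazon_standard_id'].count()
    # Pivot sizes into columns; pairs that never occur become 0 instead of NaN
    return counts.unstack(fill_value=0)

# Possible usage, assuming a small hand-made frame with the same columns:
#   toy = pd.DataFrame({'category': ['Set', 'Set', 'Kurta'],
#                       'size': ['M', 'M', 'L'],
#                       'amazon_standard_id': ['A1', 'A2', 'A3']})
#   count_by_category_and_size(toy)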

def run():
    # Page title
    st.title('📊 SalesBoost - Exploratory Data Analysis')
    st.markdown('### Amazon Sales Report Dataset')

    # --- Sidebar ---
    st.sidebar.image("src/Amazon.png", use_container_width=True)
    st.sidebar.title("SalesBoost")
    st.sidebar.markdown("""
    **Team Members**
    - 👩‍🔬 Avisa Rahma Benedicta (Data Scientist)
    - 👨‍💻 Muhammad Farhan Hendriyanto (Data Engineer)
    - 👩‍🔬 Neila Ismahunnisa (Data Analyst)
    - 👩‍🔬 Sesilia Virdha Kezia (Data Scientist)
    """)
    st.sidebar.markdown("""
    **Batch HCK-027** """)

    # Display team and project image
    col1, col2 = st.columns([1, 3])
    with col1:
        image1 = Image.open('src/Amazon.png')
        st.image(image1, caption='Amazon')
    with col2:
        st.markdown("""
        **Column Description:**
        - **Date**              : Date of the sale
        - **Status**            : Status of the sale (e.g., Shipped, Cancelled)
        - **Fulfilment**        : Method of fulfilment (Amazon or Merchant)
        - **Sales Channel**     : Platform used (e.g., Amazon.in)
        - **Ship Service Level**: Shipping speed (Standard, Expedited)
        - **Category**          : Product category (e.g., Set, Kurta)
        - **Size**              : Size of the product (e.g., M, L, XL)
        - **Amazon Standard Id**: Unique product identifier
        - **Qty**               : Quantity sold
        - **Currency**          : Currency used (e.g., INR)
        - **Sales**             : Revenue from the sale
        - **Clean Ship State**  : Normalized shipping state name
        - **Promotion Used**    : Whether a promo code was applied
        """)

    # Load dataset
    data = pd.read_csv("src/data_for_modelling.csv", 
                       index_col=False, parse_dates=['date'])
    st.write('### Dataset Preview')
    st.dataframe(data)
    st.write("---")

    # 1. Sales Trend
    st.subheader("1. Sales Trend")
    data_trend = data.groupby('date')['sales'].sum()

    # Plot total sales per date
    fig, ax = plt.subplots(figsize=(15, 6))
    ax.plot(data_trend.index, data_trend, color='red', linewidth=2)
    ax.set_xlabel('Date')
    ax.set_ylabel('Sales')
    ax.set_title('Sales Trend')
    st.pyplot(fig)  # render the figure; without this call the chart never appears
    st.write('The chart shows a decreasing trend in sales over time.')
    st.write("---")

    # 2. Seasonal Decomposition
    st.subheader("2. Time Series Decomposition")
    plt.rcParams['figure.figsize'] = (14, 9)

    # Decompose the series into trend, seasonal, and residual components
    res = seasonal_decompose(data_trend, model='multiplicative', period=45)

    # Plot the components and keep the resulting figure
    fig = res.plot()
    
    st.pyplot(fig)
    st.markdown('''
            As noted above, the trend is decreasing. There appears to be a seasonal pattern that repeats roughly every 45 days, or about 1.5 months (March 31st and May 15th).

            The residuals are centered around 1 rather than 0. This is expected for a multiplicative model, where a residual of 1 means the trend and seasonal components fully explain the observation. Deviations from 1 are variation that the decomposition leaves unexplained, which could be noise or patterns it does not capture.
            ''')
    st.write("---")

    # 3. Top Sales by Category
    st.subheader("3. Top Sales by Category")
    data_category = data[['category', 'sales']].groupby('category').sum().sort_values('sales', ascending=False)
    data_category = data_category.reset_index()

    fig = plt.figure(figsize=(15, 10))
    plt.bar(data_category['category'], data_category['sales'])
    plt.xlabel('Category')
    plt.ylabel('Sales')
    plt.title('Top Sales by Category')
    st.pyplot(fig)

    st.write('The Set category is the top-selling product category, followed by Kurta and Western Dress. On the other hand, Dupatta is the least popular category.')
    st.write("---")

    # 4. Product Distribution by Size & Category
    st.subheader("4. Product Distribution by Category and Size")
    # Step 1: Count products by category and size
    data_cat_size = data.groupby(['category', 'size'])['amazon_standard_id'].count().unstack(fill_value=0)

    # Step 2: Create the plot
    fig, ax = plt.subplots(figsize=(12, 6))
    im = ax.imshow(data_cat_size.values)

    # Step 3: Set the ticks and their labels
    ax.set_xticks(range(len(data_cat_size.columns)))
    ax.set_xticklabels(data_cat_size.columns, rotation=45, ha="right")

    ax.set_yticks(range(len(data_cat_size.index)))
    ax.set_yticklabels(data_cat_size.index)

    # Step 4: Annotate each cell with its count
    for i in range(len(data_cat_size.index)):
        for j in range(len(data_cat_size.columns)):
            ax.text(j, i, data_cat_size.values[i, j],
                    ha="center", va="center", color="w")

    # Step 5: Title and layout
    ax.set_title("Distribution of Products Sold by Category and Size")
    fig.tight_layout()
    st.pyplot(fig)
    st.write('Set, the top-selling category, sold most in size M, followed by L and S, and has no free-size products. Kurta sold most in size L, followed by M and XL, and likewise has no free-size products. Dupatta sold only 3 items, all in free size.')
    st.write("---")

    # 5. Promotion Use Pie Chart
    st.subheader("5. Promotion Usage")
    data_pie = data['promotion_used'].value_counts()
    # Create the pie chart
    fig = plt.figure(figsize=(10, 7))
    plt.pie(data_pie, labels=data_pie.index.map({True: 'Used Promotion Code', False: 'Did Not Use Promotion Code'}), autopct='%.0f%%', colors=['#ff9999', '#66b3ff'])
    plt.title('Distribution of Promotion Used')
    # show plot
    st.pyplot(fig)
    st.write('Most customers use a promotion code when buying our products; only 40% do not.')
    st.write("---")

    # 6. Top 5 High-Spending Customers
    st.subheader("6. Top 5 High-Spending Customers")
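    # Note: the column description lists amazon_standard_id as a product
    # identifier, so each row here is a product ID rather than a customer.
    data_spend = data.groupby(['amazon_standard_id', 'category'])['sales'].sum().unstack(fill_value=0)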

    top5_spender = data_spend.loc[
        data_spend.sum(axis=1).sort_values(ascending=False).head(5).index
    ]

    # Create the figure and axes explicitly
    fig, ax = plt.subplots(figsize=(10, 6))

    # Plot onto the axes object
    top5_spender.plot(kind='bar', stacked=True, ax=ax)

    ax.set_title('Top 5 High Spender Customers - Stacked by Category')
    ax.set_xlabel('Amazon Standard ID')
    ax.set_ylabel('Total Sales')
    ax.legend(title='Category', bbox_to_anchor=(1.05, 1), loc='upper left')
    fig.tight_layout()

    # Display in Streamlit
    st.pyplot(fig)
    st.markdown("""
    Based on the chart above, even when displayed as a stacked bar chart, only one color (brown, representing the 'Set' category) is visible. 
    This could indicate two things: 
                
    (1) the 'Set' category is either the most expensive or the most preferred product, or 
                
    (2) high-spending customers tend to favor the 'Set' category.
    """)  
    st.markdown("---")
    st.markdown("### 🙏 A shark eats a tomato \
                \
                Thank you for exploring with us! 🙏")
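

# Illustrative sketch (not called by the app): why residuals from a
# multiplicative decomposition sit near 1. The model is
# observed = trend * seasonal * resid, so resid == 1 means "no deviation".
# The synthetic series, the 7-day period, and the function name are
# assumptions made for this example; none of them come from the sales data.
def demo_multiplicative_residuals(n_days=140, period=7, seed=0):
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.seasonal import seasonal_decompose

    rng = np.random.default_rng(seed)
    index = pd.date_range("2022-01-01", periods=n_days, freq="D")
    trend = np.linspace(200.0, 100.0, n_days)  # steadily falling level
    seasonal = 1.0 + 0.2 * np.sin(2 * np.pi * np.arange(n_days) / period)
    noise = rng.normal(loc=1.0, scale=0.02, size=n_days)  # multiplicative noise around 1
    series = pd.Series(trend * seasonal * noise, index=index)

    res = seasonal_decompose(series, model="multiplicative", period=period)
    # The moving-average trend is undefined at both ends, so drop those NaNs
    return res.resid.dropna()

# Possible usage: demo_multiplicative_residuals().mean() should be close to 1,
# which mirrors the residual panel described in section 2 of run().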

if __name__ == '__main__':
    run()