import streamlit as st
st.title('Topic Modeling - Amazon Alexa Product Reviews')
st.image("https://media1.popsugar-assets.com/files/thumbor/b_wcWoX8BnK6L3uCB-cR4aMeEwc/fit-in/2048xorig/filters:format_auto-!!-:strip_icc-!!-/2018/01/08/823/n/38761221/tmp_OUodWP_a022221cbdae39dc_Alexa3_1_.gif", width=400)
st.markdown("## Overview")
st.markdown("Reviews have become an important channel for consumers to express their sentiment towards a product or a service. These reviews are then used by other consumers to make purchasing decisions and are also used by companies to improve these goods and services. So, how can users and companies extract information from these reviews without manually reading each one? The answer is **topic modeling**!")
st.markdown("## Data")
st.markdown("The dataset contains 3,150 customer reviews of the Alexa Echo, Firestick, and Echo Dot products on Amazon. Each review is free-form text of varying length.")
st.markdown("## Approach")
st.markdown("**Background**: A topic model is a statistical model that mines text to extract clusters of words that characterize a collection of documents. BERTopic embeds the text with a sentence-transformers model (*paraphrase-MiniLM-L6-v2*), clusters the embeddings, and then uses class-based TF-IDF to extract the words that best describe each cluster.")
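# A minimal sketch of the class-based TF-IDF idea mentioned in the background
# text above. It is an illustration under stated assumptions, not BERTopic's
# own implementation: the function name, the whitespace tokenization, and the
# input layout (topic label -> list of review strings) are all hypothetical.
# Each word x in topic c is scored as tf(x, c) * log(1 + A / f(x)), where
# tf is the word's share of the topic's words, f(x) is the word's count
# across all topics, and A is the average number of words per topic.
import math
from collections import Counter


def class_tfidf_sketch(docs_by_topic):
    """Return {topic: {word: score}} using a simplified class-based TF-IDF."""
    if not docs_by_topic:
        return {}
    # Treat all documents of a topic as one long "class document".
    counts = {
        topic: Counter(" ".join(docs).lower().split())
        for topic, docs in docs_by_topic.items()
    }
    # f(x): how often each word occurs across every topic.
    total = Counter()
    for counter in counts.values():
        total.update(counter)
    # A: average number of words per topic.
    avg_words = sum(total.values()) / len(counts)
    scores = {}
    for topic, counter in counts.items():
        n_words = sum(counter.values())
        scores[topic] = {
            word: (freq / n_words) * math.log(1 + avg_words / total[word])
            for word, freq in counter.items()
        }
    return scores

# Words shared by many topics get a smaller log factor, so words concentrated
# in a single topic rank higher for that topic.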
st.markdown("**Method**:")
st.markdown("* Use BERTopic with its pre-trained embedding model to cluster the reviews into topics")
st.markdown("* Reduce the number of topics in the model after the first round")
st.markdown("* Use a sentence-transformer model to create new embeddings for the reviews")
st.markdown("* Rerun the updated model with the new embeddings on the same set of reviews")
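# The method steps listed above could be sketched roughly as follows. This is
# a hedged outline, not the code that produced the results reported in this
# app: the function name, its parameters, and the default nr_topics value are
# hypothetical. It assumes a recent BERTopic release in which reduce_topics
# takes (docs, nr_topics=...) and updates the model in place; older releases
# use a different signature. The heavy imports sit inside the function so the
# app still loads when those packages are not installed.
def run_topic_pipeline_sketch(reviews, nr_topics="auto", model_name="paraphrase-MiniLM-L6-v2"):
    """Fit BERTopic, reduce topics, then refit with precomputed embeddings."""
    from bertopic import BERTopic
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer(model_name)
    topic_model = BERTopic(embedding_model=encoder)

    # Step 1: first round of topic extraction on the raw review texts.
    topic_model.fit_transform(reviews)

    # Step 2: merge similar topics to reduce their number.
    topic_model.reduce_topics(reviews, nr_topics=nr_topics)

    # Step 3: create sentence embeddings for every review.
    embeddings = encoder.encode(reviews, show_progress_bar=False)

    # Step 4: rerun the model on the same reviews with the new embeddings.
    topics, _ = topic_model.fit_transform(reviews, embeddings)
    return topic_model, topics

# Hypothetical usage, with reviews as a list of review strings:
#     topic_model, topics = run_topic_pipeline_sketch(reviews)
#     st.dataframe(topic_model.get_topic_info())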
st.markdown("## Conclusion")
st.markdown("**Takeaways**: The initial run of the BERTopic model, followed by topic reduction, resulted in 39 topics. However, each topic cluster contained many duplicate words. Creating new sentence embeddings from the reviews with the sentence-transformer model produced higher-quality results, even though the number of topics remained at 73.")
st.image('image.png')
st.markdown("## Critical Analysis")
st.markdown("* Pre-trained BERTopic probably works best on a diversified set of documents. Since each review is related to the same product, it was difficult to extract distinct topics without any redundancy.")
st.markdown("* The reviews are inconsistent in text structure, which introduces noise into the data, so the model may not perform as well.")
st.markdown("**Next Steps**: Since not all reviews are in English, it would be interesting to use BERTopic on non-English texts with the sentence-transformers model *paraphrase-multilingual-MiniLM-L12-v2*, and then combine that with a translation model to translate the topics returned.")