Spaces: bunkalab/wikipedia-en

charlesdedampierre committed on Dec 18, 2023

Commit 84952b3

1 Parent(s): d934fdc

display a sample

Files changed (1)

app.py CHANGED

@@ -15,12 +15,13 @@ st.sidebar.write(
 st.title("How to understand large textual datasets?")
 st.info(
-    "We randomly sampled 40,000 articles from the English subset 20231101.en of the Wikipedia dataset. We then took the first 500 words of each articles in order to generate an abstract that will be used for topic modeling."
+    "We randomly sampled 40,000 articles from the English subset 20231101.en of the Wikipedia dataset. We then took the first 500 words of each articles in order to generate an abstract that will be used for topic modeling. Here is a sample:"
 )
 df = pd.read_csv("data/data_sample_wikipedia.csv", index_col=[0])
 df = df[["text", "url"]]
+df = df.head(100)
 st.dataframe(df, use_container_width=True)
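
The info text describes the data preparation (random sampling, then keeping the first 500 words of each article), but the code that produced data/data_sample_wikipedia.csv is not part of this commit. A minimal sketch of that step might look like the following. The helper names `first_n_words` and `sample_abstracts`, the list-of-dicts layout, and the toy corpus are all assumptions made for illustration, not the author's actual pipeline.

```python
import random


def first_n_words(text: str, n: int = 500) -> str:
    """Return the first n whitespace-separated words of text."""
    return " ".join(text.split()[:n])


def sample_abstracts(articles, k, n_words=500, seed=0):
    """Randomly pick k articles and truncate each one to n_words words.

    `articles` is assumed to be a list of dicts with "text" and "url" keys,
    mirroring the two columns that app.py keeps.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    picked = rng.sample(articles, min(k, len(articles)))
    return [
        {"text": first_n_words(a["text"], n_words), "url": a["url"]}
        for a in picked
    ]


# Toy corpus standing in for the Wikipedia dump (hypothetical data).
corpus = [
    {"text": "word " * 600, "url": f"https://example.org/{i}"}
    for i in range(10)
]
sample = sample_abstracts(corpus, k=3)
```

In the app itself, `df.head(100)` then shows only the first 100 rows of the prepared file, which keeps the rendered table small without changing the underlying data.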