Commit 3fecede by m. polinsky
1 Parent(s): 5e0d56b

Update README.md

  1. README.md +22 -0
README.md CHANGED
@@ -7,6 +7,8 @@
  The app displays topics; the user chooses up to three, and the app spins up a topical digest scraped from the headlines.
  This project makes heavy use of HuggingFace for NLP, and Gazpacho for web scraping.
 
+ The method of article selection here is arbitrary. Pre-assigned article tags could be used to select groups of articles, or semantic-similarity methods could be used to evaluate the article text. In practice, an enterprise instituting such a system would have its articles in a database it owns, and could run background processing to have summaries ready on demand.
+
  **The pipeline:**
 
  * Current headlines are scraped from two news sites.
@@ -15,6 +17,26 @@ This project makes heavy use of HuggingFace for NLP, and Gazpacho for web scrapi
  * User selects up to three clusters
  * Articles from those clusters are scraped, the articles summarized in chunks, and the summaries concatenated to create a digest.
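
The last pipeline step (summarize in chunks, then concatenate) can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `chunk_words`, `build_digest`, the 300-word default chunk size, and the `first_sentence` stand-in summarizer are all assumptions made for this example. In the app itself, the summarizer would presumably be a Hugging Face summarization model.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_digest(
    articles: List[str],
    summarize: Callable[[str], str],
    max_words: int = 300,  # assumed chunk size, not taken from the app
) -> str:
    """Summarize each article chunk by chunk and join the results into one digest."""
    sections = []
    for article in articles:
        chunk_summaries = [summarize(chunk) for chunk in chunk_words(article, max_words)]
        if chunk_summaries:  # skip articles that produced no text
            sections.append(" ".join(chunk_summaries))
    return "\n\n".join(sections)


def first_sentence(chunk: str) -> str:
    """Hypothetical stand-in summarizer: keep only the first sentence of a chunk."""
    return chunk.split(". ")[0].rstrip(".") + "."
```

Passing the summarizer in as a callable keeps the chunking and concatenation logic independent of any particular model. Splitting on word count is only one option; a tokenizer-aware split would track a model's input limit more closely.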
 
+ **This app explores a few ideas:**
+
+ * IR for QA and comprehension
+   * A cheap, quick way to explore an area of research dominated by large, end-to-end trained models such as RAG and NewsSum (further examples TK).
+ * News delivery and access
+   * CNN provides summaries, but there is a big difference between being served something and being able to "create" my own news.
+   * Sidesteps headlines: what is actually in the article? Headlines can push and pull.
+   * Removes control over our attention but enables empowered consumption, while keeping news production in the hands of professionals.
+ * Editorial ideation
+   * Can be used to find implied but uncovered stories by creating news assemblages without knowing exactly what you'll get. An editor knows what they are currently covering, but writing a one-sentence description of each article on a piece of paper is not the same as seeing the information from the finished articles assembled and juxtaposed like this.
+ * Cross-article information access
+   * Related information that paints a picture can be spread across multiple articles from different times. There are more stories lying latent in the stories already told.
+ * Whole news article summarization: pitfalls and windfalls
+   * Summarizing whole articles: technique and results.
+ * Community pantry principle
+   * There is no free lunch, but there is a community pantry. It only gets you so close.
+ * Evaluating summarization
+   * Summarization capability is difficult to evaluate objectively beyond a general level.
+
+
  This application was created as the culmination of a semester of independent graduate research into NLP and transformers.
 
  The original repo for the earlier version of this app is located at https://github.com/mpolinsky/sju_final_project/