Update README.md
Add line breaks and correct typos.
README.md
---

# Politweet
In this summer project at Softhouse, we have developed a tool for analyzing Twitter posts by Swedish party leaders.
The UI of this tool is in the form of a webpage which lets a user see a graphical representation of the predicted
features: topics, sentiments (positive, negative or neutral), and targets in the party leaders' tweets.

### Data Gathering
The tweets were gathered using the Twitter scraping tool [Twint](https://github.com/twintproject/twint),
which is described as "An advanced Twitter scraping & OSINT tool written in Python that doesn't use
Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API
limitations."

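A minimal sketch of how such a scrape could be configured with Twint. The helpers `build_settings` and `scrape`, the account name `example_user`, and the output path are hypothetical; only the `twint.Config` attributes and `twint.run.Search` come from Twint itself, and the script actually used in this project may look different.

```python
from typing import Dict


def build_settings(username: str, since: str, until: str, output: str) -> Dict[str, object]:
    """Collect Twint configuration values for one account (hypothetical helper)."""
    return {
        "Username": username,  # account to scrape
        "Since": since,        # first day, "YYYY-MM-DD"
        "Until": until,        # last day, "YYYY-MM-DD"
        "Store_csv": True,     # write the results as CSV
        "Output": output,      # destination file
    }


def scrape(settings: Dict[str, object]) -> None:
    """Run a Twint search with the given settings (requires Twint to be installed)."""
    import twint  # imported lazily so build_settings works without Twint

    config = twint.Config()
    for name, value in settings.items():
        setattr(config, name, value)
    twint.run.Search(config)


# "example_user" and "tweets.csv" are placeholders, not values from the project.
settings = build_settings("example_user", "2021-06-28", "2022-08-10", "tweets.csv")
# scrape(settings)  # uncomment to start scraping
```

Keeping the settings in a plain dictionary makes it easy to repeat the same scrape for several accounts by changing only the username.
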
### Predicting: Topics, Sentiments & Targets
The classifications that are given by GPT-3 for every tweet include:
- ```sentiment``` - is the tweet positive, negative or neutral?
- ```target``` - who is the tweet targeting?

The predicted features were extracted using AI, and more specifically by utilizing the autoregressive
language model GPT-3 by [OpenAI](https://openai.com/api/) in combination with our prompt engineering.
The final prompt is a result of experimentation while also trying to limit the length of the prompt.
Since OpenAI charges users based on "tokens", which are closely related to the number of words, it would be
uneconomical to use lengthy prompts when classifying several thousand tweets.

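To illustrate why prompt length matters, here is a rough cost estimate. It is only a sketch: the ratio of 0.75 words per token is a common rule of thumb for English text (Swedish text will likely differ), the price passed to `estimate_cost` is a placeholder rather than an actual OpenAI price, and tokens in the model's answer are ignored.

```python
import math
from typing import Sequence

WORDS_PER_TOKEN = 0.75  # assumption: rough rule of thumb, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the number of tokens from the whitespace-separated word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def estimate_cost(prompt: str, tweets: Sequence[str], price_per_1k_tokens: float) -> float:
    """Estimate the input cost when the same prompt is sent with every tweet."""
    prompt_tokens = estimate_tokens(prompt)
    total_tokens = sum(prompt_tokens + estimate_tokens(tweet) for tweet in tweets)
    return total_tokens / 1000 * price_per_1k_tokens


# Placeholder texts and price, chosen only to show the calculation.
short_cost = estimate_cost("Classify this tweet:", ["one two three"] * 1000, 0.02)
long_cost = estimate_cost("Classify this tweet: " * 10, ["one two three"] * 1000, 0.02)
```

Because the prompt is repeated for every tweet, its length is multiplied by the number of tweets, which is why `long_cost` ends up several times larger than `short_cost`.
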
### Merging Topics & Targets
Since the output from GPT-3 varies a lot, e.g. topics can be similar but not identical, a method for
clustering similar topics and targets was needed in order to be able to represent statistics of these
classifications. Thus, a nearest-neighbor (NN) algorithm was implemented using cosine similarity as the metric,
after transforming topics and targets to dense vector representations with
[Sentence Transformers](https://github.com/UKPLab/sentence-transformers). The similarities between the
classification from GPT-3 and words from a predefined set of classes are then calculated, and the
classification is changed to the predefined class that yielded the highest cosine similarity. It is worth
noting that each predefined class has several "synonyms" or categorical words, and that the highest cosine
similarity can be found between the classification and any word from that list.

Example - The GPT-3 classified topics are: ```main_topic = "sport"``` and ```sub_topic = "soccer"``` ->
```old_topic = "sport and soccer"```, which gets the highest similarity when compared to the word/synonym
"soccer", which is in the synonym list of the predefined class ```"Civil society and sport"``` ->
```new_topic = "Civil society and sport"```

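The merging step can be sketched as a nearest-neighbor search over the synonyms of every predefined class. This is an illustration, not the project's code: `merge_label`, `toy_embed`, the vocabulary, and the synonym lists are hypothetical, and the toy word-count embedding stands in for Sentence Transformers so that the example runs without downloading a model. In practice, `embed` could wrap the `encode` method of a `SentenceTransformer` model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_label(
    label: str,
    classes: Dict[str, List[str]],
    embed: Callable[[str], Vector],
) -> Tuple[str, str, float]:
    """Return (predefined class, best synonym, similarity) for a GPT-3 label."""
    query = embed(label)
    best = ("", "", -1.0)
    for class_name, synonyms in classes.items():
        for synonym in synonyms:
            similarity = cosine_similarity(query, embed(synonym))
            if similarity > best[2]:
                best = (class_name, synonym, similarity)
    return best


# Toy embedding (assumption): counts of a few known words, for demonstration only.
VOCABULARY = ["sport", "soccer", "tax", "budget"]


def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCABULARY]


# Hypothetical synonym lists; the project's predefined classes are not shown here.
classes = {
    "Civil society and sport": ["association", "soccer"],
    "Economy": ["tax", "budget"],
}
new_topic, synonym, similarity = merge_label("sport and soccer", classes, toy_embed)
```

The three returned values correspond to the kind of information stored in the `merged_topic`, `synonym_topic` and `cos_sim_topic` columns of the data frame.
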
### Website User Manual
1. Enter the time period of the tweets that you want to look into - The dates need to be in the format
"YYYY-MM-DD", and between 2021-06-28 and 2022-08-10. It is preferable to choose a larger span rather than a
smaller one. Keep in mind that the number of tweets posted by the party leaders varies a lot
(Annie Lööf: 1814 vs Jimmie Åkesson: 185 tweets in total).
2. Select the party leader(s) you want to look into - At least one party leader has to be selected.
3. Select the classifications you want to see statistics of: topic, sentiment and/or target.
4. Apply - always press this button after you check a new box or change the dates to update the website.
5. Run
The pie charts and bar graphs should appear for your selected party leaders. Under the plots, a new panel
will appear which lets users see how a prediction was made, i.e. classification from GPT-3 -> the
6. To see examples of how the topic/sentiment/target was predicted, the user can select a type of
classification and check the box "show stats". To download the CSV file containing all tweets and
classifications for the checked party leaders, the user can check "Export file". After making the selections,
press Apply and Run.

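The date rules from step 1 can be checked with a few lines of Python. This is a sketch of one possible validation, not the website's actual code; the function names `parse_day` and `validate_period` are hypothetical, while the format and the date limits are taken from the manual above.

```python
import re
from datetime import date
from typing import Tuple

FIRST_DAY = date(2021, 6, 28)  # earliest date covered by the data set
LAST_DAY = date(2022, 8, 10)   # latest date covered by the data set


def parse_day(text: str) -> date:
    """Parse a date that is written exactly as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        raise ValueError(f"expected YYYY-MM-DD, got {text!r}")
    return date.fromisoformat(text)


def validate_period(start: str, end: str) -> Tuple[date, date]:
    """Return the period as dates, or raise ValueError if it is not allowed."""
    start_day, end_day = parse_day(start), parse_day(end)
    if start_day > end_day:
        raise ValueError("the start date must not be after the end date")
    if start_day < FIRST_DAY or end_day > LAST_DAY:
        raise ValueError(f"dates must be between {FIRST_DAY} and {LAST_DAY}")
    return start_day, end_day


period = validate_period("2021-09-01", "2022-03-31")
```
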
### Data Frame Structure
Each row in the database has the following structure:
```id,tweet,date,user_id,username,urls,nlikes,nreplies,nretweets,class_tuple,main_topic,sub_topic,sentiment,target,merged_tuple,merged_topic,merged_target,cos_sim_topic,synonym_topic,cos_sim_target,synonym_target```.

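A short sketch of how rows with this structure could be read with Python's `csv` module. It assumes that the exported file starts with a header row containing exactly these column names; `read_rows` and the sample row are hypothetical and only illustrate the layout.

```python
import csv
import io
from typing import Dict, List

COLUMNS = (
    "id,tweet,date,user_id,username,urls,nlikes,nreplies,nretweets,"
    "class_tuple,main_topic,sub_topic,sentiment,target,"
    "merged_tuple,merged_topic,merged_target,"
    "cos_sim_topic,synonym_topic,cos_sim_target,synonym_target"
).split(",")


def read_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text whose header row matches COLUMNS into one dict per tweet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames!r}")
    return list(reader)


# Build a made-up sample with csv.writer so that commas inside the tweet are quoted.
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(COLUMNS)
writer.writerow(["1", "Hello, world"] + [""] * (len(COLUMNS) - 2))
rows = read_rows(sample.getvalue())
```

Using the `csv` module instead of splitting on commas matters here, since tweet texts can themselves contain commas.
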