hjian42 committed on
Commit 9f470fd · 1 Parent(s): d9f7a3b

Create new file

Files changed (1)
  1. README.md +31 -0
README.md CHANGED
@@ -1,3 +1,34 @@
  ---
  license: cc-by-nc-4.0
  ---
+
+ ## Model Specification
+ - This is the **Democratic** community GPT-2 language model, fine-tuned on 4.7M tweets (~100M tokens) from Democratic Twitter users, posted between 2019-01-01 and 2020-04-10.
+ - For more details about the `CommunityLM` project, please refer to [our paper](https://arxiv.org/abs/2209.07065) and the [GitHub](https://github.com/hjian42/communitylm) page.
+ - In the paper, it is referred to as the `Fine-tuned CommunityLM` for the Democratic Twitter community.
+
+ ## How to use the model
+
+ - **PRE-PROCESSING**: when you apply the model to tweets, make sure the tweets are preprocessed with the BERTweet [TweetNormalizer](https://github.com/VinAIResearch/BERTweet/blob/master/TweetNormalizer.py) script to get the best performance.
+
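To make the pre-processing step concrete, here is a minimal sketch of the kind of normalization involved. It is a simplified stand-in, not the linked script itself: it only masks user mentions as `@USER` and links as `HTTPURL` and collapses whitespace, while the linked script also handles tokenization and emoji. The name `normalize_tweet` and both regular expressions are assumptions made for this illustration; for results that match training, prefer the linked script.

```python
import re

# Hypothetical, simplified patterns for this sketch only.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def normalize_tweet(text: str) -> str:
    """Mask links and user mentions, then collapse runs of whitespace."""
    text = URL_RE.sub("HTTPURL", text)    # replace links with a placeholder
    text = MENTION_RE.sub("@USER", text)  # replace @handles with a placeholder
    return " ".join(text.split())         # squeeze repeated whitespace


if __name__ == "__main__":
    print(normalize_tweet("@joe  check https://t.co/abc now"))
```

The lookbehind in `MENTION_RE` keeps the sketch from rewriting the middle of e-mail-like strings such as `a@b.com`.
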
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load the tokenizer and weights for the Democratic community model.
+ tokenizer = AutoTokenizer.from_pretrained("CommunityLM/democrat-twitter-gpt2")
+ model = AutoModelForCausalLM.from_pretrained("CommunityLM/democrat-twitter-gpt2")
+ ```
+
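Once the tokenizer and model are loaded, one possible way to probe the model is to sample a few completions for a short prompt. The sketch below is illustrative only: the helper name `sample_completions`, the prompt `"Joe Biden is"`, the sampling settings, and the repository id used in `main` are assumptions for this example, not the settings reported in the paper.

```python
from typing import List


def sample_completions(model, tokenizer, prompt: str, n: int = 3,
                       max_new_tokens: int = 30) -> List[str]:
    """Sample n continuations of prompt from a causal language model."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,                       # sample instead of greedy decoding
        top_p=0.95,                           # nucleus sampling (assumed value)
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


def main() -> None:
    # Imported here so the helper above can be reused without loading transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "CommunityLM/democrat-twitter-gpt2"  # assumed repository id
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    for text in sample_completions(model, tokenizer, "Joe Biden is"):
        print(text)
```

Calling `main()` downloads the model weights on first use. Because sampling is random, the completions differ from run to run.
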
+ ## References
+
+ If you use this repository in your research, please cite [our paper](https://arxiv.org/abs/2209.07065):
+
+ ```bibtex
+ @inproceedings{jiang-etal-2022-communitylm,
+   title = "CommunityLM: Probing Partisan Worldviews from Language Models",
+   author = {Jiang, Hang and Beeferman, Doug and Roy, Brandon and Roy, Deb},
+   booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
+   year = "2022",
+   publisher = "International Committee on Computational Linguistics",
+ }
+ ```