Ramavill committed
Commit
288ae48
1 Parent(s): 58c3cb2

Update README.md

Files changed (1):
  1. README.md (+1 / -1)
README.md CHANGED
@@ -13,7 +13,7 @@ This repository contains the resources in our paper
 **[Protest Stance Detection: Leveraging heterogeneous user interactions for extrapolation in out-of-sample country contexts]**
 *Ramon Villa-Cox, Evan Williams, Kathleen M. Carley*

-We pre-trained a BERT language model, we call \textit{TwBETO}, following the robust approach introduced in RoBERTa. We opted for the smaller architecture dimensions introduced in DistilBERT, namely, 6 hidden layers with 12 attention heads. We also reduce the model's maximum sequence length to 128 tokens, following another BERT instantiation trained on English Twitter data (*BERTweet*). We utilize the RoBERTa implementation in the Hugging Face library and optimize the model using Adam with weight decay, a linear schedule with warmup and a maximum learning rate of 2e-4. We use a global batch size (via gradient accumulation) of 5k across 4 Titan XP GPUs (12 GB RAM each) and trained the model for 650 hours.
+We pre-trained a BERT language model, which we call *TwBETO_v0*, following the robust approach introduced in RoBERTa. We opted for the smaller architecture dimensions introduced in DistilBERT, namely 6 hidden layers with 12 attention heads. We also reduced the model's maximum sequence length to 128 tokens, following another BERT instantiation trained on English Twitter data (*BERTweet*). We used the RoBERTa implementation in the Hugging Face library and optimized the model with Adam with weight decay, a linear schedule with warmup, and a maximum learning rate of 2e-4. We used a global batch size (via gradient accumulation) of 5k across 4 Titan XP GPUs (12 GB of RAM each) and trained the model for 650 hours.

 The model was trained on a corpus of 155M Spanish tweets (4.5B word tokens), where the language was determined by Twitter's API. Only original tweets (retweets are filtered out) with more than 6 tokens were kept, and long tweets were truncated to 64 word tokens. The data was compiled from the following sources:
 - 110M tweets (3B word tokens) from the South American protests, collected from September 20 to December 31 of 2019.
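
For readers who want a concrete starting point, the pre-training recipe described in the updated paragraph above could be set up with the Hugging Face library roughly as in the sketch below. This is a minimal illustration, not the authors' released training code; the vocabulary size, weight-decay value, warmup and total step counts, and the per-device batch size are assumptions not stated in the README.

```python
# Minimal sketch of a RoBERTa-style masked-LM pre-training setup matching the
# description above: DistilBERT-sized dimensions (6 layers, 12 heads), a 128-token
# maximum sequence length, Adam with weight decay, and a linear warmup schedule
# peaking at a learning rate of 2e-4.
from torch.optim import AdamW
from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    get_linear_schedule_with_warmup,
)

config = RobertaConfig(
    vocab_size=50_265,            # assumption: the tokenizer vocabulary size is not reported
    num_hidden_layers=6,          # DistilBERT-sized depth
    num_attention_heads=12,
    hidden_size=768,
    intermediate_size=3072,
    max_position_embeddings=130,  # 128 tokens + the 2 offset positions RoBERTa reserves
)
model = RobertaForMaskedLM(config)

optimizer = AdamW(model.parameters(), lr=2e-4, weight_decay=0.01)  # weight_decay is an assumption
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10_000,     # assumption: warmup length is not reported
    num_training_steps=500_000,  # assumption: total steps are not reported
)

# A global batch of roughly 5k sequences can be reached via gradient accumulation,
# e.g. 32 sequences/GPU * 4 GPUs * 40 accumulation steps = 5,120 sequences per update.
per_device_batch, n_gpus, accumulation_steps = 32, 4, 40
```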