Update README.md
# Please use 'Roberta'-related functions to load this model!
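For example, a minimal loading sketch: the repository ID below is a placeholder (substitute this model's actual Hub ID), and the `Roberta*` classes are assumed to match the uploaded tokenizer and weights.

```python
from transformers import RobertaModel, RobertaTokenizer

# Placeholder repository ID -- replace with this model's actual Hugging Face Hub ID.
MODEL_ID = "<org>/TwBETO_v0"

# Load the tokenizer and encoder through the Roberta* classes, as noted above.
tokenizer = RobertaTokenizer.from_pretrained(MODEL_ID)
model = RobertaModel.from_pretrained(MODEL_ID)

# Encode a sample tweet; the model was pre-trained with 128-token sequences.
inputs = tokenizer("example tweet text", return_tensors="pt", truncation=True, max_length=128)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```

The `AutoTokenizer` and `AutoModel` classes should also resolve to the RoBERTa architecture from the hosted config, if you prefer them.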
This repository contains the resources for our paper
**[Social Context in Political Stance Detection: Impact and Extrapolation]**
*Ramon Villa-Cox, Evan Williams, Kathleen M. Carley*
We pre-trained a BERT language model, which we call *TwBETO_v0*, following the robust pre-training approach introduced in RoBERTa. We opted for the smaller architecture dimensions introduced in DistilBERT, namely 6 hidden layers with 12 attention heads. We also reduced the model's maximum sequence length to 128 tokens, following another BERT instantiation trained on English Twitter data (*BERTweet*). We used the RoBERTa implementation in the Hugging Face library and optimized the model with Adam with weight decay, a linear schedule with warmup, and a maximum learning rate of 2e-4. We used a global batch size of 5k (via gradient accumulation) across 4 Titan XP GPUs (12 GB of RAM each) and trained the model for 650 hours.
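As a rough illustration of that setup, the sketch below instantiates a RoBERTa model with those dimensions and the described optimizer and schedule. The vocabulary size, hidden size, weight-decay value, and step counts are placeholders that are not specified above.

```python
import torch
from transformers import RobertaConfig, RobertaForMaskedLM, get_linear_schedule_with_warmup

# Architecture roughly matching the description above; several values are assumptions.
config = RobertaConfig(
    vocab_size=50_265,            # placeholder -- set to the actual tokenizer vocabulary size
    num_hidden_layers=6,          # DistilBERT-sized depth
    num_attention_heads=12,
    hidden_size=768,              # assumed, matching DistilBERT's hidden dimension
    max_position_embeddings=130,  # 128-token sequences (+2 position ids RoBERTa reserves)
)
model = RobertaForMaskedLM(config)

# Adam with weight decay, linear schedule with warmup, peak learning rate 2e-4.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=0.01)

# Step counts are illustrative placeholders; they are not reported in this card.
num_warmup_steps, num_training_steps = 10_000, 500_000
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps)

# Gradient accumulation across the 4 GPUs would then be used to reach the ~5k global batch size.
```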