Understanding and Predicting Derailment in Toxic Conversations on GitHub
Abstract
Software projects thrive on the involvement and contributions of individuals from different backgrounds. However, toxic language and negative interactions can hinder the participation and retention of contributors and alienate newcomers. Proactive moderation strategies aim to prevent toxicity by addressing conversations that have derailed from their intended purpose. This study aims to understand and predict conversational derailment leading to toxicity on GitHub. To facilitate this research, we curate a novel dataset comprising 202 toxic conversations from GitHub with annotated derailment points, along with 696 non-toxic conversations as a baseline. Based on this dataset, we identify unique characteristics of toxic conversations and derailment points, including linguistic markers such as second-person pronouns, negation terms, and tones of Bitter Frustration and Impatience, as well as patterns in conversational dynamics between project contributors and external participants. Leveraging these empirical observations, we propose a proactive moderation approach that automatically detects and addresses potentially harmful conversations before they escalate. Using modern LLMs, we develop a conversation trajectory summary technique that captures the evolution of discussions and identifies early signs of derailment. Our experiments demonstrate that LLM prompts tailored to summarize GitHub conversations achieve a 69% F1-score in predicting conversational derailment, substantially outperforming a set of baseline approaches.
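As a rough illustration of the trajectory-summary idea described in the abstract, the sketch below first asks an LLM to summarize how a GitHub thread has evolved, then asks for a derailment judgment based on that summary. This is a minimal sketch, not the authors' actual pipeline: it assumes the OpenAI Python client, and the model name, prompt wording, and the predict_derailment helper are all illustrative assumptions.

```python
# Minimal sketch of a conversation-trajectory-summary classifier.
# NOT the paper's actual prompts or model; assumes the OpenAI Python
# client (openai>=1.0) with OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def predict_derailment(comments: list[dict]) -> str:
    """comments: [{"author": str, "role": "contributor" or "external",
    "body": str}, ...] in chronological order. Returns a verdict string."""
    thread = "\n\n".join(
        f"[{c['role']}] {c['author']}: {c['body']}" for c in comments
    )

    # Step 1: summarize the conversation's trajectory, flagging the
    # linguistic markers the paper highlights (second-person pronouns,
    # negation terms, bitter frustration, impatience).
    summary = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "You summarize GitHub issue discussions."},
            {"role": "user",
             "content": ("Summarize the trajectory of this GitHub "
                         "conversation, noting shifts in tone such as "
                         "bitter frustration or impatience, heavy use of "
                         "second-person pronouns, and negation terms:\n\n"
                         + thread)},
        ],
    ).choices[0].message.content

    # Step 2: ask for a binary derailment judgment from the summary.
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user",
             "content": ("Based on this trajectory summary, is the "
                         "conversation likely to derail into toxicity? "
                         "Answer 'derailing' or 'civil'.\n\n" + summary)},
        ],
    ).choices[0].message.content
    return verdict
```

Summarizing first, then classifying, mirrors the paper's reported intuition that the evolution of a discussion, rather than any single comment, carries the early signs of derailment.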
Community
Proactive moderation on GitHub to mitigate toxic conversations.
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- Echoes of Discord: Forecasting Hater Reactions to Counterspeech (2025)
- Towards Refining Developer Questions using LLM-Based Named Entity Recognition for Developer Chatroom Conversations (2025)
- Dynamics of Toxicity in Political Podcasts (2025)
- Evaluation and Facilitation of Online Discussions in the LLM Era: A Survey (2025)
- Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations (2025)
- Hotter and Colder: A New Approach to Annotating Sentiment, Emotions, and Bias in Icelandic Blog Comments (2025)
- A 106K Multi-Topic Multilingual Conversational User Dataset with Emoticons (2025)