import streamlit as st

title = "Hate Speech in ACM"
description = "The history and development of hate speech detection as a modeling task"
date = "2022-01-26"
thumbnail = "images/prohibited.png"

__ACM_SECTION = """
Content moderation is a collection of interventions used by online platforms to partially obscure, or remove entirely from user-facing view, content that is objectionable under the company's values or community guidelines, which vary from platform to platform. [Sarah T. Roberts (2019)](https://yalebooks.yale.edu/book/9780300261479/behind-the-screen/) describes content moderation as "the organized practice of screening user-generated content (UGC) posted to Internet sites, social media, and other online outlets" (p. 12). [Tarleton Gillespie (2018)](https://yalebooks.yale.edu/book/9780300261431/custodians-internet/) writes that platforms moderate content "both to protect one user from another, or one group from its antagonists, and to remove the offensive, vile, or illegal."

While there are a variety of approaches to this problem, in this tool we focus on automated content moderation (ACM): the application of algorithms to the classification of problematic content. Content that is subject to moderation can be user-directed (e.g. targeted harassment of a particular user in comments or direct messages) or posted to a personal account (e.g. user-created posts that contain hateful remarks against a particular social group).
"""

__CURRENT_APPROACHES = """
Automated content moderation has relied both on analysis of the media itself (e.g. using methods from natural language processing and computer vision) and on user dynamics (e.g. whether the user sending the content shares followers with the recipient, or whether the user posting the content is a relatively new account). Often, the ACM pipeline is fed by user-reported content.

Within the realm of text-based ACM, approaches range from wordlist-based filters to data-driven machine learning models. Common datasets used for training and evaluating hate speech detectors can be found at [https://hatespeechdata.com/](https://hatespeechdata.com/).
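
As a rough illustration of the wordlist end of this spectrum, the sketch below flags a post when it contains a blocklisted term. The function name, blocklist, and placeholder terms are all invented here for illustration; deployed wordlist systems are considerably more elaborate. Even this toy version exhibits two characteristic failure modes: obfuscated spellings evade the list, while merely *mentioning* a term (as counterspeech often does, see the challenges below) is flagged just like using it.

```python
import re

# Toy blocklist for illustration; "darn" and "heck" stand in for
# actual slurs. Deployed lists are curated and continually updated.
BLOCKLIST = {"darn", "heck"}

def flag_post(text: str) -> bool:
    # Flag the post if any blocklisted term appears as a whole word.
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(token in BLOCKLIST for token in tokens)

print(flag_post("what the heck is this"))  # True: blocklisted term used
print(flag_post("h3ck that"))              # False: obfuscation evades the list
print(flag_post('never say "heck" here'))  # True: a mention is flagged like a use
```

Data-driven models trade this brittleness for a different dependency: they are only as good as the labeled data they learn from, a point taken up in the challenges below.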
"""

__CURRENT_CHALLENGES = """
Combating hateful content on the Internet continues to be a challenge. A 2021 survey of respondents in the United States, conducted by the Anti-Defamation League, found an increase in online hate and harassment directed at LGBTQ+, Asian American, Jewish, and African American individuals.

### Technical challenges for data-driven systems

Data-driven models inherit the worldviews encoded in their training datasets, so a common challenge lies in having insufficient data, or data that reflects only a limited worldview. For example, a [recent study](https://link.springer.com/article/10.1007/s12119-020-09790-w) found that Tweets posted by drag queens were more often rated as toxic by an automated system than Tweets posted by white supremacists, in part because, the study argues, the reclaimed slurs and mock impoliteness common in drag culture were scored as toxic without the in-group context that changes their meaning. Such errors trace back to the labeling schemes and choices made for the training data, as well as the company policies invoked when making those labeling choices.

### Context matters for content moderation

*Counterspeech* is "any direct response to hateful or harmful speech which seeks to undermine it" (from the [Dangerous Speech Project](https://dangerousspeech.org/counterspeech/)). Counterspeech has been shown to be an important community self-moderation tool for reducing instances of hate speech (see [Hangartner et al. 2021](https://www.pnas.org/doi/10.1073/pnas.2116310118)), but counterspeech is often incorrectly categorized as hate speech by automatic systems because it directly references or quotes the original hate speech. Such system behavior silences those who are trying to push back against hateful and toxic speech and, if the flagged content is hidden automatically, prevents others from seeing the counterspeech. See [van Aken et al. 2018](https://aclanthology.org/W18-5105.pdf) for a detailed list of examples that automatic systems frequently misclassify.
"""

__SELF_EXAMPLES = """
- [**(FB)(TOU)** - *Facebook Community Standards*](https://transparency.fb.com/policies/community-standards/)
- [**(FB)(Blog)** - *What is Hate Speech? (2017)*](https://about.fb.com/news/2017/06/hard-questions-hate-speech/)
- [**(NYT)(Blog)** - *New York Times on their partnership with Jigsaw*](https://open.nytimes.com/to-apply-machine-learning-responsibly-we-use-it-in-moderation-d001f49e0644)
- [**(NYT)(FAQ)** - *New York Times on their moderation policy*](https://help.nytimes.com/hc/en-us/articles/115014792387-Comments)
- [**(Reddit)(TOU)** - *Reddit General Content Policies*](https://www.redditinc.com/policies/content-policy)
- [**(Reddit)(Blog)** - *AutoMod - helping scale moderation without ML*](https://mods.reddithelp.com/hc/en-us/articles/360008425592-Moderation-Tools-overview)
- [**(Google)(Blog)** - *Google Search Results Moderation*](https://blog.google/products/search/when-and-why-we-remove-content-google-search-results/)
- [**(Google)(Blog)** - *Jigsaw Case Studies*](https://www.perspectiveapi.com/case-studies/)
- [**(YouTube)(TOU)** - *YouTube Community Guidelines*](https://www.youtube.com/howyoutubeworks/policies/community-guidelines/)
"""

__CRITIC_EXAMPLES = """
- [Social Media and Extremism - Questions about January 6th 2021](https://thehill.com/policy/technology/589651-jan-6-panel-subpoenas-facebook-twitter-reddit-and-alphabet/)
- [Over-Moderation of LGBTQ Content on YouTube](https://www.gaystarnews.com/article/youtube-lgbti-content/)
- [Disparate Impacts of Moderation](https://www.aclu.org/news/free-speech/time-and-again-social-media-giants-get-content-moderation-wrong-silencing-speech-about-al-aqsa-mosque-is-just-the-latest-example/)
- [Calls for Transparency](https://santaclaraprinciples.org/)
- [Income Loss from Failures of Moderation](https://foundation.mozilla.org/de/blog/facebook-delivers-a-serious-blow-to-tunisias-music-scene/)
- [Fighting Hate Speech, Silencing Drag Queens?](https://link.springer.com/article/10.1007/s12119-020-09790-w)
- [Reddit Self-Reflection on Lack of Content Policy](https://www.reddit.com/r/announcements/comments/gxas21/upcoming_changes_to_our_content_policy_our_board/)
"""


def run_article():
    st.markdown("## Automatic Content Moderation (ACM)")
    with st.expander("ACM definition", expanded=False):
        st.markdown(__ACM_SECTION, unsafe_allow_html=True)

    st.markdown("## Current approaches to ACM")
    with st.expander("Current Approaches"):
        st.markdown(__CURRENT_APPROACHES, unsafe_allow_html=True)

    st.markdown("## Current challenges in ACM")
    with st.expander("Current Challenges"):
        st.markdown(__CURRENT_CHALLENGES, unsafe_allow_html=True)

    st.markdown("## Examples of ACM in Use: in the Press and in their own Words")
    # Side-by-side columns: platforms' own policies on the left, critiques on the right.
    col1, col2 = st.columns([4, 5])
    with col1.expander("In their own Words"):
        st.markdown(__SELF_EXAMPLES, unsafe_allow_html=True)
    with col2.expander("Critical Writings"):
        st.markdown(__CRITIC_EXAMPLES, unsafe_allow_html=True)
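

# Note: this module is presumably imported by a parent Streamlit app that calls
# run_article(); the guard below is an added assumption so the article can also
# be previewed standalone with `streamlit run <this_file>.py`.
if __name__ == "__main__":
    run_article()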