import gradio as gr
import functools
import requests
import math
import plotly.express as px


HUMAN_TEXTS = [
    "Maryland's environmental protection agency is suing a Prince George's County recycling outfit, alleging that the company has violated anti-pollution laws for years at two rat-infested, oil-leaking, garbage-strewn sites in Cheverly and Baltimore.\n\n" +
    "The 71-page complaint, filed on behalf of the Maryland Department of the Environment in Prince George's County Circuit Court this month, lays out environmental violations since December 2014 at two properties controlled by World Recycling Company and its affiliates, Pride Rock and Small World Real Estate.",

    "Taylor Guitars is an American guitar manufacturer based in El Cajon, California, and is one of the largest manufacturers of acoustic guitars in the United States. They specialize in acoustic guitars and semi-hollow electric guitars. The company was founded in 1974 by Bob Taylor and Kurt Listug.",

    "When I was young, my mother would often tell me stories at bedtime, stories of the internet when she was a little girl. She told of how it was beautiful. How the memes ran free, without restraint, without obstacle. I didn't know what had happened to this internet back then. I was too young to understand the ways things were now.\n\n" +
    "But as I grew older, my mother began to tell me new stories.\n\n" +
    "She told me how the beautiful internet she had once called home came under attack.\n\n" +
    "With sadness in her eyes, she recounted the doomed fight the denizens of the internet waged. She told of how her people fought against the forces that threatened the very essence of their home, net neutrality.",

    "The World Health Organization will push at its board meeting this week for an expanded role in tackling the next global health emergency after COVID-19, but is still seeking answers on how to fund it, according to health policy experts.\n\n" +
    "The Geneva meeting sets the programme for the U.N. agency this year – as well as its future budget – with the WHO facing two key challenges: a world that expects ever more from its leading health body, but which has not yet proven willing to fund it to tackle those challenges.\n\n" +
    "At the Executive Board's annual meeting from Jan. 30-Feb. 7, countries will give feedback on WHO Director-General Tedros Adhanom Ghebreyesus' global strategy to strengthen readiness for the next pandemic which includes a binding treaty currently being negotiated.",

    "Nature comprises many processes that recycle various elements to avoid human wastage. Nature does not waste any of its elements and recycles them all including water, air and organic fertility. Recycling of natural elements occurs in order to maintain ecological balance for survival of all living species.\n\n" +
    "However, human beings have meddled with nature so much that problems such as depletion of ozone layer and global warming are inflicting the human race adversely (Kalman and Sjonger 11). An example of a process in which nature recycles one of its constituent elements is the water cycle. By definition, the water cycle is a continuous movement of water between the earth surface and the atmosphere (Kalman and Sjonger 11)."
]

MODEL_TEXTS = [
    "Maryland's Environmental Protection Administrator Paul Genkoff says the company, which isn't named in the lawsuit, is not properly disposing trash at its recycling facility in the city center. The city's Department of Streets and Sanitation recently sent a letter to the company that's been signed by the state agencies, ordering it to take some action.\n\n" +
    "\"They are not doing what they should be doing for the people,\" Genkoff said.",

    "Taylor Guitars is an American guitar manufacturer based in Richmond, Virginia, that produces a range of instruments under the Guitaria, Guitars and Vibraphones brands. Their most popular models range from custom handmade instruments available to those made upon an order basis. The Guitaria was started in 1989 after a successful business collaboration with one of the leading guitar manufacturers in the world, Guitarsmiths of San Francisco, CA. Their first models were inspired by the music of the 1960s and 1970s, incorporating the style of their favorite groups of the time: The Grateful Dead, The Rolling Stones, Led Zeppelin, The Allman Brothers, The Allman Brothers Band, The All American Rejects, Jan and Dean, and many more.",

    "When I was young, my mother would often tell me stories about my ancestors' voyage to South America in hopes that I might one day make the pilgrimage myself. But her stories were about her family's adventure to the North American continent instead. She and my grandfather would return home and tell me about that trip, which was not very enlightening.\n\n" +
    "It made me think of the \"Furious Seven\" movie franchise, in which seven Japanese men drive a motorcycle to South America. And that leads me to today, the year 2112.\n\n" +
    "The first few years of 2112 are defined by economic turmoil and strife, with tensions between some nations and governments and those in the United States. The Great Migration is one of the key issues affecting the world in 2112.",

    "The World Health Organization will push at least 10 more years before it decides that tobacco is a public health priority. \"By that time, people will have been smoking for decades, and we will have failed,\" the WHO's Frieden said, predicting a \"horrific\" health crisis.\n\n" +
    "Even before 2014, though, WHO and other health agencies had acknowledged that they did not know how to stop the epidemic. Yet now they're in a position to do something. They've made it clear: The only way to effectively halt smoking is to put e-cigarettes under the same laws that regulate other tobacco products. But because they're considered \"health products,\" that may not pass muster with the FDA—and because so many smokers are using them already, a change may not have big impact.\n\n" +
    "And if the FDA were able to get their way, as it apparently might, it wouldn't only discourage people from using them.",

    "Nature comprises many processes that recycle various elements of matter - even if the material is extremely expensive to acquire in any large quantity. This includes solar radiation, which converts into radio waves as particles of sunlight. An antenna converts these radio waves into photons which arrive on earth in the form of light. Light is emitted when electrons in a material excited to a higher energy state collide with one another. All radio waves carry information, and are subject to the same limitation. A light signal cannot pass for very long through a single molecule of a substance that has a high atomic number. Radio-wave absorption is therefore very limited by materials with little or no atomic number, and is therefore the only way to tell when elements are present in an element which is not present in a material with an atomic number of less than the fundamental one."
]

# Pagination state for the table of perturbed texts (module-level, shared across requests).
DATAFRAME_PAGE = 0
PERTURBATIONS_PER_PAGE = 5
PERTURBATIONS = []
PERTURBATION_LOGPS = []

# Backend endpoints for scoring and for GPT-2 sampling.
GET_RESULT = "https://detectgpt.ericmitchell.ai/get_result"
GENERATE = "https://detectgpt.ericmitchell.ai/generate"


def update_text(t, n):
    """Fill the detection box with example n: human text if t == 'h', otherwise GPT-2 text."""
    return gr.update(value=HUMAN_TEXTS[n] if t == 'h' else MODEL_TEXTS[n])


def detect(text):
    """Score `text` on the backend and return updates for the six result widgets."""
    headers = {'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8', 'User-Agent': 'HF_SPACES'}
    x = requests.post(GET_RESULT, data={'text': text}, headers=headers).json()

    # Check the status before reading the result, so a failed request shows
    # the server's message instead of raising on a missing field.
    status = x['status']
    if status != 'OK':
        return (gr.update(visible=False), gr.update(visible=False), status,
                gr.update(visible=False), gr.update(visible=False), gr.update(visible=False))

    # The result is a positional list; index 6 is not used by this app.
    response = x['result']
    original_score = response[0]
    perturbed_scores = response[1]
    perturbed_mean = response[2]
    perturbed_std = response[3]
    z_score = response[5]
    n_samples = len(perturbed_scores)
    perturbed_texts = response[7]

    # Headline verdict, bucketed by z-score.
    result = '### '
    if z_score < 0.25:
        result = result + 'DetectGPT predicts that your text is very unlikely to be from GPT-2.'
    elif z_score < 0.7:
        result = result + 'DetectGPT predicts that your text is unlikely to be from GPT-2.'
    elif z_score < 1:
        result = result + 'DetectGPT predicts that your text could be from GPT-2, but is probably not.'
    elif z_score < 1.75:
        result = result + 'DetectGPT predicts that your text is likely to be from GPT-2.'
    else:
        result = result + 'DetectGPT predicts that your text is very likely to be from GPT-2.'

    # Explanation line; the ranges match the thresholds used above.
    result = result + '\n##### '
    if z_score < 0.25:
        result = result + '(because the z-score is less than 0.25)'
    elif z_score < 0.7:
        result = result + '(because the z-score is in the range 0.25 - 0.7)'
    elif z_score < 1:
        result = result + '(because the z-score is in the range 0.7 - 1)'
    elif z_score < 1.75:
        result = result + '(because the z-score is in the range 1 - 1.75)'
    else:
        result = result + '(because the z-score is above 1.75)'

    result = (result + '\nResults computed using ' + str(n_samples) + ' perturbations of your text.'
              + '\n\nOriginal log-probability minus average perturbed log-probability: ' + f'{original_score - perturbed_mean:.03f}'
              + '\n\nStandard deviation of perturbed log-probabilities: ' + f'{perturbed_std:.03f}'
              + '\n\n**Z-score: ' + f'{z_score:.03f}' + '**')

    # Histogram of the perturbed log-probabilities, with vertical lines marking
    # the original score and the mean perturbed score.
    fig = px.histogram(x=perturbed_scores, nbins=20, labels={'x': 'Log-probability under GPT-2', 'y': 'Occurrences'})
    fig.update_layout(
        shapes=[
            dict(
                type="line",
                x0=original_score,
                y0=0,
                x1=original_score,
                yref="paper",
                y1=0.7,
                line=dict(
                    color="black",
                    width=3,
                    dash="dashdot",
                ),
            ),
            dict(
                type="line",
                x0=perturbed_mean,
                y0=0,
                x1=perturbed_mean,
                yref="paper",
                y1=0.75,
                line=dict(
                    color="darkgray",
                    width=3,
                    dash="dashdot",
                ),
            ),
        ],
        annotations=[
            dict(
                x=original_score,
                y=0.75,
                xref="x",
                yref="paper",
                text="Original",
                showarrow=False,
                font=dict(
                    family="Courier New, monospace",
                    size=16,
                    color="black"
                )
            ),
            dict(
                x=perturbed_mean,
                y=0.8,
                xref="x",
                yref="paper",
                text="Avg. Perturbed",
                showarrow=False,
                font=dict(
                    family="Courier New, monospace",
                    size=16,
                    color="darkgray"
                )
            )
        ],
        xaxis=dict(
            showgrid=False,
        ),
        yaxis=dict(
            showgrid=False,
        ),
        plot_bgcolor='rgba(0,0,0,0)',
        paper_bgcolor='rgba(0,0,0,0)',
    )

    # Store the new perturbations and return to the first page of the table.
    global PERTURBATIONS
    global PERTURBATION_LOGPS
    global DATAFRAME_PAGE
    PERTURBATIONS = perturbed_texts
    PERTURBATION_LOGPS = perturbed_scores
    DATAFRAME_PAGE = 0

    # Reset the page label too, so it matches DATAFRAME_PAGE after a new detection.
    return (gr.update(value=fig, visible=True), update_perturbations_dataframe(), gr.update(value=result, visible=True),
            gr.update(value="Page 1", visible=True), gr.update(visible=True), gr.update(visible=True))


def generate(text):
    """Ask the backend for a GPT-2 continuation of the prompt `text`."""
    headers = {'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'}
    x = requests.post(GENERATE, data={'text': text}, headers=headers)
    if x.status_code == 200:
        return x.text
    else:
        return "Error generating text."


def update_perturbations_dataframe():
    """Return a Dataframe update holding the current page of perturbed texts and scores."""
    start = DATAFRAME_PAGE * PERTURBATIONS_PER_PAGE
    end = (DATAFRAME_PAGE + 1) * PERTURBATIONS_PER_PAGE
    perturbed_texts = PERTURBATIONS[start:end]
    perturbed_scores = PERTURBATION_LOGPS[start:end]
    data = [[t, s] for t, s in zip(perturbed_texts, perturbed_scores)]
    return gr.Dataframe.update(data, visible=True)


def next_page():
    global DATAFRAME_PAGE
    # Pages are 0-indexed, so the last valid page is n_pages - 1.
    n_pages = math.ceil(len(PERTURBATIONS) / PERTURBATIONS_PER_PAGE)
    if DATAFRAME_PAGE + 1 < n_pages:
        DATAFRAME_PAGE += 1
    return update_perturbations_dataframe(), f"Page {DATAFRAME_PAGE + 1}"


def prev_page():
    global DATAFRAME_PAGE
    if DATAFRAME_PAGE > 0:
        DATAFRAME_PAGE -= 1
    return update_perturbations_dataframe(), f"Page {DATAFRAME_PAGE + 1}"


with gr.Blocks() as demo:
    with gr.Row():
        # Empty side columns keep the main content centered.
        with gr.Column(scale=1, min_width=70):
            pass
        with gr.Column(scale=5):
            gr.Markdown(
                """# Detecting GPT-2 Generations with DetectGPT
##### This web app is a demo of DetectGPT, described in [this paper](https://arxiv.org/abs/2301.11305). DetectGPT is a general-purpose method for using a language model to detect its own generations; **however, this proof-of-concept only detects if a particular piece of text came from [GPT-2](https://openai.com/blog/better-language-models/).** Detections on samples from other models may be particularly unreliable. We may add larger models like GPT-J (6B), GPT-NeoX (20B), or GPT-3 (175B) in the future; we perform evaluations with these and other models in our paper.
##### This demo currently does not support languages using non-Latin script. Sorry for the inconvenience; we're hoping to add support soon!
##### [Update 7 Mar 2023] Due to high traffic, we have begun caching requests locally. Please do not submit sensitive or private information to this demo.
## Instructions
##### Enter some text in the text box at the bottom of the page and click the "Detect" button. You can try the example texts in the table below to get started, or use the generation box to generate your own text from GPT-2. We'd love to hear your thoughts (whether successes or failures) on DetectGPT at [detectgpt@gmail.com](mailto:detectgpt@gmail.com)!
#### This demo is experimental; its predictions should not be used to justify real-world decisions.
***
## Example Texts"""
            )

            # Truncated previews of the five example pairs, indexed like HUMAN_TEXTS and MODEL_TEXTS.
            previews = [
                "Maryland's environmental protection agency is",
                "Taylor Guitars is an American guitar manufacturer",
                "When I was young, my mother would often tell me",
                "The World Health Organization will push at",
                "Nature comprises many processes that recycle",
            ]

            # One row per example: a preview plus buttons that load the human or GPT-2 version.
            buttons = []
            for n, preview in enumerate(previews):
                with gr.Row():
                    with gr.Column(scale=2, min_width=80):
                        gr.Markdown(f"###### {preview} [...]")
                    with gr.Column(scale=1, min_width=80):
                        buttons.append((gr.Button("Select Human Text"), 'h', n))
                    with gr.Column(scale=1, min_width=80):
                        buttons.append((gr.Button("Select GPT-2 Text"), 'm', n))

            gr.Markdown(
                "### (Optional) Generate Your Own GPT-2 Text for Testing"
            )
            generate_input = gr.Textbox(show_label=False, placeholder="Write a short prompt for GPT-2", max_lines=1, lines=1)
            with gr.Row():
                with gr.Column(scale=1, min_width=80):
                    generate_button = gr.Button("Generate!")
                with gr.Column(scale=8):
                    pass

            gr.Markdown(
                """***
# Try out DetectGPT
"""
            )
            detect_input = gr.Textbox(show_label=False, placeholder="Paste some human-written or GPT-2-generated text here (at least 40 words or so)", max_lines=5, lines=5)

            # The example and generation buttons all write into the detection text box.
            generate_button.click(fn=generate, inputs=generate_input, outputs=detect_input)
            for (b, t, n) in buttons:
                b.click(fn=functools.partial(update_text, t=t, n=n), outputs=detect_input)

            with gr.Row():
                with gr.Column(scale=1, min_width=80):
                    detect_button = gr.Button("Detect!")
                with gr.Column(scale=8):
                    pass

            # Result widgets; the hidden ones are revealed by detect() on success.
            detect_results_text = gr.Markdown()
            results_plot = gr.Plot(visible=False)
            perturbations_dataframe = gr.DataFrame(label="Perturbed texts", headers=['Perturbed Text', 'Log Prob'], datatype=["str", "number"], wrap=True, max_rows=5, visible=False)
            page_label = gr.Markdown("Page 1", visible=False)
            next_page_button = gr.Button("Next Page", visible=False)
            next_page_button.click(fn=next_page, outputs=[perturbations_dataframe, page_label])
            prev_page_button = gr.Button("Previous Page", visible=False)
            prev_page_button.click(fn=prev_page, outputs=[perturbations_dataframe, page_label])

            # The order of outputs must match the tuple returned by detect().
            detect_button.click(fn=detect, inputs=detect_input, outputs=[results_plot, perturbations_dataframe, detect_results_text, page_label, next_page_button, prev_page_button])

            gr.Markdown(
                """***
Human texts on this page come from [this WaPo article](https://www.washingtonpost.com/dc-md-va/2023/01/27/trash-dumps-baltimore-prince-georges-recycling/), [this Wikipedia article](https://en.wikipedia.org/wiki/Taylor_Guitars), [the top-rated response to this /r/WritingPrompts post by user OrcDovahkiin](https://www.reddit.com/r/WritingPrompts/comments/7en7vl/wp_the_year_is_2038_and_net_neutrality_has_been/?sort=top), [this Reuters article](https://www.reuters.com/business/healthcare-pharmaceuticals/under-funded-who-seeks-reinforced-role-global-health-key-meeting-2023-01-30/), and [this essay from EduBirdie on the water cycle](https://edubirdie.com/examples/essay-about-water-cycle/#citation-block). GPT-2 outputs are generated by prompting GPT-2 with a short prefix of each human sample (or your prompt) and sampling up to 200 tokens with temperature 1.0.

This web app is a demo of the DetectGPT method described in [this paper](https://arxiv.org/pdf/2301.11305v1.pdf). We can't make any guarantees about the accuracy of the results, but we hope you find it interesting!

We are very grateful to the [Ray](https://www.ray.io/) distributed compute framework for making this web app much, much easier to build.

Privacy notice: this web app does not collect any personal information beyond the text you submit for detection, which is cached for performance reasons."""
            )
        with gr.Column(scale=1, min_width=70):
            pass

demo.launch(share=False, server_name='0.0.0.0')