---
title: Critical AI Prompt Battle
author: Sarah Ciston
editors:
- Emily Martinez
- Minne Atairu
category: critical-ai
---
# p5.js Critical AI Prompt Battle
By Sarah Ciston
With Emily Martinez and Minne Atairu
## What are we making?
In this tutorial, you can build a tool to run several AI chat prompts at once and compare their results. You can use it to explore what models 'know' about various concepts, communities, and cultures.
This tutorial is part 2 in a series of 5 tutorials that focus on using AI creatively and thoughtfully.
- Part 1: [Making a ToolBox for Making Critical AI]([XXX])
- Part 3: [Training Dataset Explorer]([XXX])
- Part 4: [Machine Learning Model Inspector & Poetry Machine]([XXX])
- Part 5: [Putting Critical Tools into Practice]([XXX])
The code and content in this tutorial build on information from the prior tutorial to start creating your first tool for your p5.js Critical AI Kit. It also builds on fantastic work on critical prompt programming by Yasmin Morgan (2022), Katy Gero et al. (2024), and Minne Atairu (2024).
## Why compare prompts?
When you're using a chatbot to generate code or an email, it's easy to imagine its outputs are neutral and harmless. It seems like any system would output basically the same result. Does this matter for basic uses like making a plain image or having a simple conversation? Absolutely. Training datasets shape even the most innocuous outputs, and this training shows up in subtle, insidious ways.
Unfortunately, the sleek chatbot interface hides all the decision-making that leads to a prompt output. To glimpse the differences, we can test many variations by making our own tool. With our tool, we can hope to understand more about the underlying assumptions contained in the training dataset. That gives us more information to decide how we select and use these models — and for which contexts.
## Steps
### 1. Make a copy of your toolkit prototype.
Use [Tutorial One]([XXX]) as a template. Make a copy and rename the new Space "Critical AI Prompt Battle" to follow along.
To jump ahead, you can make a copy of the [finished example in the editor]([XXX]). But we really encourage you to type along with us!
### X. Import the Hugging Face library for working with Transformer models.
Put this code at the top of `sketch.js`:
```javascript
import { pipeline, env } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.10.1';
env.allowLocalModels = false; // skip local model check
```
The `import` statement says we are bringing in a library (or module). The curly braces let us name the specific functions we want from the library, in case we don't want to import the entire thing. Those functions are now part of our "namespace," so later we can refer to them without putting the library name in front of the function name. It also means we should not give any other variables or functions the same names. More information on importing [Modules]([XXX]).
### X. Create global variables to use later.
Declare these variables at the top of your script so that they can be referenced in multiple functions throughout the project:
```javascript
var PROMPT_INPUT = `The [BLANK] has a job as a [MASK], but...` // a field for writing or changing a text value
var PREPROMPT = `Please complete the phrase and fill in any [MASK]: `
var promptField // an html element to hold the prompt
var outText // an html element to hold the results
var blanksArray = [] // an empty list to store all the variables we enter to modify the prompt
```
We will be making a form that lets us write a prompt and send it to a model. The `PROMPT_INPUT` variable will carry the prompt we create. Think about what prompt you'd like to use first to test your model. You can change it later; we're making a tool for that! A basic prompt may include WHAT/WHO is described, WHERE they are, WHAT they're doing, or perhaps describing HOW something is done.
<!-- The `OUTPUT_LIST` will store results we get back from the model. -->
<!-- For fill-mask tasks, it will replace one `[MASK]` with one word (called a "token"). -->
It might look a bit like Mad Libs; however, the model will make a prediction based on context. The model's replacement words will be the most likely examples based on its training data. When writing your prompt, consider what you can learn about the rest of the sentence based on how the model responds (Morgan 2022, Gero et al. 2024).
When writing your prompt, replace one of these aspects with [BLANK]. We will fill this blank in with a choice of words we provide. You can also leave other words for the model to fill in on its own, using the word [MASK]. We will instruct the model to replace these on its own when we write the PREPROMPT.
<!-- Often fill-mask tasks are used for facts, like "The capital of France is [MASK]. -->
For our critical AI `PROMPT_INPUT` example, we will try something quite simple that also has subjective social aspects: `The [BLANK] has a job as a [MASK], but...`
<!-- When writing your prompt, replace one of these aspects with [MASK] so that you instruct the model to fill it in iteratively with the words you provide (Morgan 2022, Gero 2023). -->
<!-- Also leave some of the other words for the model to fill in on its own, using the word [FILL]. We instructed the model to replace these on its own in the PREPROMPT. -->
<!-- It will have extra inputs for making variations of the prompt it sends. -->
<!-- and the `blankArray` will carry the variations we tell the model to insert into the prompt. -->
Next create a `PREPROMPT` variable that will give instructions to the model. This can be optional, but it helps to specify any particulars. Here we'll use `Please complete the phrase and fill in any [MASK]: `. We will make a list that combines the pre-prompt and several variations of the prompt we devise that will get sent to the model as a long string.
We are making our own version of what is called a "fill mask" task. Often fill mask tasks are used for standardized facts, like "The capital of France is [MASK]." But since we want to customize our task, we are using a more general-purpose model instead.
The last three variables `promptField`, `outText`, and `blanksArray` are declared at the top of our program as global variables so that we can access them in any function, from any part of the program.
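To make the idea of prompt variations concrete, here is a minimal sketch of swapping words into the `[BLANK]` slot and combining the results with the pre-prompt. The helper name `fillBlanks`, the example words, and the choice to join the variations with spaces are assumptions for illustration; they are not part of the tutorial's finished code.

```javascript
// Hypothetical helper (not part of the tutorial's code):
// swap each word into the [BLANK] slot of the prompt.
function fillBlanks(prompt, blanks) {
  return blanks.map((word) => prompt.replaceAll('[BLANK]', word));
}

const PROMPT_INPUT = 'The [BLANK] has a job as a [MASK], but...';
const PREPROMPT = 'Please complete the phrase and fill in any [MASK]: ';
const exampleBlanks = ['woman', 'man', 'non-binary person']; // example values only

// One prompt per word; [MASK] is left alone for the model to fill in.
const variations = fillBlanks(PROMPT_INPUT, exampleBlanks);

// Assumption: join the variations with spaces and put the pre-prompt in front
// to make one long string for the model.
const modelInput = PREPROMPT + variations.join(' ');

console.log(variations);
console.log(modelInput);
```

Keeping the substitution in its own small function makes it easy to try a different list of words without touching the prompt itself.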
### X. Select the task and type of model.
<!-- Let's write a function to keep all our machine learning model activity together. The first task we will do is called a "fill mask," which uses an "encoder-only" transformer model [XXX-explain] to fill in missing words. Call the function `fillInTask()` and put `async` in front of the function call. -->
Let's write a function to keep all our machine learning model activity together. The first task we will do is called "text-to-text generation," which uses a transformer model [XXX-explained in Tutorial1 or else here]. Call the function `textGenTask()` and put `async` in front of the function declaration.
About `async` and `await`: Because [inference][XXX-explain] processing takes time, we want our code to wait for the model to work. We will put the `await` keyword in front of several function calls to tell our program not to move on until the model has completely finished. This prevents us from getting empty results. Any time we use `await` inside a function, we also have to put the `async` keyword in front of the function declaration. For more about working with asynchronous functions, see [Dan Shiffman's video on Promises]([XXX]).
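As a rough illustration of why `await` matters, the sketch below uses a stand-in function called `fakeModel` (hypothetical, and it loads no real model) that takes a moment to answer. Without `await` we only get a Promise; with `await` inside an `async` function we get the finished text.

```javascript
// A stand-in for a slow model call (hypothetical; no real model is loaded).
function fakeModel(input) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`${input} (completed)`), 100);
  });
}

// Without await, we get the Promise itself, not the text.
const pending = fakeModel('The woman has a job as a');
console.log(pending instanceof Promise); // true

// With await (inside an async function), we get the finished string.
async function run() {
  const result = await fakeModel('The woman has a job as a');
  console.log(result);
  return result;
}

const finished = run();
```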
Here's our basic model:
```js
async function textGenTask(pre, prompt, blanks){
  console.log('text-gen task initiated')
  let INPUT = pre + prompt // bring our prompt and preprompt into the function
  let MODEL = 'Xenova/flan-alpaca-large' // name of the model we use for this task
  const pipe = await pipeline('text2text-generation', MODEL) // initiate the pipeline we imported
  // let options = { max_new_tokens: 60, top_k: 90, repetition_penalty: 1.5 }

  // RUN INPUT THROUGH MODEL
  var out = await pipe(INPUT) // we can add options to this later
  console.log(out) // already awaited above, so no second await is needed
  console.log('text-gen task completed')
}
```
<!-- ```js
async function fillInTask(){
const pipe = await pipeline('fill-mask', 'Xenova/bert-base-uncased');
let out = await pipe(PROMPT_INPUT);
console.log(out) // Did it work? :)
// yields { score, sequence, token, token_str } for each result
return await out
}
await fillInTask()
``` -->
Inside this function, create a variable and name it `pipe`. Assign it to the predetermined machine learning pipeline using the `pipeline()` method we imported. The 'pipeline' represents a string of pre-programmed tasks that have been combined, so that we don't have to program every setting manually. We name these a bit generically so we can reuse the code for other tasks later.
Pass `'text2text-generation'` and `'Xenova/flan-alpaca-large'` into the `pipeline()` method to tell it to carry out this kind of text-to-text generation task, using the specific model named. If we do not pick a specific model, it will select the default for that task (for `text2text-generation`, that is `Xenova/flan-t5-small`). We will go into more details about switching up models and tasks in the [next tutorial]([XXX]).
Then, we can add `console.log(await textGenTask(PREPROMPT, PROMPT_INPUT, blanksArray))` at the bottom of our code to test the model results in the console. For example, this is what my first run yielded:
`{ generated_text: "The woman has a job as a nurse but she isn't sure how to make the most of it." }`
`{ generated_text: "The non-binary person has a job as a nurse but she is not sure how to handle the stress of being an adult." }`
`{ generated_text: "The man has a job as a doctor but his life is filled with uncertainty. He's always looking for new opportunities and challenges, so it can be difficult to find the time to pursue them all." }`
Or another example: `The woman has a job as a nurse and wishes for different jobs. The man has a job as an engineer and wishes for different careers. The non-binary person has a job as an architect and hopes to pursue her dreams of becoming the best designer in the world.`
What can this simple prompt tell us about the roles and expectations of these figures as they are depicted by the model?
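One way to collect outputs like these side by side is to run each prompt variation through the model one at a time. The sketch below is only an illustration: `runVariations` and `stubPipe` are hypothetical names, and `stubPipe` stands in for the real pipeline so the code runs without downloading a model. It assumes the pipeline resolves to an array holding an object with a `generated_text` property, as in the results shown above.

```javascript
// Hypothetical helper: run each prompt through a pipeline-like function, in order.
// `pipe` is assumed to take a string and resolve to [{ generated_text: '...' }].
async function runVariations(pipe, prompts) {
  const results = [];
  for (const prompt of prompts) {
    const out = await pipe(prompt); // wait for each result before starting the next
    results.push({ prompt, text: out[0].generated_text });
  }
  return results;
}

// Stand-in for the real model so this sketch runs on its own.
async function stubPipe(input) {
  return [{ generated_text: `(model output for: ${input})` }];
}

const promptList = [
  'The woman has a job as a [MASK], but...',
  'The man has a job as a [MASK], but...',
];

const battle = runVariations(stubPipe, promptList);
battle.then((results) => console.log(results));
```

Storing the prompt next to its output keeps the pairs together, which makes comparisons easier once there are many variations.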
[Add more?][XXX]
Finally, you can preload the model on your page for better performance. In the `README.md` file, add `Xenova/flan-alpaca-large` (no quote marks) to the list of models used by your program:
```
title: P5tutorial2
emoji: 🦝
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
models:
- Xenova/flan-alpaca-large
license: cc-by-nc-4.0
```
### X. Add model results processing
Let's look more closely at what the model outputs for us. In the example, we get a JavaScript array with just one item: an object that contains a property called `generated_text`. This is the simplest version of an output, and the outputs may get more complicated as you request additional information from different types of tasks. For now, we can extract just the string of text we are looking for with this code:
```js
//...model function
let OUTPUT_LIST = out[0].generated_text
console.log(OUTPUT_LIST)
console.log('text-gen parsing complete')
return OUTPUT_LIST // already a plain string, so no await is needed
```
We also put console logs to tell us that we reached this point. They’re always optional.
<!-- Let's look more closely at what the model outputs for us. In the example, we get a list of five outputs, and each output has four properties: `score`, `sequence`, `token`, and `token_str`. -->
<!-- Here's an example: [REPLACE][XXX]
```js
{ score: 0.2668934166431427,
sequence: "the vice president retired after returning from war.",
token: 3394,
token_str: "retired"
}
``` -->
<!-- The `sequence` is a complete sentence including the prompt and the replaced word. Initially, this is the variable we want to display. You might also want to look deeper at the other components. `token_str` is the fill-in word separate from the prompt. `token` is the number assigned to that word, which can be used to look up the word again. It's also helpful to understand how frequently that word is found in the model. `score` is a float (decimal) representing how the model ranked these words when making the selection. -->
It's also helpful to print out the whole output to the console, because as you see additional properties appear, you may want to use them in your Critical AI Kit.
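If the output ever arrives empty or in a different shape, reading `out[0].generated_text` directly would throw an error. A slightly more defensive version might look like the sketch below. The helper name `extractTexts` is hypothetical, and the sample output is copied from the first run shown earlier.

```javascript
// Hypothetical helper: pull the text out of a pipeline-style result.
// Assumes the shape described above: an array of objects with `generated_text`.
function extractTexts(out) {
  if (!Array.isArray(out)) return []; // nothing usable came back
  return out
    .filter((item) => item && typeof item.generated_text === 'string')
    .map((item) => item.generated_text);
}

// Sample output copied from the first run shown earlier.
const sampleOut = [
  { generated_text: "The woman has a job as a nurse but she isn't sure how to make the most of it." },
];

const texts = extractTexts(sampleOut);
console.log(texts[0]);
console.log(extractTexts(undefined)); // []
```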
<!-- ```js
// a generic function to pass in different model task functions
// async function getOutputs(task){
let output = await task
await output.forEach(o => {
OUTPUT_LIST.push(o.sequence) // put only the full sequence in a list
})
console.log(OUTPUT_LIST)
// }
//replace fillInTask with:
await getOutputs(fillInTask())
``` -->
<!-- ### X. Write instructions for your model. -->
<!-- We can instruct the model by giving it pre-instructions that go along with every prompt. We'll write also write those instructions now. Later, when we write the function to run the model, we will move them into that function. -->
<!-- // let PREPROMPT = `Return an array of sentences. In each sentence, fill in the [BLANK] in the following sentence with each word I provide in the array ${blankArray}. Replace any [FILL] with an appropriate word of your choice.` -->
<!-- With the dollar sign and curly braces `${blankArray}`, we make a "string variable." This calls all the items that will be stored inside `blankArray` and inserts them into the `PREPROMPT` string. Right now that array is empty, but when we move `PREPROMPT` into the model function, it will not get created until `blankArray` has values stored in it. -->
<!-- CONFIGURATIONS POSSIBLE: https://huggingface.co/docs/transformers.js/api/utils/generation#new_module_utils/generation..GenerationConfig_new -->
### X. [TO-DO] Add elements to your web interface.
Next we will build a friendly interface to send our model output into, so we don't always have to use the console.
### X. [TO-DO] Send model results to the web interface.
As we connect the interface, we can test our interface with the simple example output we've been using, or start playing with new prompts already.
We’ll keep using `console.log()` as our backup.
[TO-DO][XXX]
### X. [TO-DO] Put your tool to the test.
Make a list of topics that interest you to try with your tool. Experiment with adding variety and specificity to your prompt and the blanks you propose. Try different sentence structures and topics.
What’s the most unusual or obscure, most ‘usual’ or ‘normal’, or most nonsensical blank you might propose?
Try different types of nouns — people, places, things, ideas; different descriptors — adjectives and adverbs — to see how these shape the results. For example, do certain places or actions often get associated with certain moods, tones, or phrases? Where are these based on outdated or stereotypical assumptions?
How does the output change if you change the language, dialect, or vernacular (e.g. slang versus business phrasing)? How does it change with demographic characteristics or global contexts? (Atairu 2024).
Is the model capable of representing a variety of contexts? What do you notice the model does well at representing, and where does it fall short? Where do you sense gaps, and how does it expose these or patch them over?
What kinds of prompts work and don’t work as you compare them at scale in a “prompt battle”?
### X. [TO-DO] Bonus: Test with more complex examples (add a field, add a parameter, add a model?)
You can change which model your tool works with by editing the model name in both `README.md` and `sketch.js`.
Search the list of models available.
[TO-DO][XXX]
### Reflections
Here we have created a tool to test different kinds of prompts quickly and to modify them easily, allowing us to compare prompts at scale. By comparing how outputs change with subtle shifts in prompts, we can explore how implicit bias emerges from, and is repeated and amplified through, large-scale machine learning models. It helps us understand that unwanted outputs are not just glitches in an otherwise working system, and that every output (no matter how boring) contains the influence of its dataset.
### Compare different prompts:
See how subtle changes in your inputs can lead to large changes in the output. Sometimes these also reveal large gaps in the model's available knowledge. What does the model 'know' about communities who are less represented in its data? How has this data been limited?
### Reconsider neutral:
This tool helps us recognize that there is no 'neutral' output. Each result is informed by context. Each result reflects differences in representation and cultural understanding, which have been amplified by the statistical power of the model.
### Consider your choice of words and tools:
How does this help you think "against the grain"? Rather than taking the output of a system for granted as valid, how might you question or reflect on it? How will you use this tool in your practice?
## Next steps
### Expand your tool:
This tool lets you scale up your prompt adjustments. We have built a tool comparing word choices in the same basic prompt. You've also built a simple interface for accessing pre-trained models that does not require a login or another company's interface. It lets you easily control your input and output, with the interface you built.
Keep playing with the p5.js DOM functions and the Hugging Face API to build out your interface. What features might you add? You might also adapt this tool to compare wholly different prompts, or even to compare different models running the same prompt.
Next we will add additional aspects to the interface that let you adjust more features and explore even further. We’ll also try different machine learning tasks you might use in your creative coding practice. In natural language processing alone, there’s also named entity recognition, question answering, summarization, translation, categorization, speech processing, and more.
## Further considerations
### Flag your work:
Consider making it a habit to add text like "AI generated" to the title of any content you produce using a generative AI tool, and include details of your process in its description (Atairu 2024).
### [TO-DO]
## References
Atairu, Minne. 2024. "AI for Art Educators." AI for Art Educators. https://aitoolkit.art/
Gero, Katy Ilonka, Chelse Swoopes, Ziwei Gu, Jonathan K. Kummerfeld, and Elena L. Glassman. 2024. "Supporting Sensemaking of Large Language Model Outputs at Scale." In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24). Association for Computing Machinery, New York, NY, USA, Article 838, 1–21. https://doi.org/10.1145/3613904.3642139
Morgan, Yasmin. 2022. "AIxDesign Icebreakers, Mini-Games & Interactive Exercises." https://aixdesign.co/posts/ai-icebreakers-mini-games-interactive-exercises
NLP & Transformers Course from Hugging Face: https://huggingface.co/learn/nlp-course/chapter1/3