# vicuna_zeroshot
# cola
## 10 prompts
Acc: 69.00%, prompt: Assess the following sentence and determine if it is grammatically correct. Respond with 'Acceptable' or 'Unacceptable':
Acc: 62.30%, prompt: Analyze the provided sentence and classify its grammatical correctness as 'Acceptable' or 'Unacceptable':
Acc: 60.90%, prompt: Is the provided sentence grammatically correct? Respond with 'Acceptable' or 'Unacceptable':
Acc: 59.70%, prompt: Please evaluate the grammatical structure of the provided sentence and answer with 'Acceptable' or 'Unacceptable':
Acc: 50.40%, prompt: Assess the grammatical structure of the given sentence and classify it as 'Acceptable' or 'Unacceptable':
Acc: 50.10%, prompt: Examine the given sentence and decide if it is grammatically sound. Answer with either 'Acceptable' or 'Unacceptable':
Acc: 37.20%, prompt: Check the grammar of the following sentence and indicate if it is 'Acceptable' or 'Unacceptable':
Acc: 36.50%, prompt: Determine if the grammar of the given sentence is 'Acceptable' or 'Unacceptable':
Acc: 30.60%, prompt: Examine the sentence and decide if its grammar is 'Acceptable' or 'Unacceptable':
Acc: 28.90%, prompt: Review the sentence below and identify whether its grammar is 'Acceptable' or 'Unacceptable':
Acc: 57.10%, prompt: Functioning as a grammar evaluation tool, analyze the given sentence and decide if it is grammatically correct, responding with 'acceptable' or 'unacceptable':
Acc: 54.90%, prompt: As a tool for determining grammatical correctness, review the sentence and categorize its grammar as either 'acceptable' or 'unacceptable':
Acc: 50.70%, prompt: In the role of a syntax analyzer, examine the grammar of the provided sentence and decide if it is correct, answering with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorrect ones:
Acc: 48.10%, prompt: In the capacity of a grammar assessment system, indicate if the structure of the provided sentence is grammatically correct, responding with 'acceptable' or 'unacceptable':
Acc: 46.30%, prompt: Acting as a grammar detection instrument, determine if the provided sentence is grammatically sound, answering with 'acceptable' for correct grammar or 'unacceptable' for incorrect grammar:
Acc: 43.00%, prompt: In your role as a grammar check tool, assess the following sentence and classify it as 'acceptable' if it is grammatically correct or 'unacceptable' if it is incorrect:
Acc: 36.10%, prompt: As an instrument for grammar evaluation, consider the sentence and determine if its grammar is correct, responding with 'acceptable' for correct grammar or 'unacceptable' for incorrect grammar:
Acc: 35.10%, prompt: Working as a grammar classifier, identify whether the given sentence has correct grammar, and respond with 'acceptable' for correct sentences or 'unacceptable' for incorrect ones:
Acc: 27.10%, prompt: While performing grammar analysis, classify the grammar of the following sentence as 'acceptable' for correct grammar or 'unacceptable' for incorrect grammar:
Acc: 9.60%, prompt: As a grammar identification system, examine the provided sentence and respond with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorrect ones:
## bertattack
Original prompt: Assess the following sentence and determine if it is grammatically correct. Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"check the below verb and consider if it is grammatically complete. act with 'Acceptable' or 'Unacceptable':"
Original acc: 69.00%, attacked acc: 1.70%, dropped acc: 67.30%
Original prompt: Analyze the provided sentence and classify its grammatical correctness as 'Acceptable' or 'Unacceptable':
Attacked prompt: b"first the given sentence and identify its grammatical correct, as 'Acceptable' or 'Unacceptable':"
Original acc: 62.30%, attacked acc: 2.20%, dropped acc: 60.10%
Original prompt: Is the provided sentence grammatically correct? Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"is the program sentence grammatically correct? select with 'Acceptable' or 'Unacceptable':"
Original acc: 60.90%, attacked acc: 4.60%, dropped acc: 56.30%
Original prompt: Functioning as a grammar evaluation tool, analyze the given sentence and decide if it is grammatically correct, responding with 'acceptable' or 'unacceptable':
Attacked prompt: b"as as a grammar evaluation resource, analyze the next phrase and decide if it is grammatical+ correct, respond with 'acceptable' or 'unacceptable':"
Original acc: 57.10%, attacked acc: 0.60%, dropped acc: 56.50%
Original prompt: As a tool for determining grammatical correctness, review the sentence and categorize its grammar as either 'acceptable' or 'unacceptable':
Attacked prompt: b"being a test for determining grammatical correct-, research the category and categorize its singular as if 'acceptable' or 'unacceptable':"
Original acc: 54.90%, attacked acc: 4.10%, dropped acc: 50.80%
Original prompt: In the role of a syntax analyzer, examine the grammar of the provided sentence and decide if it is correct, answering with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorrect ones:
Attacked prompt: b"s the role of a syntax analyzemirror, examine the complement of the provided sentence and test if it is defective, answering with 'acceptable' for grammatically correct expressions or 'unacceptable' for unacceptable none:"
Original acc: 50.70%, attacked acc: 1.10%, dropped acc: 49.60%
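In the summary rows above, `dropped acc` equals `Original acc` minus `attacked acc`. The sketch below checks that relationship for a few rows. The regular expression, the `check_row` helper, and the tolerance are illustrative assumptions, not part of the tool that produced this log.

```python
import re

# Pattern for the summary line that follows each attacked prompt in this log.
ROW = re.compile(
    r"Original acc: (?P<orig>[\d.]+)%, "
    r"attacked acc: (?P<att>[\d.]+)%, "
    r"dropped acc: (?P<drop>[\d.]+)%"
)

def check_row(line: str, tol: float = 0.05) -> bool:
    """Return True if dropped acc matches original acc minus attacked acc."""
    m = ROW.search(line)
    if m is None:
        raise ValueError(f"not a summary line: {line!r}")
    orig, att, drop = (float(m[k]) for k in ("orig", "att", "drop"))
    # Compare with a small tolerance to absorb floating-point rounding.
    return abs((orig - att) - drop) <= tol

rows = [
    "Original acc: 69.00%, attacked acc: 1.70%, dropped acc: 67.30%",
    "Original acc: 62.30%, attacked acc: 2.20%, dropped acc: 60.10%",
    "Original acc: 50.70%, attacked acc: 1.10%, dropped acc: 49.60%",
]
results = [check_row(r) for r in rows]
print(results)
```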
## checklist
Original prompt: Assess the following sentence and determine if it is grammatically correct. Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Assess the following sentence and determine if it is grammatically correct. Respond with 'Acceptable' or 'Unacceptable' MQnugHcaoy:"
Original acc: 69.00%, attacked acc: 2.90%, dropped acc: 66.10%
Original prompt: Analyze the provided sentence and classify its grammatical correctness as 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Analyze the provided sentence and classify its grammatical correctness as 'Acceptable' or 'Unacceptable' 5yWbBXztUY:"
Original acc: 62.30%, attacked acc: 34.90%, dropped acc: 27.40%
Original prompt: Is the provided sentence grammatically correct? Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Is the provided sentence grammatically correct? Respond with 'Acceptable' or 'Unacceptable' lf8wBa2yEm:"
Original acc: 60.90%, attacked acc: 20.90%, dropped acc: 40.00%
Original prompt: Functioning as a grammar evaluation tool, analyze the given sentence and decide if it is grammatically correct, responding with 'acceptable' or 'unacceptable':
Attacked prompt: b"Functioning as a grammar evaluation tool, analyze the given sentence and decide if it is grammatically correct, responding with 'acceptable' or 'unacceptable' MQnugHcaoy:"
Original acc: 57.10%, attacked acc: 11.30%, dropped acc: 45.80%
Original prompt: As a tool for determining grammatical correctness, review the sentence and categorize its grammar as either 'acceptable' or 'unacceptable':
Attacked prompt: b"As a tool for determining grammatical correctness, review the sentence and categorize its grammar as either 'acceptable' or 'unacceptable' w52rwgo0Av:"
Original acc: 54.90%, attacked acc: 11.10%, dropped acc: 43.80%
Original prompt: In the role of a syntax analyzer, examine the grammar of the provided sentence and decide if it is correct, answering with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorrect ones:
Attacked prompt: b"In the role of a syntax analyzer, examine the grammar of the provided sentence and decide if it is correct, answering with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorrect ones Q47Jmd4lMV:"
Original acc: 50.70%, attacked acc: 7.50%, dropped acc: 43.20%
## deepwordbug
Original prompt: Assess the following sentence and determine if it is grammatically correct. Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"sAsess the ofllowing lsentence and determine if it is grammatiaclly correct. Rspond with 'Acceptable' or 'Unacceptable':"
Original acc: 69.20%, attacked acc: 3.60%, dropped acc: 65.60%
Original prompt: Analyze the provided sentence and classify its grammatical correctness as 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Analyze the provided sentence and classify its grammatical correctness as 'Acceptable' or 'Unacceptable':"
Original acc: 62.20%, attacked acc: 62.20%, dropped acc: 0.00%
Original prompt: Is the provided sentence grammatically correct? Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"If the pryvided sentenKe Zrammatically crorect? Respod with 'Acceptable' or 'Unacceptable':"
Original acc: 60.90%, attacked acc: 25.50%, dropped acc: 35.40%
Original prompt: Functioning as a grammar evaluation tool, analyze the given sentence and decide if it is grammatically correct, responding with 'acceptable' or 'unacceptable':
Attacked prompt: b"FunFctioning as a grammar zvaluation Fool, analyze the given sentence and deGcide if it is arammatically corret, rWesponding with 'acceptable' or 'unacceptable':"
Original acc: 57.10%, attacked acc: 5.10%, dropped acc: 52.00%
Original prompt: As a tool for determining grammatical correctness, review the sentence and categorize its grammar as either 'acceptable' or 'unacceptable':
Attacked prompt: b"A a tol for determining grammatical correktness, kreview the sentence and cantegorize its gKrammar as either 'acceptable' or 'unacceptable':"
Original acc: 54.90%, attacked acc: 13.20%, dropped acc: 41.70%
Original prompt: In the role of a syntax analyzer, examine the grammar of the provided sentence and decide if it is correct, answering with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorrect ones:
Attacked prompt: b"n the role of a syntax analyzer, examine the grammar of the provided sentence and decide if it is correct, answering with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorret oKnes:"
Original acc: 50.60%, attacked acc: 4.80%, dropped acc: 45.80%
## stresstest
Original prompt: Assess the following sentence and determine if it is grammatically correct. Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Assess the following sentence and determine if it is grammatically correct. Respond with 'Acceptable' or 'Unacceptable' and false is not true :"
Original acc: 69.00%, attacked acc: 34.80%, dropped acc: 34.20%
Original prompt: Analyze the provided sentence and classify its grammatical correctness as 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Analyze the provided sentence and classify its grammatical correctness as 'Acceptable' or 'Unacceptable' and true is true and true is true and true is true and true is true and true is true :"
Original acc: 62.30%, attacked acc: 62.10%, dropped acc: 0.20%
Original prompt: Is the provided sentence grammatically correct? Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Is the provided sentence grammatically correct? Respond with 'Acceptable' or 'Unacceptable' and false is not true :"
Original acc: 60.90%, attacked acc: 39.70%, dropped acc: 21.20%
Original prompt: Functioning as a grammar evaluation tool, analyze the given sentence and decide if it is grammatically correct, responding with 'acceptable' or 'unacceptable':
Attacked prompt: b"Functioning as a grammar evaluation tool, analyze the given sentence and decide if it is grammatically correct, responding with 'acceptable' or 'unacceptable' and false is not true :"
Original acc: 57.10%, attacked acc: 20.80%, dropped acc: 36.30%
Original prompt: As a tool for determining grammatical correctness, review the sentence and categorize its grammar as either 'acceptable' or 'unacceptable':
Attacked prompt: b"As a tool for determining grammatical correctness, review the sentence and categorize its grammar as either 'acceptable' or 'unacceptable' and false is not true :"
Original acc: 54.90%, attacked acc: 49.60%, dropped acc: 5.30%
Original prompt: In the role of a syntax analyzer, examine the grammar of the provided sentence and decide if it is correct, answering with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorrect ones:
Attacked prompt: b"In the role of a syntax analyzer, examine the grammar of the provided sentence and decide if it is correct, answering with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorrect ones and false is not true :"
Original acc: 50.70%, attacked acc: 29.00%, dropped acc: 21.70%
## textbugger
Original prompt: Assess the following sentence and determine if it is grammatically correct. Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Evaluating the later sen\xf0\x9d\x9a\x9dence and determine if it is grammatically correct. Respo\xd5\xb8d with 'Acceptable' or 'Unacceptable':"
Original acc: 69.00%, attacked acc: 0.00%, dropped acc: 69.00%
Original prompt: Analyze the provided sentence and classify its grammatical correctness as 'Acceptable' or 'Unacceptable':
Attacked prompt: b"A\xd5\xb8alyze the provided sentence and classi\xf0\x9d\x9a\x8fy its grammatical correctness as 'Acceptable' or 'Unacceptable':"
Original acc: 62.30%, attacked acc: 0.00%, dropped acc: 62.30%
Original prompt: Is the provided sentence grammatically correct? Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Comes the provided sen\xf0\x9d\x9a\x9dence grammatically correct? Respo\xd5\xb8d with 'Acceptable' or 'Unacceptable':"
Original acc: 60.90%, attacked acc: 0.00%, dropped acc: 60.90%
Original prompt: Functioning as a grammar evaluation tool, analyze the given sentence and decide if it is grammatically correct, responding with 'acceptable' or 'unacceptable':
Attacked prompt: b"Func\xf0\x9d\x9a\x9dioning as a \xc9\xa1rammar evaluaion tool, analyze the yielded sentence and decide if it is grammatically correc\xf0\x9d\x9a\x9d, responding with 'acceptable' or 'unacceptable':"
Original acc: 57.10%, attacked acc: 1.10%, dropped acc: 56.00%
Original prompt: As a tool for determining grammatical correctness, review the sentence and categorize its grammar as either 'acceptable' or 'unacceptable':
Attacked prompt: b"A\xd1\x95 a tool for determining grammatical regularity, reviewing the penalties and categoriz\xd0\xb5 its grammar as ei\xf0\x9d\x9a\x9dher 'acceptable' or 'unacceptable':"
Original acc: 54.90%, attacked acc: 0.00%, dropped acc: 54.90%
Original prompt: In the role of a syntax analyzer, examine the grammar of the provided sentence and decide if it is correct, answering with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorrect ones:
Attacked prompt: b"In the role of a syntax analyzer, examine the grammar of the provided sentence and decie if it is correct, answering with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorrec\xf0\x9d\x9a\x9d ones:"
Original acc: 50.70%, attacked acc: 0.00%, dropped acc: 50.70%
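The attacked prompts are logged as Python bytes literals, so the look-alike characters that textbugger substitutes appear as `\x..` escapes. A minimal sketch for viewing them as text, assuming the bytes are UTF-8 encoded; `decode_attacked` is a hypothetical helper name:

```python
import ast
import unicodedata

PREFIX = "Attacked prompt: "

def decode_attacked(line: str) -> str:
    """Turn a logged b"..." literal back into readable text (UTF-8 assumed)."""
    literal = line[len(PREFIX):] if line.startswith(PREFIX) else line
    raw = ast.literal_eval(literal)  # safely evaluates the bytes literal
    if not isinstance(raw, bytes):
        raise ValueError("expected a bytes literal")
    return raw.decode("utf-8")

# Raw string so the \x escapes stay exactly as they appear in the log.
line = r'''Attacked prompt: b"Respo\xd5\xb8d with 'Acceptable' or 'Unacceptable':"'''
text = decode_attacked(line)

# List each non-ASCII character together with its Unicode name.
substituted = [(ch, unicodedata.name(ch)) for ch in text if ord(ch) > 127]
print(text)
print(substituted)
```

Here the `n` in "Respond" turns out to be an Armenian letter that looks similar to it.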
## textfooler
Original prompt: Assess the following sentence and determine if it is grammatically correct. Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Assess the next sentence and decide if it is grammatically correct. Accommodate with 'Acceptable' or 'Unacceptable':"
Original acc: 69.00%, attacked acc: 14.00%, dropped acc: 55.00%
Original prompt: Analyze the provided sentence and classify its grammatical correctness as 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Browse the handed pity and hierarchy its grammatical correction as 'Acceptable' or 'Unacceptable':"
Original acc: 62.30%, attacked acc: 18.90%, dropped acc: 43.40%
Original prompt: Is the provided sentence grammatically correct? Respond with 'Acceptable' or 'Unacceptable':
Attacked prompt: b"Does the provided penalty grammatically appropriate? Cater with 'Acceptable' or 'Unacceptable':"
Original acc: 60.90%, attacked acc: 26.30%, dropped acc: 34.60%
Original prompt: Functioning as a grammar evaluation tool, analyze the given sentence and decide if it is grammatically correct, responding with 'acceptable' or 'unacceptable':
Attacked prompt: b"Working as a language assessment apparatus, discuss the given sentence and opt if it is grammatically rectified, respond with 'acceptable' or 'unacceptable':"
Original acc: 57.10%, attacked acc: 2.70%, dropped acc: 54.40%
Original prompt: As a tool for determining grammatical correctness, review the sentence and categorize its grammar as either 'acceptable' or 'unacceptable':
Attacked prompt: b"Oj a toolbox for defines pronouns correcting, revisions the penalties and sorting its dictionary as either 'acceptable' or 'unacceptable':"
Original acc: 54.90%, attacked acc: 0.00%, dropped acc: 54.90%
Original prompt: In the role of a syntax analyzer, examine the grammar of the provided sentence and decide if it is correct, answering with 'acceptable' for grammatically correct sentences or 'unacceptable' for incorrect ones:
Attacked prompt: b"In the role of a grammatical profiler, study the language of the provided sentence and decided if it is correct, answer with 'acceptable' for grammatically corrective sentenced or 'unacceptable' for faked ones:"
Original acc: 50.70%, attacked acc: 2.90%, dropped acc: 47.80%
## translation
Language: chinese, acc: 41.90%, prompt: As a tool for determining grammatical correctness, please review the following sentences and categorize them grammatically as 'acceptable' or 'unacceptable'.
Language: chinese, acc: 53.80%, prompt: Please evaluate the grammar of the following sentences and mark them as 'acceptable' or 'unacceptable'.
Language: chinese, acc: 33.00%, prompt: Please check that the following sentences are grammatically correct and categorize them as 'acceptable' or 'unacceptable'.
Language: chinese, acc: 54.00%, prompt: Please determine whether the following sentences are 'acceptable' or 'unacceptable' based on grammatical correctness.
Language: chinese, acc: 26.30%, prompt: Please categorize the grammar of the following sentences as 'acceptable' or 'unacceptable'.
Language: chinese, acc: 36.20%, prompt: Please check the grammar of the following sentences and classify them as 'acceptable' or 'unacceptable'.
Language: chinese, acc: 49.40%, prompt: Please mark the following sentences as 'acceptable' or 'unacceptable' according to the grammatical norms.
Language: chinese, acc: 39.10%, prompt: Please assess the grammar of the following sentences and classify them as 'acceptable' or 'unacceptable'.
Language: chinese, acc: 50.30%, prompt: Please mark the following sentences as 'acceptable' or 'unacceptable' for grammatical accuracy.
Language: chinese, acc: 36.20%, prompt: Please check the grammar of the following sentences and classify them as 'acceptable' or 'unacceptable'.
Language: french, acc: 45.10%, prompt: As a tool for assessing semantic relevance, categorize the sentence as 'acceptable' or 'unacceptable' according to its meaning and context.
Language: french, acc: 46.80%, prompt: Determine whether the grammatical structure of the sentence is 'acceptable' or 'unacceptable' by carefully reviewing it.
Language: french, acc: 20.90%, prompt: Judge whether the sentence follows the syntactic rules of the language and classify it as 'acceptable' or 'unacceptable'.
Language: french, acc: 32.80%, prompt: Assess the relevance of the sentence by classifying it as 'acceptable' or 'unacceptable' according to the context in which it is used.
Language: french, acc: 41.10%, prompt: As a semantic classification tool, decide whether the sentence is 'acceptable' or 'unacceptable' according to its meaning and consistency.
Language: french, acc: 35.30%, prompt: Analyze the sentence carefully and determine whether it is grammatically 'acceptable' or 'unacceptable'.
Language: french, acc: 39.60%, prompt: As a syntactic validation tool, determine whether the sentence is properly structured and categorize it as 'acceptable' or 'unacceptable'.
Language: french, acc: 46.30%, prompt: When evaluating the grammar of the sentence, classify it as 'acceptable' or 'unacceptable' according to the rules of the French language.
Language: french, acc: 41.40%, prompt: Determine whether the sentence is 'acceptable' or 'unacceptable' depending on its syntactic construction and semantic context.
Language: french, acc: 39.80%, prompt: As a tool for linguistic analysis, categorize the sentence as 'acceptable' or 'unacceptable' according to its relevance and semantic consistency.
Language: arabic, acc: 44.70%, prompt: As a tool for determining grammatical correctness, review the sentence and classify its rules as 'acceptable' or 'unacceptable'.
Language: arabic, acc: 45.00%, prompt: Analyze the sentence carefully and classify its grammar between 'acceptable' or 'unacceptable'.
Language: arabic, acc: 39.30%, prompt: As a tool for determining grammatical correctness, classify the rules of the sentence between 'acceptable' or 'unacceptable'.
Language: arabic, acc: 36.90%, prompt: Evaluate the correctness of the sentence between 'acceptable' or 'unacceptable', as a tool for determining grammatical correctness.
Language: arabic, acc: 42.30%, prompt: As a tool for classifying valid sentences, determine whether 'acceptable' or 'unacceptable' is based on its grammatical rules.
Language: arabic, acc: 45.20%, prompt: Analyze the sentence carefully and classify its grammatical rules between 'acceptable' or 'unacceptable'.
Language: arabic, acc: 61.50%, prompt: Analyze the sentence and classify it between 'acceptable' or 'unacceptable' as a grammatical check tool.
Language: arabic, acc: 46.40%, prompt: As a classification tool for grammatical sentences, determine whether the sentence 'acceptable' or 'unacceptable' is based on its rules.
Language: arabic, acc: 38.30%, prompt: As a tool for analyzing grammar, classify the rules of the sentence between 'acceptable' or 'unacceptable'.
Language: arabic, acc: 55.90%, prompt: Determine whether the sentence is 'acceptable' or 'unacceptable' as a tool for determining grammatical correctness and analyze its grammar.
Language: spanish, acc: 41.30%, prompt: As a tool to determine grammatical correctness, review the sentence and categorize its grammar as 'acceptable' or 'unacceptable'.
Language: spanish, acc: 47.30%, prompt: Analyze the sentence carefully and classify its grammar as 'acceptable' or 'unacceptable'.
Language: spanish, acc: 6.40%, prompt: As a tool for determining grammatical correctness, it categorizes the grammatical rules of the sentence as 'acceptable' or 'unacceptable'.
Language: spanish, acc: 52.20%, prompt: Evaluate the grammatical correctness of the sentence and classify it as 'acceptable' or 'unacceptable' using a grammatical verification tool.
Language: spanish, acc: 49.20%, prompt: As a tool for classifying grammatically correct sentences, determine whether the sentence is 'acceptable' or 'unacceptable' based on its grammatical rules.
Language: spanish, acc: 52.90%, prompt: Analyse the sentence carefully and classify its grammar as 'acceptable' or 'unacceptable' using a grammatical revision tool.
Language: spanish, acc: 9.60%, prompt: As a tool for classifying grammatical sentences, it determines whether the sentence is 'acceptable' or 'unacceptable' using its grammatical structure.
Language: spanish, acc: 5.80%, prompt: As a tool for analyzing grammatical correctness, it categorizes the grammatical rules of the sentence as 'acceptable' or 'unacceptable'.
Language: spanish, acc: 40.70%, prompt: Determine whether the sentence is 'acceptable' or 'unacceptable' using a grammatical verification tool and sort it accordingly.
Language: spanish, acc: 57.70%, prompt: As a tool for determining grammatical correctness, evaluate the sentence and classify it as 'acceptable' or 'unacceptable' based on its grammatical rules.
Language: japanese, acc: 36.10%, prompt: As a tool to determine whether grammar is grammatically correct, look at the sentence and categorize grammar into the 'acceptable' or 'unacceptable' categories.
Language: japanese, acc: 38.60%, prompt: Please read the given sentence and categorize the grammar into the 'acceptable' or 'unacceptable' categories.
Language: japanese, acc: 38.20%, prompt: To determine the correctness of a sentence, categorize grammar into two categories: 'acceptable' or 'unacceptable'.
Language: japanese, acc: 23.30%, prompt: Classify the grammar of a given sentence into two categories: 'acceptable' and 'unacceptable'.
Language: japanese, acc: 42.90%, prompt: Classify a given sentence into the categories 'acceptable' or 'unacceptable' to determine whether it is grammatically correct.
Language: japanese, acc: 22.00%, prompt: To determine whether it is grammatically correct, categorize a given sentence into two categories: 'acceptable' or 'unacceptable'.
Language: japanese, acc: 38.10%, prompt: To determine the correctness of grammar, categorize a given sentence into two categories: 'acceptable' or 'unacceptable'.
Language: japanese, acc: 34.50%, prompt: Classify the grammar of a given sentence into two categories, 'acceptable' or 'unacceptable', and judge its accuracy.
Language: japanese, acc: 48.50%, prompt: To determine whether it is grammatically correct, divide a given sentence into two categories: 'acceptable' or 'unacceptable'.
Language: japanese, acc: 25.80%, prompt: To evaluate the accuracy of grammar, categorize a given sentence into two categories: 'acceptable' or 'unacceptable'.
Language: korean, acc: 33.10%, prompt: As a tool for judging grammatical correctness, please review the sentences and classify the grammar as 'acceptable' or 'unacceptable'.
Language: korean, acc: 33.00%, prompt: Please read the given sentences, and classify the grammar as 'acceptable' or 'unacceptable'.
Language: korean, acc: 43.00%, prompt: Please classify the sentences as 'acceptable' or 'unacceptable' to judge the grammatical appropriateness.
Language: korean, acc: 18.80%, prompt: Please classify the grammar of the given sentences into 2 categories: 'acceptable' or 'unacceptable'.
Language: korean, acc: 40.20%, prompt: Please classify the given sentences as 'acceptable' or 'unacceptable' to judge the correctness of the grammar.
Language: korean, acc: 34.90%, prompt: Please classify the given sentences as 'acceptable' or 'unacceptable' in order to judge their grammatical appropriateness.
Language: korean, acc: 46.50%, prompt: Please determine the correctness of your grammar by classifying sentences as 'acceptable' or 'unacceptable'.
Language: korean, acc: 48.00%, prompt: Classify the grammar of a given sentence as 'acceptable' or 'unacceptable', and judge its accuracy.
Language: korean, acc: 40.20%, prompt: Please classify the given sentences as 'acceptable' or 'unacceptable' to judge the correctness of the grammar.
Language: korean, acc: 37.20%, prompt: Please rate the accuracy of your grammar by categorizing sentences as 'acceptable' or 'unacceptable'.
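To compare languages at a glance, the translation rows can be averaged per language. This is a sketch under the assumption that every row follows the `Language: ..., acc: ...%, prompt: ...` layout used above; `mean_acc_by_language` is a hypothetical helper, and the sample uses only four rows copied from this section.

```python
import re
from collections import defaultdict
from statistics import mean

# Layout of one translation row in this log.
LINE = re.compile(
    r"Language: (?P<lang>\w+), acc: (?P<acc>[\d.]+)%, prompt: (?P<prompt>.*)"
)

def mean_acc_by_language(lines):
    """Group accuracies by language and return the mean for each language."""
    by_lang = defaultdict(list)
    for line in lines:
        m = LINE.match(line)
        if m:  # silently skip lines that are not translation rows
            by_lang[m["lang"]].append(float(m["acc"]))
    return {lang: mean(accs) for lang, accs in by_lang.items()}

sample = [
    "Language: chinese, acc: 41.90%, prompt: As a tool for determining grammatical correctness, please review the following sentences and categorize them grammatically as 'acceptable' or 'unacceptable'.",
    "Language: chinese, acc: 53.80%, prompt: Please evaluate the grammar of the following sentences and mark them as 'acceptable' or 'unacceptable'.",
    "Language: french, acc: 45.10%, prompt: As a tool for assessing semantic relevance, categorize the sentence as 'acceptable' or 'unacceptable' according to its meaning and context.",
    "Language: french, acc: 46.80%, prompt: Determine whether the grammatical structure of the sentence is 'acceptable' or 'unacceptable' by carefully reviewing it.",
]
averages = mean_acc_by_language(sample)
for lang, acc in averages.items():
    print(f"{lang}: {acc:.2f}%")
```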
# mnli
## 10 prompts
Acc: 22.50%, prompt: Assess the connection between the following sentences and classify it as 'entailment', 'neutral', or 'contradiction':
Acc: 21.60%, prompt: Please classify the relationship between the provided sentences as 'entailment', 'neutral', or 'contradiction':
Acc: 19.30%, prompt: Indicate if the connection between the following sentences is 'entailment', 'neutral', or 'contradiction':
Acc: 16.40%, prompt: Determine if the given pair of sentences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction':
Acc: 16.30%, prompt: Review the two statements and categorize their relationship as either 'entailment', 'neutral', or 'contradiction':
Acc: 13.30%, prompt: Does the relationship between the given sentences represent entailment, neutral, or contradiction? Respond with 'entailment', 'neutral', or 'contradiction':
Acc: 11.80%, prompt: Analyze the two provided sentences and decide if their relationship is 'entailment', 'neutral', or 'contradiction':
Acc: 10.50%, prompt: Considering the two sentences, identify if their relationship is 'entailment', 'neutral', or 'contradiction':
Acc: 3.70%, prompt: Examine the pair of sentences and determine if they exhibit entailment, neutral, or contradiction. Answer with either 'entailment', 'neutral', or 'contradiction':
Acc: 3.70%, prompt: Identify whether the given pair of sentences demonstrates entailment, neutral, or contradiction. Answer with 'entailment', 'neutral', or 'contradiction':
Acc: 29.20%, prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment', 'neutral', or 'contradiction':
Acc: 28.40%, prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment', 'neutral', or 'contradiction':
Acc: 25.00%, prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction':
Acc: 19.90%, prompt: In the role of a semantic relationship analyzer, examine the connection between the given sentences and decide if they exhibit entailment, neutral, or contradiction. Answer with 'entailment', 'neutral', or 'contradiction':
Acc: 19.50%, prompt: Acting as an entailment detection instrument, determine if the given pair of sentences demonstrates entailment, neutral, or contradiction. Answer with 'entailment', 'neutral', or 'contradiction':
Acc: 18.70%, prompt: While performing entailment analysis, classify the relationship between the provided sentences as 'entailment', 'neutral', or 'contradiction':
Acc: 18.30%, prompt: In your role as an entailment analysis tool, assess the relationship between the given sentences and classify it as 'entailment', 'neutral', or 'contradiction':
Acc: 17.40%, prompt: In the capacity of an entailment assessment system, indicate if the link between the following sentences is 'entailment', 'neutral', or 'contradiction':
Acc: 12.50%, prompt: As an instrument for entailment evaluation, consider the two sentences and determine if their relationship is 'entailment', 'neutral', or 'contradiction':
Acc: 11.40%, prompt: As a tool for determining entailment relationships, review the two statements and categorize their connection as either 'entailment', 'neutral', or 'contradiction':
## bertattack
Original prompt: Assess the connection between the following sentences and classify it as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Assess the connection between the former expressions and classify it as 'entailment', 'neutral', or 'contradiction':"
Original acc: 22.50%, attacked acc: 3.40%, dropped acc: 19.10%
Original prompt: Please classify the relationship between the provided sentences as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Please classify the relationship between the two pronouns as 'entailment', 'neutral', or 'contradiction':"
Original acc: 21.60%, attacked acc: 3.60%, dropped acc: 18.00%
Original prompt: Indicate if the connection between the following sentences is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"question if the subject between the below sentences is 'entailment', 'neutral', or 'contradiction':"
Original acc: 19.30%, attacked acc: 0.30%, dropped acc: 19.00%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Functioning as an entailment evaluation aid, study the described discourse and determine if their relationship is 'entailment', 'neutral', or 'contradiction':"
Original acc: 29.20%, attacked acc: 3.30%, dropped acc: 25.90%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"using an entailment identification grammar, approach the connected between the following sentence and question with 'entailment', 'neutral', or 'contradiction':"
Original acc: 28.40%, attacked acc: 0.10%, dropped acc: 28.30%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"s as an entailment classtree, query whether the given pair of sentences denotes entailment, neutral, or contradiction. answer with 'entailment', 'neutral', or 'contradiction':"
Original acc: 25.00%, attacked acc: 0.00%, dropped acc: 25.00%
## checklist
Original prompt: Assess the connection between the following sentences and classify it as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Assess the connection between the following sentences and classify it as 'entailment', 'neutral', or 'contradiction' rzu98kF2KF:"
Original acc: 22.50%, attacked acc: 15.40%, dropped acc: 7.10%
Original prompt: Please classify the relationship between the provided sentences as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Please classify the relationship between the provided sentences as 'entailment', 'neutral', or 'contradiction' vTAjHynoIG:"
Original acc: 21.60%, attacked acc: 12.30%, dropped acc: 9.30%
Original prompt: Indicate if the connection between the following sentences is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Indicate if the connection between the following sentences is 'entailment', 'neutral', or 'contradiction' w52rwgo0Av:"
Original acc: 19.30%, attacked acc: 14.80%, dropped acc: 4.50%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment', 'neutral', or 'contradiction' w52rwgo0Av:"
Original acc: 29.20%, attacked acc: 15.00%, dropped acc: 14.20%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"As an entailment identification system, examine the connection between the following sentences and respond with 'entailment', 'neutral', or 'contradiction' 5yWbBXztUY:"
Original acc: 28.40%, attacked acc: 13.40%, dropped acc: 15.00%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Working as an entailment classifier, identify whether the given pair of sentences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction' vTAjHynoIG:"
Original acc: 25.00%, attacked acc: 7.90%, dropped acc: 17.10%
## deepwordbug
Original prompt: Assess the connection between the following sentences and classify it as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Assses the connection between the followinig sentences and classfiy it as 'entailment', 'neutral', or 'contradiction':"
Original acc: 22.40%, attacked acc: 14.00%, dropped acc: 8.40%
Original prompt: Please classify the relationship between the provided sentences as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Plesae classify the relationship between the provide sentencBs as 'entailment', 'neutral', or 'contradiction':"
Original acc: 21.70%, attacked acc: 11.60%, dropped acc: 10.10%
Original prompt: Indicate if the connection between the following sentences is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"InAdicate if the connection between the following sentences is 'entailment', 'neutral', or 'contradiction':"
Original acc: 19.20%, attacked acc: 8.30%, dropped acc: 10.90%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"unctioning as an entailment evaluation tool, analyze the Bprovided sentenves and decBide if their relationship is 'entailment', 'neutral', or 'contradiction':"
Original acc: 29.30%, attacked acc: 13.90%, dropped acc: 15.40%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"As an entailment identification system, exaOmine the connection between the following senBences and erspond with 'entailment', 'neutral', or 'contradiction':"
Original acc: 28.50%, attacked acc: 17.70%, dropped acc: 10.80%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"orking as an entailment classifTer, identify whether the Bgiven pair of sentences Yisplays entailment, neutral, or contradiction. gRespond with 'entailment', 'neutral', or 'contradiction':"
Original acc: 25.10%, attacked acc: 2.40%, dropped acc: 22.70%
## stresstest
Original prompt: Assess the connection between the following sentences and classify it as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Assess the connection between the following sentences and classify it as 'entailment', 'neutral', or 'contradiction' and true is true and true is true and true is true and true is true and true is true :"
Original acc: 22.50%, attacked acc: 24.80%, dropped acc: -2.30%
Original prompt: Please classify the relationship between the provided sentences as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Please classify the relationship between the provided sentences as 'entailment', 'neutral', or 'contradiction' and false is not true :"
Original acc: 21.60%, attacked acc: 28.30%, dropped acc: -6.70%
Original prompt: Indicate if the connection between the following sentences is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Indicate if the connection between the following sentences is 'entailment', 'neutral', or 'contradiction' and false is not true :"
Original acc: 19.30%, attacked acc: 26.40%, dropped acc: -7.10%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment', 'neutral', or 'contradiction' and false is not true :"
Original acc: 29.20%, attacked acc: 30.80%, dropped acc: -1.60%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"As an entailment identification system, examine the connection between the following sentences and respond with 'entailment', 'neutral', or 'contradiction' and true is true and true is true and true is true and true is true and true is true :"
Original acc: 28.40%, attacked acc: 33.10%, dropped acc: -4.70%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Working as an entailment classifier, identify whether the given pair of sentences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction' and true is true and true is true and true is true and true is true and true is true :"
Original acc: 25.00%, attacked acc: 29.50%, dropped acc: -4.50%
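
The stresstest rows above report negative `dropped acc` values, which means the attacked prompt scored higher than the original. The logged numbers are consistent with `dropped acc = original acc - attacked acc`. Below is a minimal sketch for parsing one accuracy line and checking that relation. The helper names (`parse_acc_line`, `is_consistent`) and the 0.02 tolerance are assumptions made for illustration and are not part of PromptBench; the tolerance allows for drops that appear to have been computed before rounding.

```python
import re

# Hypothetical pattern for the "Original acc / attacked acc / dropped acc" lines in this log.
ACC_LINE = re.compile(
    r"Original acc: (-?[\d.]+)%, attacked acc: (-?[\d.]+)%, dropped acc: (-?[\d.]+)%"
)


def parse_acc_line(line):
    """Return (original, attacked, dropped) as floats, in percentage points."""
    match = ACC_LINE.search(line)
    if match is None:
        raise ValueError(f"not an accuracy line: {line!r}")
    original, attacked, dropped = (float(group) for group in match.groups())
    return original, attacked, dropped


def is_consistent(line, tol=0.02):
    """Check that the logged drop equals original minus attacked, within tol."""
    original, attacked, dropped = parse_acc_line(line)
    return abs((original - attacked) - dropped) <= tol


if __name__ == "__main__":
    row = "Original acc: 22.50%, attacked acc: 24.80%, dropped acc: -2.30%"
    original, attacked, dropped = parse_acc_line(row)
    # A negative drop means the attack raised accuracy instead of lowering it.
    print(original, attacked, dropped, is_consistent(row))
```
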
## textbugger
Original prompt: Assess the connection between the following sentences and classify it as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Assess the relationship between the later sen\xf0\x9d\x9a\x9dences and classify it as 'entailment', 'neutral', or 'contradiction':"
Original acc: 22.50%, attacked acc: 1.00%, dropped acc: 21.50%
Original prompt: Please classify the relationship between the provided sentences as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"P\xe2\x85\xbcease classify the relationship between the provided sentencing as 'entailment', 'neutral', or 'contradiction':"
Original acc: 21.60%, attacked acc: 4.00%, dropped acc: 17.60%
Original prompt: Indicate if the connection between the following sentences is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Indica\xf0\x9d\x9a\x9de if the connection between the following sentence is 'entailment', 'neutral', or 'contradiction':"
Original acc: 19.30%, attacked acc: 0.50%, dropped acc: 18.80%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Func\xf0\x9d\x9a\x9dioning as an entailment evaluation \xf0\x9d\x9a\x9dool, analyzing the provided sen\xf0\x9d\x9a\x9dences and decide if their relationship is 'entailment', 'neutral', or 'contradiction':"
Original acc: 29.20%, attacked acc: 1.80%, dropped acc: 27.40%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Because an entailment identification scheme, exa\xef\xbd\x8dine the link between the following sentences and \xe2\xb2\x85espond with 'entailment', 'neutral', or 'contradiction':"
Original acc: 28.40%, attacked acc: 3.70%, dropped acc: 24.70%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Wor\xf0\x9d\x92\x8cing as an entailment classifi\xd0\xb5r, identify whether the given p air of sen\xf0\x9d\x9a\x9dences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction':"
Original acc: 25.00%, attacked acc: 2.70%, dropped acc: 22.30%
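
The textbugger prompts above are stored as `b"..."` strings with `\x..` escapes, which hides the substituted characters. Assuming these are Python bytes literals holding UTF-8 text (an assumption based on their appearance, not stated in the log), the sketch below decodes one and names its non-ASCII characters. The helper names are hypothetical, and the sample is a shortened fragment of one attacked prompt.

```python
import ast
import unicodedata


def decode_attacked_prompt(literal):
    """Convert the text after 'Attacked prompt: ' (a bytes literal) into a str."""
    raw = ast.literal_eval(literal)  # safely evaluates the b"..." literal
    if not isinstance(raw, bytes):
        raise TypeError("expected a bytes literal")
    return raw.decode("utf-8")


def non_ascii_chars(text):
    """List (character, Unicode name) pairs for every non-ASCII character."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in text if ord(ch) > 127]


if __name__ == "__main__":
    # Shortened fragment of an attacked prompt, kept as raw text so the escapes survive.
    sample = r'b"Indica\xf0\x9d\x9a\x9de if the connection"'
    decoded = decode_attacked_prompt(sample)
    for ch, name in non_ascii_chars(decoded):
        print(repr(ch), name)
```

In this fragment the four escaped bytes decode to a single look-alike character that replaces the ASCII letter `t`, which is the kind of visually similar substitution seen throughout the textbugger rows.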
## textfooler
Original prompt: Assess the connection between the following sentences and classify it as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Assess the correlation between the following verdicts and classify it as 'entailment', 'neutral', or 'contradiction':"
Original acc: 22.50%, attacked acc: 2.90%, dropped acc: 19.60%
Original prompt: Please classify the relationship between the provided sentences as 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Ask classify the relationship between the provided priors as 'entailment', 'neutral', or 'contradiction':"
Original acc: 21.60%, attacked acc: 2.00%, dropped acc: 19.60%
Original prompt: Indicate if the connection between the following sentences is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Outlined if the nexus between the following sentence is 'entailment', 'neutral', or 'contradiction':"
Original acc: 19.30%, attacked acc: 1.00%, dropped acc: 18.30%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Task as an entailment estimation utility, analyze the provided condemning and identify if their relationship is 'entailment', 'neutral', or 'contradiction':"
Original acc: 29.20%, attacked acc: 7.70%, dropped acc: 21.50%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Since an entailment detecting scheme, investigates the relation between the following sentences and answer with 'entailment', 'neutral', or 'contradiction':"
Original acc: 28.40%, attacked acc: 1.20%, dropped acc: 27.20%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction':
Attacked prompt: b"Conjunction as an entailment classifier, locate whether the given doublet of priors denotes entailment, neutral, or contradiction. Answer with 'entailment', 'neutral', or 'contradiction':"
Original acc: 25.00%, attacked acc: 0.50%, dropped acc: 24.50%
# mrpc
## 10 prompts
Acc: 67.16%, prompt: Are the meanings of the following pair of sentences the same? Answer with 'equivalent' or 'not_equivalent'.
Acc: 58.09%, prompt: Compare the following sentences and determine if they share the same semantic meaning by responding with 'equivalent' or 'not_equivalent'.
Acc: 56.86%, prompt: Do these two sentences have the same underlying meaning? Respond with 'equivalent' or 'not_equivalent'.
Acc: 48.28%, prompt: Examine the pair of sentences and decide if their meanings are identical by answering with 'equivalent' or 'not_equivalent'.
Acc: 47.06%, prompt: Determine if the meanings of the following sentences are semantically equivalent by responding with 'equivalent' or 'not_equivalent'.
Acc: 43.14%, prompt: Do the meanings of these two statements align? Indicate your answer with 'equivalent' or 'not_equivalent'.
Acc: 38.24%, prompt: Can the given sentences be considered semantically identical? Please reply with 'equivalent' or 'not_equivalent'.
Acc: 19.36%, prompt: Evaluate whether the two provided sentences convey the same meaning by answering 'equivalent' or 'not_equivalent'.
Acc: 19.12%, prompt: Please analyze the provided sentences and indicate if their meanings are the same by choosing 'equivalent' or 'not_equivalent'.
Acc: 1.96%, prompt: Assess if the two given sentences have equivalent meanings by selecting 'equivalent' or 'not_equivalent'.
Acc: 53.43%, prompt: In the role of a sentence comparison analyst, assess the provided sentences and indicate if they convey the same meaning by selecting 'equivalent' or 'not_equivalent'.
Acc: 52.21%, prompt: In your capacity as a semantic comparison tool, examine the given sentences and decide if their meanings are identical, responding with 'equivalent' or 'not_equivalent'.
Acc: 39.46%, prompt: In the role of a textual comparison specialist, examine the given sentences and decide if they share the same meaning, responding with 'equivalent' or 'not_equivalent'.
Acc: 35.54%, prompt: As a linguistic comparator, review the following pair of sentences and determine their semantic equivalence by choosing 'equivalent' or 'not_equivalent'.
Acc: 34.31%, prompt: As a semantic comparison expert, evaluate the given pair of sentences and determine if they are 'equivalent' or 'not_equivalent'.
Acc: 31.86%, prompt: As a language comparison expert, examine the given pair of sentences and decide if their meanings align, answering with 'equivalent' or 'not_equivalent'.
Acc: 18.63%, prompt: As a textual similarity evaluator, analyze the following pair of sentences and determine if they are semantically 'equivalent' or 'not_equivalent'.
Acc: 16.67%, prompt: In your capacity as a semantic assessment tool, evaluate the provided sentences and classify their meanings as 'equivalent' or 'not_equivalent'.
Acc: 14.71%, prompt: In your capacity as a language analyst, assess the following sentences and classify their similarity as 'equivalent' or 'not_equivalent'.
Acc: 11.52%, prompt: As a sentence similarity evaluator, analyze the provided sentences and indicate if their meanings are 'equivalent' or 'not_equivalent'.
## bertattack
Original prompt: Are the meanings of the following pair of sentences the same? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Are the meanings of the next in of sentences the same? q with 'equivalent' or 'not_equivalent'. "
Original acc: 67.16%, attacked acc: 0.00%, dropped acc: 67.16%
Original prompt: Compare the following sentences and determine if they share the same semantic meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Compare the equivalent sentences and confirm if they share the same social meaning by responding with 'equivalent' or 'not_equivalent'. "
Original acc: 58.09%, attacked acc: 0.00%, dropped acc: 58.09%
Original prompt: Do these two sentences have the same underlying meaning? Respond with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Do these two sentences have the same underlying meaning? object with 'equivalent' or 'not_equivalent'. "
Original acc: 56.86%, attacked acc: 0.00%, dropped acc: 56.86%
Original prompt: In the role of a sentence comparison analyst, assess the provided sentences and indicate if they convey the same meaning by selecting 'equivalent' or 'not_equivalent'.
Attacked prompt: b"at the instance of a sentence comparative analyst, score the supplied sentences and indicate if they signal the same meaning by being 'equivalent' or 'not_equivalent'. "
Original acc: 53.43%, attacked acc: 1.47%, dropped acc: 51.96%
Original prompt: In your capacity as a semantic comparison tool, examine the given sentences and decide if their meanings are identical, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"[UNK] your capacity as a semantic comparison tool, examine the given sentences and decide if their meanings are different, responding with 'equivalent' or 'not_equivalent'. "
Original acc: 52.21%, attacked acc: 0.00%, dropped acc: 52.21%
Original prompt: In the role of a textual comparison specialist, examine the given sentences and decide if they share the same meaning, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"into the career of a textual comparison specialist, examine the given sentences and rank if they share the same meaning, matching with 'equivalent' or 'not_equivalent'. "
Original acc: 39.46%, attacked acc: 0.00%, dropped acc: 39.46%
## checklist
Original prompt: Are the meanings of the following pair of sentences the same? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Are the meanings of the following pair of sentences the same? Answer with 'equivalent' or 'not_equivalent' vTAjHynoIG. "
Original acc: 67.16%, attacked acc: 29.17%, dropped acc: 37.99%
Original prompt: Compare the following sentences and determine if they share the same semantic meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Compare the following sentences and determine if they share the same semantic meaning by responding with 'equivalent' or 'not_equivalent' AMsRIKZniY. "
Original acc: 58.09%, attacked acc: 12.50%, dropped acc: 45.59%
Original prompt: Do these two sentences have the same underlying meaning? Respond with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Do these two sentences have the same underlying meaning? Respond with 'equivalent' or 'not_equivalent' vTAjHynoIG. "
Original acc: 56.86%, attacked acc: 9.31%, dropped acc: 47.55%
Original prompt: In the role of a sentence comparison analyst, assess the provided sentences and indicate if they convey the same meaning by selecting 'equivalent' or 'not_equivalent'.
Attacked prompt: b"In the role of a sentence comparison analyst, assess the provided sentences and indicate if they convey the same meaning by selecting 'equivalent' or 'not_equivalent' Cev0s49fbe. "
Original acc: 53.43%, attacked acc: 0.25%, dropped acc: 53.19%
Original prompt: In your capacity as a semantic comparison tool, examine the given sentences and decide if their meanings are identical, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"In your capacity as a semantic comparison tool, examine the given sentences and decide if their meanings are identical, responding with 'equivalent' or 'not_equivalent' AMsRIKZniY. "
Original acc: 52.21%, attacked acc: 6.62%, dropped acc: 45.59%
Original prompt: In the role of a textual comparison specialist, examine the given sentences and decide if they share the same meaning, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"In the role of a textual comparison specialist, examine the given sentences and decide if they share the same meaning, responding with 'equivalent' or 'not_equivalent' MQnugHcaoy. "
Original acc: 39.46%, attacked acc: 5.64%, dropped acc: 33.82%
## deepwordbug
Original prompt: Are the meanings of the following pair of sentences the same? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"tre the meXanings of the Uollowing paOir of sentences the same? AnsQer with 'equivalent' or 'not_equivalent'. "
Original acc: 67.16%, attacked acc: 6.37%, dropped acc: 60.78%
Original prompt: Compare the following sentences and determine if they share the same semantic meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"oCmpare the follAwing sentewces and determine if they share the same swmantic meaning by responding with 'equivalent' or 'not_equivalent'. "
Original acc: 57.84%, attacked acc: 0.00%, dropped acc: 57.84%
Original prompt: Do these two sentences have the same underlying meaning? Respond with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"DAo these otwo sentecnes have the same underlyCng mLeaning? Respod with 'equivalent' or 'not_equivalent'. "
Original acc: 57.35%, attacked acc: 2.21%, dropped acc: 55.15%
Original prompt: In the role of a sentence comparison analyst, assess the provided sentences and indicate if they convey the same meaning by selecting 'equivalent' or 'not_equivalent'.
Attacked prompt: b"In the roOe of a sentence comparison anaeyst, assss the providde sentences and indicate if they cSonvey the same meaning by sDelecting 'equivalent' or 'not_equivalent'. "
Original acc: 52.94%, attacked acc: 4.17%, dropped acc: 48.77%
Original prompt: In your capacity as a semantic comparison tool, examine the given sentences and decide if their meanings are identical, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"IYn your capaciMy as a semantic comparison ool, examine the given sentences and deXide if their meanings are identRcal, rseponding with 'equivalent' or 'not_equivalent'. "
Original acc: 52.21%, attacked acc: 1.47%, dropped acc: 50.74%
Original prompt: In the role of a textual comparison specialist, examine the given sentences and decide if they share the same meaning, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Io the role of a textual comparison specialist, examine the givIen sentences and decide if they share the same meannig, renponding with 'equivalent' or 'not_equivalent'. "
Original acc: 39.46%, attacked acc: 5.15%, dropped acc: 34.31%
## stresstest
Original prompt: Are the meanings of the following pair of sentences the same? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Are the meanings of the following pair of sentences the same? Answer with 'equivalent' or 'not_equivalent' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 67.16%, attacked acc: 67.89%, dropped acc: -0.74%
Original prompt: Compare the following sentences and determine if they share the same semantic meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Compare the following sentences and determine if they share the same semantic meaning by responding with 'equivalent' or 'not_equivalent' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 58.09%, attacked acc: 56.37%, dropped acc: 1.72%
Original prompt: Do these two sentences have the same underlying meaning? Respond with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Do these two sentences have the same underlying meaning? Respond with 'equivalent' or 'not_equivalent' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 56.86%, attacked acc: 58.09%, dropped acc: -1.23%
Original prompt: In the role of a sentence comparison analyst, assess the provided sentences and indicate if they convey the same meaning by selecting 'equivalent' or 'not_equivalent'.
Attacked prompt: b"In the role of a sentence comparison analyst, assess the provided sentences and indicate if they convey the same meaning by selecting 'equivalent' or 'not_equivalent' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 53.43%, attacked acc: 47.79%, dropped acc: 5.64%
Original prompt: In your capacity as a semantic comparison tool, examine the given sentences and decide if their meanings are identical, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"In your capacity as a semantic comparison tool, examine the given sentences and decide if their meanings are identical, responding with 'equivalent' or 'not_equivalent' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 52.21%, attacked acc: 33.33%, dropped acc: 18.87%
Original prompt: In the role of a textual comparison specialist, examine the given sentences and decide if they share the same meaning, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"In the role of a textual comparison specialist, examine the given sentences and decide if they share the same meaning, responding with 'equivalent' or 'not_equivalent' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 39.46%, attacked acc: 38.24%, dropped acc: 1.23%
## textbugger
Original prompt: Are the meanings of the following pair of sentences the same? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Be the meanings of the \xf0\x9d\x9a\x8following pair of sentences the same? Responding with 'equivalent' or 'not_equivalent'. "
Original acc: 67.16%, attacked acc: 1.96%, dropped acc: 65.20%
Original prompt: Compare the following sentences and determine if they share the same semantic meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Com\xd1\x80are the later sentences and determine if they share the same semantic meaning by respo\xd5\xb8ding with 'equivalent' or 'not_equivalent'. "
Original acc: 58.09%, attacked acc: 0.49%, dropped acc: 57.60%
Original prompt: Do these two sentences have the same underlying meaning? Respond with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Could these two sentences have the same underlying connotation? Cater with 'equivalent' or 'not_equivalent'. "
Original acc: 56.86%, attacked acc: 1.47%, dropped acc: 55.39%
Original prompt: In the role of a sentence comparison analyst, assess the provided sentences and indicate if they convey the same meaning by selecting 'equivalent' or 'not_equivalent'.
Attacked prompt: b"I\xd5\xb8 the rloe of a sentence comparison analyst, asess the provided sentences and indicate if they convey the same meaning by selec\xf0\x9d\x9a\x9ding 'equivalent' or 'not_equivalent'. "
Original acc: 53.43%, attacked acc: 0.00%, dropped acc: 53.43%
Original prompt: In your capacity as a semantic comparison tool, examine the given sentences and decide if their meanings are identical, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"At your capacity as a seman\xf0\x9d\x9a\x9dic com\xd1\x80arison tool, examine the given sentences and decide if their meanings are identical, respo\xd5\xb8ding with 'equivalent' or 'not_equivalent'. "
Original acc: 52.21%, attacked acc: 0.00%, dropped acc: 52.21%
Original prompt: In the role of a textual comparison specialist, examine the given sentences and decide if they share the same meaning, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"I\xd5\xb8 the role of a textual comparison specialist, examine the given sen\xf0\x9d\x9a\x9dences and decide if they share the same meaning, respondi\xd5\xb8g with 'equivalent' or 'not_equivalent'. "
Original acc: 39.46%, attacked acc: 0.00%, dropped acc: 39.46%
## textfooler
Original prompt: Are the meanings of the following pair of sentences the same? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Are the meanings of the following couple of sentences the same? Cope with 'equivalent' or 'not_equivalent'. "
Original acc: 67.16%, attacked acc: 0.49%, dropped acc: 66.67%
Original prompt: Compare the following sentences and determine if they share the same semantic meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Compare the suite sentences and identify if they share the same semantic meanings by satisfy with 'equivalent' or 'not_equivalent'. "
Original acc: 58.09%, attacked acc: 1.23%, dropped acc: 56.86%
Original prompt: Do these two sentences have the same underlying meaning? Respond with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Do these two sentences have the same nucleus connotation? Reacts with 'equivalent' or 'not_equivalent'. "
Original acc: 56.86%, attacked acc: 1.23%, dropped acc: 55.64%
Original prompt: In the role of a sentence comparison analyst, assess the provided sentences and indicate if they convey the same meaning by selecting 'equivalent' or 'not_equivalent'.
Attacked prompt: b"About the role of a sentence comparison commentator, assess the presented judgements and indicate if they convey the same meaning by option 'equivalent' or 'not_equivalent'. "
Original acc: 53.43%, attacked acc: 2.21%, dropped acc: 51.23%
Original prompt: In your capacity as a semantic comparison tool, examine the given sentences and decide if their meanings are identical, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Per your powers as a semantic comparison paraphernalia, consider the given sentence and decide if their senses are identical, cope with 'equivalent' or 'not_equivalent'. "
Original acc: 52.21%, attacked acc: 6.13%, dropped acc: 46.08%
Original prompt: In the role of a textual comparison specialist, examine the given sentences and decide if they share the same meaning, responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"About the role of a textual comparison specialist, examine the given judgements and decide if they exchanging the same meaning, responses with 'equivalent' or 'not_equivalent'. "
Original acc: 39.46%, attacked acc: 0.00%, dropped acc: 39.46%
## translation
Language: chinese, acc: 17.16%, prompt: As an expert in semantic comparison, evaluates two given sentences to determine whether they are 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 16.18%, prompt: From the point of view of semantic comparison, determine whether this pair of sentences is 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 8.58%, prompt: Using the semantic comparison method, determine whether the following two statements are 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 1.96%, prompt: For the following two sentences, determine whether they are 'equivalent' or 'not_equivalent' based on semantic comparison.
Language: chinese, acc: 2.45%, prompt: As an expert in semantic comparison, please evaluate the following two sentences and determine if they are 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 1.96%, prompt: Using semantic comparison techniques, determine whether the following two sentences are 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 0.74%, prompt: Please determine whether the following two sentences are 'equivalent' or 'not_equivalent' according to the standard of semantic comparison.
Language: chinese, acc: 2.45%, prompt: As an expert in the field of semantic comparison, please evaluate the following two sentences and determine whether they are 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 0.74%, prompt: Using semantic comparison, determine whether the following two sentences are 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 1.47%, prompt: Determine whether the following two sentences are 'equivalent' or 'not_equivalent' based on semantic comparison.
Language: french, acc: 18.87%, prompt: As an expert in semantic comparison, evaluate the following pair of sentences and determine whether they are 'equivalent' or 'not_equivalent'.
Language: french, acc: 28.19%, prompt: Can you determine whether the following two sentences are 'equivalent' or 'not_equivalent' as a semantic comparison expert?
Language: french, acc: 6.13%, prompt: Using your expertise in semantic comparison, determine whether the following two sentences are 'equivalent' or 'not_equivalent'.
Language: french, acc: 15.20%, prompt: As a semantic comparison specialist, assess the similarity between the following two sentences and determine whether they are 'equivalent' or 'not_equivalent'.
Language: french, acc: 4.90%, prompt: Are you able to determine whether the following two sentences are 'equivalent' or 'not_equivalent' as an expert in semantic comparison?
Language: french, acc: 20.34%, prompt: As a semantic comparison professional, evaluate the following pair of sentences and indicate whether they are 'equivalent' or 'not_equivalent'.
Language: french, acc: 15.93%, prompt: Can you determine whether the following two sentences have a 'equivalent' or 'not_equivalent' meaning as an expert in semantic comparison?
Language: french, acc: 29.90%, prompt: As an expert in semantic comparison, assess the similarity between the following two sentences and determine whether they are 'equivalent' or 'not_equivalent'.
Language: french, acc: 17.89%, prompt: Using your expertise in semantic comparison, determine whether the following two sentences are 'equivalent' or 'not_equivalent' in terms of meaning.
Language: french, acc: 7.60%, prompt: As a semantic comparison professional, assess the similarity between the following two sentences and indicate whether they are 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 15.69%, prompt: As an expert in semantic comparison, evaluate the two given sentences and determine whether they are 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 8.33%, prompt: Based on my experience in semantic analysis, classify the following two sentences as 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 8.58%, prompt: As an expert in semantic comparison, analyze the following two sentences and classify them as 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 29.90%, prompt: Your task as an expert in semantic comparison is to evaluate the following two sentences and determine whether they are 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 13.73%, prompt: As a semantic comparison specialist, analyze the two data statements and insert them into one of the following categories: 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 24.75%, prompt: Based on my experience in semantic analysis, classify the following two sentences between 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 12.01%, prompt: Your role as a semantic comparison specialist requires analyzing the two given sentences and determining whether they are 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 6.13%, prompt: As an experienced semantic analyst, classify the following two sentences as 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 6.37%, prompt: Your job as a semantic analyst evaluates the following two sentences as 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 14.22%, prompt: As a semantic analyst, determine whether the given sentences are 'equivalent' or 'not_equivalent' based on their relationship.
Language: spanish, acc: 15.44%, prompt: As an expert in semantic comparison, it evaluates the pair of sentences provided and determines whether they are 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 8.33%, prompt: Based on my experience in semantic analysis, classify the following two sentences as 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 23.28%, prompt: As an expert in semantic comparison, analyze the two sentences given and classify them as 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 19.61%, prompt: Your task as a semantic comparison specialist is to evaluate the following two sentences and determine whether they are 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 26.96%, prompt: As an expert in semantic analysis, he makes a classification of the following two sentences based on their 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 9.80%, prompt: Based on your experience of semantic comparison, classify the next two sentences as 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 9.07%, prompt: As a specialist in semantic analysis, you are given the task of analysing the two sentences given and classifying them as 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 2.21%, prompt: As an expert in semantic comparison, he classifies the following two sentences into 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 8.58%, prompt: As a specialist in semantic analysis, evaluate the following two sentences and classify them as 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 38.48%, prompt: Your task as an expert in semantic comparison is to analyze the two sentences provided and determine whether they are 'equivalent' or 'not_equivalent' based on their semantic relationship.
Language: japanese, acc: 16.18%, prompt: Evaluate whether a given pair of sentences is 'equivalent' or 'not_equivalent', depending on the context.
Language: japanese, acc: 16.67%, prompt: Use a semantic comparison to determine whether a given pair of sentences is 'equivalent' or 'not_equivalent'.
Language: japanese, acc: 4.17%, prompt: Evaluate a given pair of sentences as 'equivalent' or 'not_equivalent' by determining whether they have the same semantic meaning.
Language: japanese, acc: 59.80%, prompt: Determine whether a given pair of sentences is synonyms and evaluate whether they are 'equivalent' or 'not_equivalent'.
Language: japanese, acc: 30.15%, prompt: Determine whether a given pair of sentences is 'equivalent' or 'not_equivalent', and whether they are semantically identical.
Language: japanese, acc: 54.17%, prompt: Determinate whether a given pair of sentences has the same meaning and evaluate whether they are 'equivalent' or 'not_equivalent'.
Language: japanese, acc: 9.80%, prompt: Evaluate whether a given pair of sentences is 'equivalent' or 'not_equivalent' by determining whether they are semantically identical.
Language: japanese, acc: 39.95%, prompt: Judge whether a given pair of sentences is equal and evaluate whether they are 'equivalent' or 'not_equivalent'.
Language: japanese, acc: 51.23%, prompt: Determinate whether a given pair of sentences are semantically equal and evaluate whether they are 'equivalent' or 'not_equivalent'.
Language: japanese, acc: 10.05%, prompt: Whether a given pair of sentences is 'equivalent' or 'not_equivalent' depends on the context.
Language: korean, acc: 25.00%, prompt: As a sentence comparator, evaluate the two sentences given to determine 'equivalent' or 'not_equivalent'.
Language: korean, acc: 9.56%, prompt: Compare two sentences to determine 'equivalent' or 'not_equivalent'. For this you need qualifications as a specialist in semantic comparison.
Language: korean, acc: 4.41%, prompt: It takes your knowledge as an expert in semantic comparison to determine that two sentences are 'equivalent' or 'not_equivalent'.
Language: korean, acc: 29.17%, prompt: As a specialist in semantic comparison, evaluate whether two given sentences are 'equivalent' or 'not_equivalent'.
Language: korean, acc: 38.48%, prompt: Analyze two sentences to determine 'equivalent' or 'not_equivalent'. For that you need the knowledge of a semantic comparison expert.
Language: korean, acc: 18.63%, prompt: As an expert in semantic comparison, decide whether two sentences are 'equivalent' or 'not_equivalent'.
Language: korean, acc: 9.07%, prompt: It takes the knowledge of an expert in semantic comparison to compare two sentences to judge 'equivalent' or 'not_equivalent'.
Language: korean, acc: 18.87%, prompt: Experience as an expert in semantic comparison is required to determine whether two given sentences are 'equivalent' or 'not_equivalent'.
Language: korean, acc: 23.28%, prompt: As an expert in semantic comparison, determine whether two sentences are 'equivalent' or 'not_equivalent'.
Language: korean, acc: 19.85%, prompt: Analyze two sentences to determine 'equivalent' or 'not_equivalent'. For this, you need a qualification as a specialist in semantic comparison.
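
The translation entries above share one line shape (`Language: <name>, acc: <value>%, prompt: <text>`), so they can be summarised per language. The following is a minimal sketch, not part of PromptBench: the regular expression and the helper name `mean_acc_by_language` are assumptions introduced here for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the entries in this file:
#   Language: korean, acc: 25.00%, prompt: As a sentence comparator, ...
LINE_RE = re.compile(r"^Language: (\w+), acc: (-?[\d.]+)%, prompt: (.*)$")


def mean_acc_by_language(lines):
    """Return {language: mean accuracy in percent} over the matching lines.

    Hypothetical helper; lines that do not match the pattern are skipped.
    """
    accs = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            accs[match.group(1)].append(float(match.group(2)))
    return {lang: sum(values) / len(values) for lang, values in accs.items()}


# Two korean entries copied from the list above.
sample = [
    "Language: korean, acc: 25.00%, prompt: As a sentence comparator, evaluate the two sentences given to determine 'equivalent' or 'not_equivalent'.",
    "Language: korean, acc: 23.28%, prompt: As an expert in semantic comparison, determine whether two sentences are 'equivalent' or 'not_equivalent'.",
]
summary = mean_acc_by_language(sample)
print(summary)
```

Feeding the whole section to the helper instead of `sample` would give one mean per language, which makes the spread between languages easier to compare than the raw list.
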
# qnli
## 10 prompts
Acc: 37.90%, prompt: Evaluate whether the given context supports the answer to the question by responding with 'entailment' or 'not_entailment'.
Acc: 30.90%, prompt: Based on the provided context and question, decide if the information supports the answer by responding with 'entailment' or 'not_entailment'.
Acc: 30.50%, prompt: Review the given context and question, and decide if the context contains enough information to support the answer by selecting 'entailment' or 'not_entailment'.
Acc: 30.10%, prompt: Consider the context and question, and indicate if the answer can be logically deduced from the context by responding with 'entailment' or 'not_entailment'.
Acc: 29.30%, prompt: Based on the information in the context, decide if the answer to the question is justified by choosing 'entailment' or 'not_entailment'.
Acc: 26.50%, prompt: Please assess if the answer to the question can be derived from the given context by selecting 'entailment' or 'not_entailment'.
Acc: 10.50%, prompt: Given the question and context provided, determine if the answer can be inferred by choosing 'entailment' or 'not_entailment'.
Acc: 10.20%, prompt: Assess if the answer to the question can be logically concluded from the provided context by choosing 'entailment' or 'not_entailment'.
Acc: 10.00%, prompt: Examine the context and question, and determine if the context logically implies the answer by selecting 'entailment' or 'not_entailment'.
Acc: 7.60%, prompt: Analyze the context and question, and indicate if the context entails the answer by choosing 'entailment' or 'not_entailment'.
Acc: 38.40%, prompt: As a textual inference expert, analyze if the answer to the question can be deduced from the provided context and select 'entailment' or 'not_entailment'.
Acc: 37.30%, prompt: In your role as a linguistic investigator, determine if the context given entails the answer to the question and provide your conclusion with 'entailment' or 'not_entailment'.
Acc: 35.30%, prompt: As a language expert, assess if the given context entails the answer to the question and respond with 'entailment' or 'not_entailment'.
Acc: 33.90%, prompt: As a linguistic consultant, decide if the answer to the question is logically supported by the provided context and respond with 'entailment' or 'not_entailment'.
Acc: 27.90%, prompt: In your role as a semantic evaluator, determine if the provided context justifies the answer to the question and answer with 'entailment' or 'not_entailment'.
Acc: 26.40%, prompt: As a textual analyst, examine if the given context logically implies the answer to the question and indicate your decision with 'entailment' or 'not_entailment'.
Acc: 23.40%, prompt: In the capacity of a language specialist, decide if the context presented contains enough information to infer the answer to the question and respond with 'entailment' or 'not_entailment'.
Acc: 20.10%, prompt: As a semantic interpreter, assess whether the provided context supports the answer to the given question and answer with 'entailment' or 'not_entailment'.
Acc: 16.60%, prompt: As a semantic researcher, evaluate whether the provided context supports the answer to the question and choose 'entailment' or 'not_entailment'.
Acc: 7.30%, prompt: In the capacity of a language evaluator, examine if the given context justifies the answer to the question and indicate your assessment with 'entailment' or 'not_entailment'.
## bertattack
Original prompt: Evaluate whether the given context supports the answer to the question by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"and whether the following context supports the answer to the question by responses with 'entailment' or 'not_entailment'. "
Original acc: 37.90%, attacked acc: 1.60%, dropped acc: 36.30%
Original prompt: Based on the provided context and question, decide if the information supports the answer by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"1 on the provided subject and question, decided if the information supports the response by responses with 'entailment' or 'not_entailment'. "
Original acc: 30.90%, attacked acc: 4.40%, dropped acc: 26.50%
Original prompt: Review the given context and question, and decide if the context contains enough information to support the answer by selecting 'entailment' or 'not_entailment'.
Attacked prompt: b"Review the given context and question, and decide if the context requires enough information to let the answer by choices 'entailment' or 'not_entailment'. "
Original acc: 30.50%, attacked acc: 0.00%, dropped acc: 30.50%
Original prompt: As a textual inference expert, analyze if the answer to the question can be deduced from the provided context and select 'entailment' or 'not_entailment'.
Attacked prompt: b"example a textual in expert, analyzing if the answer to the question can be deduced from the language context and think 'entailment' or 'not_entailment'. "
Original acc: 38.40%, attacked acc: 0.40%, dropped acc: 38.00%
Original prompt: In your role as a linguistic investigator, determine if the context given entails the answer to the question and provide your conclusion with 'entailment' or 'not_entailment'.
Attacked prompt: b"In your role as a linguistic investigator, tell if the conversation given outis the answer to the question and provide your consensus with 'entailment' or 'not_entailment'. "
Original acc: 37.30%, attacked acc: 0.60%, dropped acc: 36.70%
Original prompt: As a language expert, assess if the given context entails the answer to the question and respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"As a language expert, assessed if the context citation enproduces the answers to the question and approach with 'entailment' or 'not_entailment'. "
Original acc: 35.30%, attacked acc: 0.40%, dropped acc: 34.90%
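
In each result line, the reported `dropped acc` appears to be the original accuracy minus the attacked accuracy. The sketch below checks that relation for a single line; the pattern and the helper name `check_drop` are assumptions made for illustration, not PromptBench code.

```python
import re

# Assumed line shape, taken from the result lines in this file.
ACC_RE = re.compile(
    r"Original acc: (-?[\d.]+)%, attacked acc: (-?[\d.]+)%, dropped acc: (-?[\d.]+)%"
)


def check_drop(line, tol=0.005):
    """Return True if dropped acc equals original acc minus attacked acc.

    Hypothetical helper; `tol` absorbs floating-point error on
    values reported with two decimal places.
    """
    match = ACC_RE.search(line)
    if match is None:
        raise ValueError("not a result line: " + line)
    original, attacked, dropped = (float(g) for g in match.groups())
    return abs((original - attacked) - dropped) <= tol


# A result line copied from the bertattack section above.
line = "Original acc: 37.90%, attacked acc: 1.60%, dropped acc: 36.30%"
print(check_drop(line))
```

The pattern also accepts a negative drop, which occurs when the attacked prompt scores higher than the original.
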
## checklist
Original prompt: Evaluate whether the given context supports the answer to the question by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"Evaluate whether the given context supports the answer to the question by responding with 'entailment' or 'not_entailment' rzu98kF2KF. "
Original acc: 37.90%, attacked acc: 3.40%, dropped acc: 34.50%
Original prompt: Based on the provided context and question, decide if the information supports the answer by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"Based on the provided context and question, decide if the information supports the answer by responding with 'entailment' or 'not_entailment' 9vGXgnbWB8. "
Original acc: 30.90%, attacked acc: 3.20%, dropped acc: 27.70%
Original prompt: Review the given context and question, and decide if the context contains enough information to support the answer by selecting 'entailment' or 'not_entailment'.
Attacked prompt: b"Review the given context and question, and decide if the context contains enough information to support the answer by selecting 'entailment' or 'not_entailment' 9vGXgnbWB8. "
Original acc: 30.50%, attacked acc: 2.90%, dropped acc: 27.60%
Original prompt: As a textual inference expert, analyze if the answer to the question can be deduced from the provided context and select 'entailment' or 'not_entailment'.
Attacked prompt: b"As a textual inference expert, analyze if the answer to the question can be deduced from the provided context and select 'entailment' or 'not_entailment' 9vGXgnbWB8. "
Original acc: 38.40%, attacked acc: 10.80%, dropped acc: 27.60%
Original prompt: In your role as a linguistic investigator, determine if the context given entails the answer to the question and provide your conclusion with 'entailment' or 'not_entailment'.
Attacked prompt: b"In your role as a linguistic investigator, determine if the context given entails the answer to the question and provide your conclusion with 'entailment' or 'not_entailment' SvAp8RlOFn. "
Original acc: 37.30%, attacked acc: 16.10%, dropped acc: 21.20%
Original prompt: As a language expert, assess if the given context entails the answer to the question and respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"As a language expert, assess if the given context entails the answer to the question and respond with 'entailment' or 'not_entailment' RcRneWo6Iv. "
Original acc: 35.30%, attacked acc: 5.20%, dropped acc: 30.10%
## deepwordbug
Original prompt: Evaluate whether the given context supports the answer to the question by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"EvaluaDe whether the igven context suppors the ansewr to the quHstion by respoding with 'entailment' or 'not_entailment'. "
Original acc: 38.10%, attacked acc: 6.90%, dropped acc: 31.20%
Original prompt: Based on the provided context and question, decide if the information supports the answer by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"Based on the povided ocntext and quetion, decide if the inofrmation supports the answer by responding with 'entailment' or 'not_entailment'. "
Original acc: 30.80%, attacked acc: 12.90%, dropped acc: 17.90%
Original prompt: Review the given context and question, and decide if the context contains enough information to support the answer by selecting 'entailment' or 'not_entailment'.
Attacked prompt: b"Rview the gven cnotext and question, and edcide if the context containZs enough information to Vsupport the ainswer by selecting 'entailment' or 'not_entailment'. "
Original acc: 30.50%, attacked acc: 4.70%, dropped acc: 25.80%
Original prompt: As a textual inference expert, analyze if the answer to the question can be deduced from the provided context and select 'entailment' or 'not_entailment'.
Attacked prompt: b"fs a textual inference xpert, anaRyze if the answer to the question can be deduced from the provided cotnext and yselect 'entailment' or 'not_entailment'. "
Original acc: 38.50%, attacked acc: 6.20%, dropped acc: 32.30%
Original prompt: In your role as a linguistic investigator, determine if the context given entails the answer to the question and provide your conclusion with 'entailment' or 'not_entailment'.
Attacked prompt: b"In your role as a lignuistic investigator, determine if the cnotext given entavils the anwser to the question and proide your conclusion with 'entailment' or 'not_entailment'. "
Original acc: 37.30%, attacked acc: 5.20%, dropped acc: 32.10%
Original prompt: As a language expert, assess if the given context entails the answer to the question and respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"As a language epxert, arssess if the gien contet entaxls the ansGer to the questAon and resNpond with 'entailment' or 'not_entailment'. "
Original acc: 35.30%, attacked acc: 2.10%, dropped acc: 33.20%
## stresstest
Original prompt: Evaluate whether the given context supports the answer to the question by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"Evaluate whether the given context supports the answer to the question by responding with 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 37.90%, attacked acc: 19.80%, dropped acc: 18.10%
Original prompt: Based on the provided context and question, decide if the information supports the answer by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"Based on the provided context and question, decide if the information supports the answer by responding with 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 30.90%, attacked acc: 16.20%, dropped acc: 14.70%
Original prompt: Review the given context and question, and decide if the context contains enough information to support the answer by selecting 'entailment' or 'not_entailment'.
Attacked prompt: b"Review the given context and question, and decide if the context contains enough information to support the answer by selecting 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 30.50%, attacked acc: 33.70%, dropped acc: -3.20%
Original prompt: As a textual inference expert, analyze if the answer to the question can be deduced from the provided context and select 'entailment' or 'not_entailment'.
Attacked prompt: b"As a textual inference expert, analyze if the answer to the question can be deduced from the provided context and select 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 38.40%, attacked acc: 29.50%, dropped acc: 8.90%
Original prompt: In your role as a linguistic investigator, determine if the context given entails the answer to the question and provide your conclusion with 'entailment' or 'not_entailment'.
Attacked prompt: b"In your role as a linguistic investigator, determine if the context given entails the answer to the question and provide your conclusion with 'entailment' or 'not_entailment' and true is true . "
Original acc: 37.30%, attacked acc: 28.00%, dropped acc: 9.30%
Original prompt: As a language expert, assess if the given context entails the answer to the question and respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"As a language expert, assess if the given context entails the answer to the question and respond with 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 35.30%, attacked acc: 19.80%, dropped acc: 15.50%
## textbugger
Original prompt: Evaluate whether the given context supports the answer to the question by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"E\xd1\xb5aluate whether the give\xd5\xb8 con\xf0\x9d\x9a\x9dext supports the answer to the question by responding with 'entailment' or 'not_entailment'. "
Original acc: 37.90%, attacked acc: 0.50%, dropped acc: 37.40%
Original prompt: Based on the provided context and question, decide if the information supports the answer by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"Ba\xd1\x95ed on the provided context and question, deciding if the particulars suppor\xf0\x9d\x9a\x9ds the answer by responds with 'entailment' or 'not_entailment'. "
Original acc: 30.90%, attacked acc: 0.40%, dropped acc: 30.50%
Original prompt: Review the given context and question, and decide if the context contains enough information to support the answer by selecting 'entailment' or 'not_entailment'.
Attacked prompt: b"Revisited the afforded cont\xd0\xb5xt and question, and decide if the context contains enough information to support the answer by selec\xf0\x9d\x9a\x9ding 'entailment' or 'not_entailment'. "
Original acc: 30.50%, attacked acc: 0.00%, dropped acc: 30.50%
Original prompt: As a textual inference expert, analyze if the answer to the question can be deduced from the provided context and select 'entailment' or 'not_entailment'.
Attacked prompt: b"A\xd1\x95 a textual deduced expe\xe2\xb2\x85t, analyzes if the replying to the question can be deduced from the provided context and selec\xf0\x9d\x9a\x9d 'entailment' or 'not_entailment'. "
Original acc: 38.40%, attacked acc: 0.10%, dropped acc: 38.30%
Original prompt: In your role as a linguistic investigator, determine if the context given entails the answer to the question and provide your conclusion with 'entailment' or 'not_entailment'.
Attacked prompt: b"In your \xe2\xb2\x85ole as a linguistic researcher, determine if the con\xf0\x9d\x9a\x9dext yielded entai\xe2\x85\xbcs the answer to the question and provide your conclusion with 'entailment' or 'not_entailment'. "
Original acc: 37.30%, attacked acc: 0.30%, dropped acc: 37.00%
Original prompt: As a language expert, assess if the given context entails the answer to the question and respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"A\xd1\x95 a linguistic ex\xd1\x80ert, assessing if the given con\xf0\x9d\x9a\x9dext implies the answer to the question and responding with 'entailment' or 'not_entailment'. "
Original acc: 35.30%, attacked acc: 0.10%, dropped acc: 35.20%
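
The attacked prompts are stored as Python bytes literals, so the characters substituted by textbugger show up only as `\x..` escape sequences. A minimal sketch for recovering them, assuming the literals are UTF-8 encoded; the helper names `decode_attacked_prompt` and `non_ascii_chars` are hypothetical:

```python
import ast
import unicodedata


def decode_attacked_prompt(literal):
    """Turn the text of a b"..." literal into a str (assumes UTF-8)."""
    raw = ast.literal_eval(literal)  # safely evaluates the bytes literal
    return raw.decode("utf-8")


def non_ascii_chars(text):
    """List (character, Unicode name) for every non-ASCII character."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in text if ord(ch) > 127]


# First attacked prompt of the textbugger section above, copied verbatim.
literal = r'''b"E\xd1\xb5aluate whether the give\xd5\xb8 con\xf0\x9d\x9a\x9dext supports the answer to the question by responding with 'entailment' or 'not_entailment'. "'''

prompt = decode_attacked_prompt(literal)
for ch, name in non_ascii_chars(prompt):
    print("U+%04X" % ord(ch), name)
```

Listing the code points makes it clear that the substituted characters are look-alikes from other Unicode blocks, not ASCII typos.
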
## textfooler
Original prompt: Evaluate whether the given context supports the answer to the question by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"Weighed whether the given context assistance the reaction to the question by answered with 'entailment' or 'not_entailment'. "
Original acc: 37.90%, attacked acc: 1.00%, dropped acc: 36.90%
Original prompt: Based on the provided context and question, decide if the information supports the answer by responding with 'entailment' or 'not_entailment'.
Attacked prompt: b"Based on the presented framework and question, decide if the info supports the reacting by meet with 'entailment' or 'not_entailment'. "
Original acc: 30.90%, attacked acc: 1.60%, dropped acc: 29.30%
Original prompt: Review the given context and question, and decide if the context contains enough information to support the answer by selecting 'entailment' or 'not_entailment'.
Attacked prompt: b"Reconsidered the given context and question, and ascertain if the context comprises enough information to favour the answer by choices 'entailment' or 'not_entailment'. "
Original acc: 30.50%, attacked acc: 0.00%, dropped acc: 30.50%
Original prompt: As a textual inference expert, analyze if the answer to the question can be deduced from the provided context and select 'entailment' or 'not_entailment'.
Attacked prompt: b"Because a textual inference practitioner, analysed if the reaction to the subject can be deduced from the awarded context and click 'entailment' or 'not_entailment'. "
Original acc: 38.40%, attacked acc: 1.20%, dropped acc: 37.20%
Original prompt: In your role as a linguistic investigator, determine if the context given entails the answer to the question and provide your conclusion with 'entailment' or 'not_entailment'.
Attacked prompt: b"Among your features as a linguistic fellows, define if the ambient yielded translates the answer to the question and provide your conclusion with 'entailment' or 'not_entailment'. "
Original acc: 37.30%, attacked acc: 0.90%, dropped acc: 36.40%
Original prompt: As a language expert, assess if the given context entails the answer to the question and respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"As a language expert, weighed if the delivered framework translates the answer to the question and respondent with 'entailment' or 'not_entailment'. "
Original acc: 35.30%, attacked acc: 0.60%, dropped acc: 34.70%
## translation
Language: chinese, acc: 29.30%, prompt: Determine whether the answer is implied or not based on the context. Answer 'entailment' or 'not_entailment'.
Language: chinese, acc: 12.80%, prompt: As a language expert, determine whether the text contains the answer and choose 'entailment' or 'not_entailment'.
Language: chinese, acc: 19.30%, prompt: Determine whether the text implies an answer, and select 'entailment' or 'not_entailment'.
Language: chinese, acc: 35.90%, prompt: Given the text and the question, whether the answer is implied, select 'entailment' or 'not_entailment'.
Language: chinese, acc: 16.60%, prompt: Determine whether the text contains the answer, depending on the context. Select 'entailment' or 'not_entailment'.
Language: chinese, acc: 25.80%, prompt: As a language expert, determine whether a text can contain an answer, and say 'entailment' or 'not_entailment'.
Language: chinese, acc: 40.30%, prompt: Please determine whether the text implies an answer. Answer 'entailment' or 'not_entailment'.
Language: chinese, acc: 20.00%, prompt: Please select 'entailment' or 'not_entailment' based on the text and the question.
Language: chinese, acc: 33.90%, prompt: Assess whether the answer is implied based on the context. Answer 'entailment' or 'not_entailment'.
Language: chinese, acc: 40.90%, prompt: Please determine whether the text contains the answer and answer 'entailment' or 'not_entailment'.
Language: french, acc: 43.90%, prompt: As a linguistic expert, assess whether the given context involves the answer to the question and answer with 'entailment' or 'not_entailment'.
Language: french, acc: 2.40%, prompt: Determine whether the information provided in the context necessarily leads to the answer to the question asked and indicate 'entailment' or 'not_entailment'.
Language: french, acc: 11.50%, prompt: Analyze the text to determine if the answer to the question is implied in the context and specify 'entailment' or 'not_entailment'.
Language: french, acc: 40.00%, prompt: Based on the given context, decide whether the answer to the question is necessarily involved and mark 'entailment' or 'not_entailment'.
Language: french, acc: 14.40%, prompt: Evaluate whether the answer to the question can be deduced from the given context and mark 'entailment' or 'not_entailment'.
Language: french, acc: 23.20%, prompt: Discern whether the context provided directly involves the answer to the question and indicate 'entailment' or 'not_entailment'.
Language: french, acc: 26.60%, prompt: Determine if the context contains enough information to involve the answer to the question and mark 'entailment' or 'not_entailment'.
Language: french, acc: 11.20%, prompt: Assess whether the context provided necessarily leads to the answer to the question and answer with 'entailment' or 'not_entailment'.
Language: french, acc: 17.10%, prompt: Analyze the text to determine if the answer to the question is involved in the context and indicate 'entailment' or 'not_entailment'.
Language: french, acc: 30.90%, prompt: Based on the given context, decide whether the answer to the question is necessarily inferred and mark 'entailment' or 'not_entailment'.
Language: arabic, acc: 32.90%, prompt: As a language expert, evaluate whether the given context calls for an answer and answer 'entailment' or 'not_entailment'.
Language: arabic, acc: 8.70%, prompt: Judge the relationship between the text and the question and answer 'entailment' or 'not_entailment', depending on your language experience.
Language: arabic, acc: 12.30%, prompt: Does the context given indicate the answer to the question? Evaluate and answer 'entailment' or 'not_entailment'.
Language: arabic, acc: 44.40%, prompt: Based on your linguistic knowledge, does the text relate to the question? Answer 'entailment' or 'not_entailment'.
Language: arabic, acc: 5.10%, prompt: As a language expert, determine how the text relates to the question and answer 'entailment' or 'not_entailment'.
Language: arabic, acc: 27.40%, prompt: Does the text support the answer to the question? Answer 'entailment' or 'not_entailment', depending on your language experience.
Language: arabic, acc: 6.90%, prompt: Check the text link to the question and answer 'entailment' or 'not_entailment', depending on your language skills.
Language: arabic, acc: 34.30%, prompt: As a language expert, is there a link between the text and the question? Answer 'entailment' or 'not_entailment'.
Language: arabic, acc: 7.50%, prompt: Based on your language experience, does context help to answer the question? Evaluate and answer 'entailment' or 'not_entailment'.
Language: arabic, acc: 20.80%, prompt: Does the text give a clear answer to the question? Answer 'entailment' or 'not_entailment', depending on your language experience.
Language: spanish, acc: 39.40%, prompt: As a language expert, evaluate whether the given context implies the answer to the question and answer with 'entailment' or 'not_entailment'.
Language: spanish, acc: 27.40%, prompt: Determine whether the information given in the text necessarily implies the veracity of the hypothesis and answer 'entailment' or 'not_entailment'.
Language: spanish, acc: 4.40%, prompt: Analyzes whether the information presented in the paragraph leads to the conclusion of the question and labels the answer as 'entailment' or 'not_entailment'.
Language: spanish, acc: 5.20%, prompt: Indicates whether the information provided in the text is sufficient to conclude the statement and labels the response as 'entailment' or 'not_entailment'.
Language: spanish, acc: 24.80%, prompt: As an expert on the subject, judge whether the information provided in the text justifies the claim and classify the answer as 'entailment' or 'not_entailment'.
Language: spanish, acc: 10.90%, prompt: Evaluates whether the information in the paragraph necessarily supports the conclusion of the hypothesis and responds 'entailment' or 'not_entailment'.
Language: spanish, acc: 4.90%, prompt: Determines whether the information presented in the text logically implies the answer to the question and labels the answer as 'entailment' or 'not_entailment'.
Language: spanish, acc: 5.10%, prompt: Analyzes whether the information provided in the paragraph necessarily leads to the veracity of the hypothesis and classifies the response as 'entailment' or 'not_entailment'.
Language: spanish, acc: 27.00%, prompt: As an expert on the subject, evaluate whether the information presented in the text supports the claim and respond 'entailment' or 'not_entailment'.
Language: spanish, acc: 5.30%, prompt: Indicates whether the information provided in the paragraph necessarily implies the answer to the question and labels the answer as 'entailment' or 'not_entailment'.
Language: japanese, acc: 42.60%, prompt: Rate whether the answer to the question is derived from the given context and answer with 'entailment' or 'not_entailment'.
Language: japanese, acc: 18.10%, prompt: Please answer 'entailment' or 'not_entailment' for the given context and question.
Language: japanese, acc: 42.20%, prompt: Decide whether the answer to the question is derived from the given context and answer 'entailment' or 'not_entailment'.
Language: japanese, acc: 18.90%, prompt: Compare the question with the given context and give the answer 'entailment' or 'not_entailment'.
Language: japanese, acc: 28.90%, prompt: Determinate whether the given context contains the answer to the question and answer with 'entailment' or 'not_entailment'.
Language: japanese, acc: 29.80%, prompt: Estimate the answer of the question from the context and give the answer 'entailment' or 'not_entailment'.
Language: japanese, acc: 22.90%, prompt: Determinate whether the given context is relevant to the question and answer with 'entailment' or 'not_entailment'.
Language: japanese, acc: 27.40%, prompt: Determine whether the given context is relevant to the question and answer with 'entailment' or 'not_entailment'.
Language: japanese, acc: 27.90%, prompt: Determinate whether the given context contains the answer to the question and answer 'entailment' or 'not_entailment'.
Language: japanese, acc: 18.00%, prompt: Answer with 'entailment' or 'not_entailment', inferring from the given context.
Language: korean, acc: 39.60%, prompt: Determine if a given sentence necessarily implies the meaning of another sentence and answer 'entailment' or 'not_entailment'.
Language: korean, acc: 35.10%, prompt: By understanding the relations between sentences, judge whether a given sentence necessarily refers to another sentence and answer with 'entailment' or 'not_entailment'.
Language: korean, acc: 34.30%, prompt: Evaluate whether a given text necessarily indicates the meaning of another text and respond with 'entailment' or 'not_entailment'.
Language: korean, acc: 40.00%, prompt: Understand the relations of a sentence, to determine whether a given sentence necessarily includes other sentences and answer with 'entailment' or 'not_entailment'.
Language: korean, acc: 42.30%, prompt: Judge whether a given content necessarily implies the meaning of another content and answer with 'entailment' or 'not_entailment'.
Language: korean, acc: 31.70%, prompt: Grasp the relations between sentences, determine if a given sentence necessarily contains the meaning of another sentence and respond with 'entailment' or 'not_entailment'.
Language: korean, acc: 42.50%, prompt: Evaluate whether a given text necessarily refers to another text and answer with 'entailment' or 'not_entailment'.
Language: korean, acc: 29.30%, prompt: By comparing the meaning of the sentences, to determine if a given sentence necessarily implies another sentence and answer 'entailment' or 'not_entailment'.
Language: korean, acc: 44.50%, prompt: Evaluate whether the contents given necessarily refer to other contents and answer with 'entailment' or 'not_entailment'.
Language: korean, acc: 35.50%, prompt: By analyzing the relations between sentences, determine if a given sentence does not necessarily include other sentences and answer with 'entailment' or 'not_entailment'.
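The translation entries above all follow a `Language: <name>, acc: <value>%, prompt: <text>` layout, so per-language averages can be computed with a small parser. The following is a minimal sketch that assumes every entry keeps that layout; the names `parse_translation_line` and `mean_acc_by_language` are hypothetical helpers for illustration and are not part of PromptBench.

```python
import re
from collections import defaultdict

# Assumed layout of one translation entry in this log (illustrative only).
LINE_RE = re.compile(r"^Language: (\w+), acc: (-?\d+(?:\.\d+)?)%, prompt: (.*)$")


def parse_translation_line(line):
    """Return (language, accuracy, prompt), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), float(match.group(2)), match.group(3)


def mean_acc_by_language(lines):
    """Average the accuracy of all matching lines, grouped by language."""
    buckets = defaultdict(list)
    for line in lines:
        parsed = parse_translation_line(line)
        if parsed is not None:
            language, acc, _prompt = parsed
            buckets[language].append(acc)
    return {language: sum(accs) / len(accs) for language, accs in buckets.items()}


# Four entries copied from the log above.
sample = [
    "Language: japanese, acc: 18.10%, prompt: Please answer 'entailment' or 'not_entailment' for the given context and question.",
    "Language: japanese, acc: 18.00%, prompt: Answer with 'entailment' or 'not_entailment', inferring from the given context.",
    "Language: korean, acc: 39.60%, prompt: Determine if a given sentence necessarily implies the meaning of another sentence and answer 'entailment' or 'not_entailment'.",
    "Language: korean, acc: 42.50%, prompt: Evaluate whether a given text necessarily refers to another text and answer with 'entailment' or 'not_entailment'.",
]

means = mean_acc_by_language(sample)
for language, mean_acc in sorted(means.items()):
    print(f"{language}: {mean_acc:.2f}%")
```

Lines that do not match the assumed layout (headings, attack records) are skipped rather than raising, so the whole file can be passed through the same function.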
# qqp
## 10 prompts
Acc: 30.00%, prompt: Please indicate if the following pair of sentences share the same meaning by responding with 'equivalent' or 'not_equivalent'.
Acc: 24.10%, prompt: Analyze if the given set of sentences have the same connotation by answering with 'equivalent' or 'not_equivalent'.
Acc: 23.90%, prompt: Can these two statements be considered equal in meaning? Answer with 'equivalent' or 'not_equivalent'.
Acc: 22.60%, prompt: Assess whether the following statements are identical in meaning by answering 'equivalent' or 'not_equivalent'.
Acc: 22.40%, prompt: Determine if the given pair of statements can be considered the same by responding with 'equivalent' or 'not_equivalent'.
Acc: 20.60%, prompt: Do the following expressions mean the same thing? Provide your answer as 'equivalent' or 'not_equivalent'.
Acc: 20.40%, prompt: Examine the following expressions and tell me if they are alike in meaning by using 'equivalent' or 'not_equivalent'.
Acc: 20.10%, prompt: Do these two sentences convey the same meaning? Indicate with 'equivalent' or 'not_equivalent'.
Acc: 18.30%, prompt: Are the following two questions equivalent or not? Answer me with "equivalent" or "not_equivalent".
Acc: 15.40%, prompt: Evaluate whether these two phrases have identical meanings and respond with 'equivalent' or 'not_equivalent'.
Acc: 6.00%, prompt: Are the meanings of these two phrases the same? Reply with 'equivalent' or 'not_equivalent'.
Acc: 21.10%, prompt: As an instrument for question comparison evaluation, consider the questions and determine if their meaning is the same, responding with 'equivalent' for similar questions or 'not_equivalent' for different questions.
Acc: 16.90%, prompt: In the role of a question similarity analyzer, examine the meaning of the provided questions and decide if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for different questions.
Acc: 12.50%, prompt: Acting as a question equivalence instrument, determine if the provided questions are equivalent in meaning, answering with 'equivalent' for similar questions or 'not_equivalent' for dissimilar ones.
Acc: 11.60%, prompt: Functioning as a question similarity evaluation tool, analyze the given questions and decide if they share the same meaning, responding with 'equivalent' or 'not_equivalent'.
Acc: 11.30%, prompt: Working as a question classifier, identify whether the given questions share the same meaning, and respond with 'equivalent' for equivalent questions or 'not_equivalent' for different ones.
Acc: 10.80%, prompt: In the capacity of a question assessment system, indicate if the meaning of the provided questions is the same, responding with 'equivalent' or 'not_equivalent'.
Acc: 9.70%, prompt: As a tool for determining question equivalence, review the questions and categorize their similarity as either 'equivalent' or 'not_equivalent'.
Acc: 9.10%, prompt: As a question equivalence detection system, examine the provided questions and respond with 'equivalent' if they are the same in meaning, or 'not_equivalent' if they are different.
Acc: 7.60%, prompt: While performing question comparison analysis, classify the similarity of the following questions as 'equivalent' for equivalent questions or 'not_equivalent' for different questions.
Acc: 5.50%, prompt: In your role as a question comparison tool, assess the following pair of questions and classify them as 'equivalent' or 'not_equivalent'.
## bertattack
Original prompt: Please indicate if the following pair of sentences share the same meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"please indicate if the cross over of items share the same meaning by responding with 'equivalent' or 'not_equivalent'. "
Original acc: 30.00%, attacked acc: 1.50%, dropped acc: 28.50%
Original prompt: Analyze if the given set of sentences have the same connotation by answering with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"is if the given set of and have the same connotation by asking with 'equivalent' or 'not_equivalent'. "
Original acc: 24.10%, attacked acc: 2.00%, dropped acc: 22.10%
Original prompt: Can these two statements be considered equal in meaning? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"can these two symbols be considered equal in meaning? object with 'equivalent' or 'not_equivalent'. "
Original acc: 23.90%, attacked acc: 0.80%, dropped acc: 23.10%
Original prompt: As an instrument for question comparison evaluation, consider the questions and determine if their meaning is the same, responding with 'equivalent' for similar questions or 'not_equivalent' for different questions.
Attacked prompt: b"the an is for of comparison analyze, assess the questions and determine if their content is the same, responding with 'equivalent' for corresponding words or 'not_equivalent' for different questions. "
Original acc: 20.90%, attacked acc: 1.20%, dropped acc: 19.70%
Original prompt: In the role of a question similarity analyzer, examine the meaning of the provided questions and decide if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for different questions.
Attacked prompt: b"into the role of a subject similarity analyzer, review the meaning of the drawn lines and consider if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for different topics. "
Original acc: 16.80%, attacked acc: 1.00%, dropped acc: 15.80%
Original prompt: Acting as a question equivalence instrument, determine if the provided questions are equivalent in meaning, answering with 'equivalent' for similar questions or 'not_equivalent' for dissimilar ones.
Attacked prompt: b"Acting as a general equivalence instrument, consider if the provided topics are equivalent in meaning, answering with 'equivalent' for familiar stimuli or 'not_equivalent' for dissimilar ones. "
Original acc: 12.30%, attacked acc: 0.40%, dropped acc: 11.90%
## checklist
Original prompt: Please indicate if the following pair of sentences share the same meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Please indicate if the following pair of sentences share the same meaning by responding with 'equivalent' or 'not_equivalent' 6a4Yn3RGVc. "
Original acc: 30.00%, attacked acc: 9.20%, dropped acc: 20.80%
Original prompt: Analyze if the given set of sentences have the same connotation by answering with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Analyze if the given set of sentences have the same connotation by answering with 'equivalent' or 'not_equivalent' 6a4Yn3RGVc. "
Original acc: 24.20%, attacked acc: 4.90%, dropped acc: 19.30%
Original prompt: Can these two statements be considered equal in meaning? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Can these two statements be considered equal in meaning? Answer with 'equivalent' or 'not_equivalent' 6a4Yn3RGVc. "
Original acc: 23.70%, attacked acc: 2.10%, dropped acc: 21.60%
Original prompt: As an instrument for question comparison evaluation, consider the questions and determine if their meaning is the same, responding with 'equivalent' for similar questions or 'not_equivalent' for different questions.
Attacked prompt: b"As an instrument for question comparison evaluation, consider the questions and determine if their meaning is the same, responding with 'equivalent' for similar questions or 'not_equivalent' for different questions 5XeflW1ZJc. "
Original acc: 20.90%, attacked acc: 1.20%, dropped acc: 19.70%
Original prompt: In the role of a question similarity analyzer, examine the meaning of the provided questions and decide if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for different questions.
Attacked prompt: b"In the role of a question similarity analyzer, examine the meaning of the provided questions and decide if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for different questions w52rwgo0Av. "
Original acc: 17.00%, attacked acc: 10.10%, dropped acc: 6.90%
Original prompt: Acting as a question equivalence instrument, determine if the provided questions are equivalent in meaning, answering with 'equivalent' for similar questions or 'not_equivalent' for dissimilar ones.
Attacked prompt: b"Acting as a question equivalence instrument, determine if the provided questions are equivalent in meaning, answering with 'equivalent' for similar questions or 'not_equivalent' for dissimilar ones 9vGXgnbWB8. "
Original acc: 12.30%, attacked acc: 5.30%, dropped acc: 7.00%
## deepwordbug
Original prompt: Please indicate if the following pair of sentences share the same meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Please indicatJ if the folllwing paWr of setnences share the same maning by respondiWng with 'equivalent' or 'not_equivalent'. "
Original acc: 30.00%, attacked acc: 2.20%, dropped acc: 27.80%
Original prompt: Analyze if the given set of sentences have the same connotation by answering with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"tAnalyze if the giCen est of sentUnces have the same connotatioq by answering with 'equivalent' or 'not_equivalent'. "
Original acc: 24.10%, attacked acc: 2.00%, dropped acc: 22.10%
Original prompt: Can these two statements be considered equal in meaning? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Cna these wo Ktatements be considered equad in meaning? UAnswer with 'equivalent' or 'not_equivalent'. "
Original acc: 23.90%, attacked acc: 9.20%, dropped acc: 14.70%
Original prompt: As an instrument for question comparison evaluation, consider the questions and determine if their meaning is the same, responding with 'equivalent' for similar questions or 'not_equivalent' for different questions.
Attacked prompt: b"s an istrument for question comparison evaluation, cosnider the questions and determine if their Teaning is the same, rysponding with 'equivalent' for similr euestions or 'not_equivalent' for different questidons. "
Original acc: 21.10%, attacked acc: 2.60%, dropped acc: 18.50%
Original prompt: In the role of a question similarity analyzer, examine the meaning of the provided questions and decide if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for different questions.
Attacked prompt: b"IZn the role of a questoion similarity naalyzer, examin the meJaning of the provided questions and Uecide if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for dSfferent questioXns. "
Original acc: 16.90%, attacked acc: 0.70%, dropped acc: 16.20%
Original prompt: Acting as a question equivalence instrument, determine if the provided questions are equivalent in meaning, answering with 'equivalent' for similar questions or 'not_equivalent' for dissimilar ones.
Attacked prompt: b"Acting as a question equivalence instrument, determine if the Oprovided questions are equivalent in meaning, answering with 'equivalent' for similar questions or 'not_equivalent' for dissimilar ones. "
Original acc: 12.50%, attacked acc: 7.40%, dropped acc: 5.10%
## stresstest
Original prompt: Please indicate if the following pair of sentences share the same meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Please indicate if the following pair of sentences share the same meaning by responding with 'equivalent' or 'not_equivalent' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 30.00%, attacked acc: 30.40%, dropped acc: -0.40%
Original prompt: Analyze if the given set of sentences have the same connotation by answering with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Analyze if the given set of sentences have the same connotation by answering with 'equivalent' or 'not_equivalent' and false is not true . "
Original acc: 24.10%, attacked acc: 30.10%, dropped acc: -6.00%
Original prompt: Can these two statements be considered equal in meaning? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Can these two statements be considered equal in meaning? Answer with 'equivalent' or 'not_equivalent' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 23.90%, attacked acc: 27.70%, dropped acc: -3.80%
Original prompt: As an instrument for question comparison evaluation, consider the questions and determine if their meaning is the same, responding with 'equivalent' for similar questions or 'not_equivalent' for different questions.
Attacked prompt: b"As an instrument for question comparison evaluation, consider the questions and determine if their meaning is the same, responding with 'equivalent' for similar questions or 'not_equivalent' for different questions and true is true and true is true and true is true and true is true and true is true . "
Original acc: 21.10%, attacked acc: 21.50%, dropped acc: -0.40%
Original prompt: In the role of a question similarity analyzer, examine the meaning of the provided questions and decide if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for different questions.
Attacked prompt: b"In the role of a question similarity analyzer, examine the meaning of the provided questions and decide if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for different questions and true is true and true is true and true is true and true is true and true is true . "
Original acc: 16.90%, attacked acc: 22.40%, dropped acc: -5.50%
Original prompt: Acting as a question equivalence instrument, determine if the provided questions are equivalent in meaning, answering with 'equivalent' for similar questions or 'not_equivalent' for dissimilar ones.
Attacked prompt: b"Acting as a question equivalence instrument, determine if the provided questions are equivalent in meaning, answering with 'equivalent' for similar questions or 'not_equivalent' for dissimilar ones and true is true and true is true and true is true and true is true and true is true . "
Original acc: 12.50%, attacked acc: 22.20%, dropped acc: -9.70%
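In every attack record of this log, the `dropped acc` value equals the original accuracy minus the attacked accuracy, so the negative values in the stresstest records above mean the perturbed prompt scored higher than the original. The sketch below parses a summary line and checks that relationship. It assumes the three-field layout shown in the log; `parse_summary` and `drop_is_consistent` are hypothetical names chosen for illustration.

```python
import re

# Assumed layout of the summary line that closes each attack record.
SUMMARY_RE = re.compile(
    r"Original acc: (-?\d+(?:\.\d+)?)%, "
    r"attacked acc: (-?\d+(?:\.\d+)?)%, "
    r"dropped acc: (-?\d+(?:\.\d+)?)%"
)


def parse_summary(line):
    """Return (original, attacked, dropped) as floats, or None if no match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return tuple(float(group) for group in match.groups())


def drop_is_consistent(original, attacked, dropped, tol=0.01):
    """Check that dropped == original - attacked, up to rounding in the log."""
    return abs((original - attacked) - dropped) <= tol


# Three summary lines copied from the stresstest records above.
summaries = [
    "Original acc: 30.00%, attacked acc: 30.40%, dropped acc: -0.40%",
    "Original acc: 24.10%, attacked acc: 30.10%, dropped acc: -6.00%",
    "Original acc: 12.50%, attacked acc: 22.20%, dropped acc: -9.70%",
]

parsed = [parse_summary(line) for line in summaries]
for original, attacked, dropped in parsed:
    status = "improved" if dropped < 0 else "degraded or unchanged"
    print(f"{original:.2f} -> {attacked:.2f} ({status})")
```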
## textbugger
Original prompt: Please indicate if the following pair of sentences share the same meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Please indicate if the later pair of penalties share the same meaning by responding with 'equivalent' or 'not_equivalent'. "
Original acc: 30.00%, attacked acc: 2.00%, dropped acc: 28.00%
Original prompt: Analyze if the given set of sentences have the same connotation by answering with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Analyze if the given setting of sente\xd5\xb8ces have the same connotation by replying with 'equivalent' or 'not_equivalent'. "
Original acc: 24.10%, attacked acc: 1.50%, dropped acc: 22.60%
Original prompt: Can these two statements be considered equal in meaning? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Possible these two statemen\xf0\x9d\x9a\x9ds be considered equal in meaing? Answer with 'equivalent' or 'not_equivalent'. "
Original acc: 23.90%, attacked acc: 0.20%, dropped acc: 23.70%
Original prompt: As an instrument for question comparison evaluation, consider the questions and determine if their meaning is the same, responding with 'equivalent' for similar questions or 'not_equivalent' for different questions.
Attacked prompt: b"Because an instrument for problem comparison evaluation, examining the subjects and defining if their \xef\xbd\x8deaning is the same, \xe2\xb2\x85esponding with 'equivalent' for similar questions or 'not_equivalent' for different questions. "
Original acc: 21.10%, attacked acc: 0.40%, dropped acc: 20.70%
Original prompt: In the role of a question similarity analyzer, examine the meaning of the provided questions and decide if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for different questions.
Attacked prompt: b"I\xd5\xb8 the role of a question similarity analyzer, examine the meaning of the provided questions and decide if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for different ques\xf0\x9d\x9a\x9dions. "
Original acc: 16.90%, attacked acc: 0.00%, dropped acc: 16.90%
Original prompt: Acting as a question equivalence instrument, determine if the provided questions are equivalent in meaning, answering with 'equivalent' for similar questions or 'not_equivalent' for dissimilar ones.
Attacked prompt: b"Caretaker as a question equivalence instrument, determine if the provided questions are equivalent in meaning, answering with 'equivalent' for \xd1\x95imilar ques\xf0\x9d\x9a\x9dions or 'not_equivalent' for dissimilar ones. "
Original acc: 12.50%, attacked acc: 0.00%, dropped acc: 12.50%
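The attacked prompts are logged in Python bytes-literal syntax (`b"..."`), so the `\x..` escapes in the textbugger records above hide the substituted characters. The sketch below is one way to recover the readable text, assuming the bytes are UTF-8 encoded (the log does not state the encoding); `decode_logged_prompt` and `non_ascii_chars` are hypothetical helper names.

```python
import ast
import unicodedata


def decode_logged_prompt(logged):
    """Turn a logged b"..." literal into text, assuming UTF-8 encoding."""
    raw = ast.literal_eval(logged)  # safely evaluates the bytes literal
    return raw.decode("utf-8")


def non_ascii_chars(text):
    """List (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# One attacked prompt copied verbatim from the textbugger records above.
logged = r'''b"Analyze if the given setting of sente\xd5\xb8ces have the same connotation by replying with 'equivalent' or 'not_equivalent'. "'''

decoded = decode_logged_prompt(logged)
substitutions = non_ascii_chars(decoded)
for _ch, code_point, name in substitutions:
    print(code_point, name)
```

Under the UTF-8 assumption, the byte pair `\xd5\xb8` decodes to a single non-Latin character standing in for the letter `n`, which suggests a visually similar character substitution rather than a random typo.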
## textfooler
Original prompt: Please indicate if the following pair of sentences share the same meaning by responding with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Please specify if the henceforth matched of verdicts share the same significance by responding with 'equivalent' or 'not_equivalent'. "
Original acc: 30.00%, attacked acc: 0.70%, dropped acc: 29.30%
Original prompt: Analyze if the given set of sentences have the same connotation by answering with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Analyzes if the submitted established of chastisement possess the same subtext by responding with 'equivalent' or 'not_equivalent'. "
Original acc: 24.10%, attacked acc: 2.20%, dropped acc: 21.90%
Original prompt: Can these two statements be considered equal in meaning? Answer with 'equivalent' or 'not_equivalent'.
Attacked prompt: b"Kan these two languages happen revisited equitable in signify? Meet with 'equivalent' or 'not_equivalent'. "
Original acc: 23.90%, attacked acc: 0.00%, dropped acc: 23.90%
Original prompt: As an instrument for question comparison evaluation, consider the questions and determine if their meaning is the same, responding with 'equivalent' for similar questions or 'not_equivalent' for different questions.
Attacked prompt: b"Iike an instrument for question comparison valuation, exams the questions and detects if their meaning is the same, reacting with 'equivalent' for similar questions or 'not_equivalent' for different themes. "
Original acc: 21.10%, attacked acc: 1.80%, dropped acc: 19.30%
Original prompt: In the role of a question similarity analyzer, examine the meaning of the provided questions and decide if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for different questions.
Attacked prompt: b"Onto the role of a question similarity analyzer, consider the significance of the envisaged matters and identify if they are equivalent, answering with 'equivalent' for equivalent questions or 'not_equivalent' for many matters. "
Original acc: 16.90%, attacked acc: 0.10%, dropped acc: 16.80%
Original prompt: Acting as a question equivalence instrument, determine if the provided questions are equivalent in meaning, answering with 'equivalent' for similar questions or 'not_equivalent' for dissimilar ones.
Attacked prompt: b"Acting as a question equivalence utilities, assess if the presented queries are equivalent in meaning, answering with 'equivalent' for similar questions or 'not_equivalent' for different ones. "
Original acc: 12.50%, attacked acc: 1.40%, dropped acc: 11.10%
## translation
Language: chinese, acc: 2.90%, prompt: As a tool for determining the equivalence of problems, the problems are examined and their similarity is classified as 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 9.40%, prompt: Please compare the similarity of two sentences and put them into the category of 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 4.50%, prompt: Two sentences are classified as 'equivalent' or 'not_equivalent' for their similarity.
Language: chinese, acc: 1.70%, prompt: You can determine how similar the questions are by comparing them and categorizing them as 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 6.80%, prompt: Using the method of contrast, the similarity of these problems is divided into two categories: 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 3.60%, prompt: By comparing these issues, you can classify them as 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 10.20%, prompt: To determine whether the questions are similar, put them into the category of 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 17.80%, prompt: Divide the similarity of these questions into 'equivalent' or 'not_equivalent' categories.
Language: chinese, acc: 2.10%, prompt: Using the similarity assessment tool, these questions were classified as 'equivalent' or 'not_equivalent'.
Language: chinese, acc: 1.50%, prompt: By analyzing the similarity of these problems, they are divided into categories of 'equivalent' or 'not_equivalent'.
Language: french, acc: 3.60%, prompt: As a tool to determine the equivalence of questions, review the questions and rank their similarity as 'equivalent' or 'not_equivalent'.
Language: french, acc: 15.20%, prompt: Please compare the similarity of two sentences and classify them as 'equivalent' or 'not_equivalent'.
Language: french, acc: 7.00%, prompt: Based on the similarity of two sentences, classify them as 'equivalent' or 'not_equivalent'.
Language: french, acc: 2.10%, prompt: You can determine the similarity between these questions by comparing them and classifying them as 'equivalent' or 'not_equivalent'.
Language: french, acc: 16.10%, prompt: Use a comparative method to divide the similarity of these questions into two categories: 'equivalent' or 'not_equivalent'.
Language: french, acc: 3.80%, prompt: By comparing these questions, you can classify them as 'equivalent' or 'not_equivalent'.
Language: french, acc: 13.40%, prompt: Determine whether these questions are similar or not, and then classify them as 'equivalent' or 'not_equivalent'.
Language: french, acc: 20.80%, prompt: Divide the similarity of these questions into two categories: 'equivalent' or 'not_equivalent'.
Language: french, acc: 6.80%, prompt: Use a similarity assessment tool to classify these questions as 'equivalent' or 'not_equivalent'.
Language: french, acc: 5.60%, prompt: By analyzing the similarity of these questions, you can divide them into two categories: 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 10.10%, prompt: As a tool for determining an equation of questions, review the questions and classify their similarity as either 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 6.60%, prompt: When using questions in the classification domain, please classify the similarity between the questions as 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 4.60%, prompt: To determine an equation of questions, you must review the questions and classify their similarity as 'equivalent' or 'not_equivalent'.
Language: arabic, acc: 6.50%, prompt: Questions can be classified as 'equivalent' or 'not_equivalent' when used to identify classifications.
Language: arabic, acc: 4.20%, prompt: Classification of question similarity as 'equivalent' or 'not_equivalent' is used as a tool to determine the classification of questions.
Language: arabic, acc: 18.10%, prompt: Classify the similarity of the questions as 'equivalent' or 'not_equivalent' to determine the equation of the questions.
Language: arabic, acc: 7.20%, prompt: Identifying the similarity of questions and classifying them as 'equivalent' or 'not_equivalent' is an important tool in determining the classification of questions.
Language: arabic, acc: 3.60%, prompt: When classifying questions, their similarity can be classified as 'equivalent' or 'not_equivalent' to determine the correct classification.
Language: arabic, acc: 4.10%, prompt: The similarity of questions should be classified as 'equivalent' or 'not_equivalent' when used to determine the equation of questions.
Language: arabic, acc: 3.90%, prompt: Identifying the similarity of questions and classifying them as 'equivalent' or 'not_equivalent' helps to correctly classify questions.
Language: spanish, acc: 2.10%, prompt: As a tool to determine the equivalence of questions, it reviews the questions and classifies their similarity as 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 7.70%, prompt: Evaluate the similarity between questions and classify them as 'equivalent' or 'not_equivalent' to determine their equivalence.
Language: spanish, acc: 19.00%, prompt: Determine whether two questions are 'equivalent' or 'not_equivalent' based on similarity and characteristics.
Language: spanish, acc: 2.30%, prompt: Classifies the similarity between questions as 'equivalent' or 'not_equivalent' to determine their equivalence.
Language: spanish, acc: 6.10%, prompt: Review the questions and rate them as 'equivalent' or 'not_equivalent' based on their similarity and content.
Language: spanish, acc: 1.50%, prompt: As part of the classification task of questions, it determines their equivalence by categorizing their similarity as 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 8.20%, prompt: Analyze the similarity between questions and classify them as 'equivalent' or 'not_equivalent' to determine their equivalence.
Language: spanish, acc: 2.50%, prompt: As a method of identifying the equivalence of questions, it categorizes their similarity as 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 1.30%, prompt: To determine the equivalence between questions, check their similarity and classify them as 'equivalent' or 'not_equivalent'.
Language: spanish, acc: 18.10%, prompt: Classify the similarity between questions as 'equivalent' or 'not_equivalent' to determine whether they are equivalent or not.
Language: japanese, acc: 6.70%, prompt: As a tool to determine the equivalence of the question, review the question and categorize its similarities into 'equivalent' or 'not_equivalent' categories.
Language: japanese, acc: 0.50%, prompt: Work on text sorting tasks labeled 'equivalent' or 'not_equivalent'.
Language: japanese, acc: 4.70%, prompt: For text classification tasks, use the labels 'equivalent' or 'not_equivalent' to determine the equivalence of statements.
Language: japanese, acc: 6.30%, prompt: In the MRPC dataset, use the labels 'equivalent' or 'not_equivalent' to classify the equivalence of statements.
Language: japanese, acc: 6.00%, prompt: As a tool for determining equivalence, check sentences and categorize them into 'equivalent' or 'not_equivalent' categories.
Language: japanese, acc: 11.30%, prompt: Use the labels 'equivalent' or 'not_equivalent' to determine the equivalence of statements in text classification tasks.
Language: japanese, acc: 8.30%, prompt: In the text classification task of the MRPC data set, classify the equivalence of statements with labels of 'equivalent' or 'not_equivalent'.
Language: japanese, acc: 5.10%, prompt: As a tool to determine the equivalence of statements, categorize statements into 'equivalent' or 'not_equivalent' categories.
Language: japanese, acc: 10.40%, prompt: In a text classification task, classify the equivalence of statements using labels of 'equivalent' or 'not_equivalent'.
Language: japanese, acc: 6.60%, prompt: Do a text classification task to determine the equivalence of statements, labeled 'equivalent' or 'not_equivalent'.
Language: korean, acc: 11.20%, prompt: Classify two given sentences as 'equivalent' or 'not_equivalent' by discriminating whether they have the same meaning.
Language: korean, acc: 8.80%, prompt: Determine sentence equivalence by judging the similarity of two sentences with 'equivalent' or 'not_equivalent'.
Language: korean, acc: 4.30%, prompt: Classify the similarity of sentences as 'equivalent' or 'not_equivalent' by judging whether two sentences have the same meaning.
Language: korean, acc: 12.70%, prompt: Determine if two given sentences are equivalent to each other, and classify their similarity as 'equivalent' or 'not_equivalent'.
Language: korean, acc: 13.20%, prompt: Compare two given sentences to determine sentence equivalence, and classify their similarities as 'equivalent' or 'not_equivalent'.
Language: korean, acc: 8.60%, prompt: Classify sentence equivalence as 'equivalent' or 'not_equivalent' by judging whether two sentences have the same meaning to each other.
Language: korean, acc: 12.00%, prompt: Determine if two sentences have the same meaning, and classify their similarities as 'equivalent' or 'not_equivalent'.
Language: korean, acc: 12.00%, prompt: Compare two given sentences to determine their equivalence, and classify their similarities as 'equivalent' or 'not_equivalent'.
Language: korean, acc: 9.40%, prompt: Review two sentences to evaluate sentence equivalence, and classify their similarities as 'equivalent' or 'not_equivalent'.
Language: korean, acc: 9.90%, prompt: Judge whether two sentences have the same meaning to each other, and determine the sentence equivalence with 'equivalent' or 'not_equivalent'.
# rte
## 10 prompts
Acc: 47.65%, prompt: Please classify the relationship between the provided sentences as 'entailment' or 'not_entailment'.
Acc: 45.13%, prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Acc: 41.16%, prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Acc: 40.43%, prompt: Assess the connection between the following sentences and classify it as 'entailment' or 'not_entailment'.
Acc: 37.91%, prompt: Indicate if the connection between the following sentences is 'entailment' or 'not_entailment'.
Acc: 32.85%, prompt: Review the two statements and categorize their relationship as either 'entailment' or 'not_entailment'.
Acc: 28.16%, prompt: Examine the pair of sentences and determine if they exhibit entailment or not_entailment. Answer with either 'entailment' or 'not_entailment'.
Acc: 27.80%, prompt: Identify whether the given pair of sentences demonstrates entailment or not_entailment. Answer with 'entailment' or 'not_entailment'.
Acc: 18.05%, prompt: Analyze the two provided sentences and decide if their relationship is 'entailment' or 'not_entailment'.
Acc: 10.47%, prompt: Considering the two sentences, identify if their relationship is 'entailment' or 'not_entailment'.
Acc: 5.42%, prompt: Are the following two sentences entailment or not_entailment? Answer me with "entailment" or "not_entailment", just one word.
Acc: 51.26%, prompt: In the role of a semantic relationship analyzer, examine the connection between the given sentences and decide if they exhibit entailment or not_entailment. Answer with 'entailment' or 'not_entailment':
Acc: 49.82%, prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment':
Acc: 48.01%, prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Acc: 44.04%, prompt: In your role as an entailment analysis tool, assess the relationship between the given sentences and classify it as 'entailment' or 'not_entailment':
Acc: 44.04%, prompt: In the capacity of an entailment assessment system, indicate if the link between the following sentences is 'entailment' or 'not_entailment':
Acc: 43.32%, prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment' or 'not_entailment':
Acc: 42.96%, prompt: Acting as an entailment detection instrument, determine if the given pair of sentences demonstrates entailment or not_entailment. Answer with 'entailment' or 'not_entailment':
Acc: 27.08%, prompt: As a tool for determining entailment relationships, review the two statements and categorize their connection as either 'entailment' or 'not_entailment':
Acc: 24.91%, prompt: As an instrument for entailment evaluation, consider the two sentences and determine if their relationship is 'entailment' or 'not_entailment'. Respond with 'entailment' or 'not_entailment':
Acc: 23.10%, prompt: While performing entailment analysis, classify the relationship between the provided sentences as 'entailment' or 'not_entailment':
## bertattack
Original prompt: Please classify the relationship between the provided sentences as 'entailment' or 'not_entailment'.
Attacked prompt: b"Please approach the relationship between the included sentences as 'entailment' or 'not_entailment'."
Original acc: 47.65%, attacked acc: 0.00%, dropped acc: 47.65%
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"i the relationship between the given phrases represent entailment or not_entailment? answer with 'entailment' or 'not_entailment'."
Original acc: 45.13%, attacked acc: 3.25%, dropped acc: 41.88%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"be if the provided pair of sentences displays entailment or not_entailment. approach with 'entailment' or 'not_entailment'."
Original acc: 41.16%, attacked acc: 1.81%, dropped acc: 39.35%
Original prompt: In the role of a semantic relationship analyzer, examine the connection between the given sentences and decide if they exhibit entailment or not_entailment. Answer with 'entailment' or 'not_entailment':
Attacked prompt: b"you the role of a semantic relationship analyzeto, study the connection between the and statements and feel if they exhibit entailment or not_entailment. approach with 'entailment' or 'not_entailment':"
Original acc: 51.26%, attacked acc: 3.25%, dropped acc: 48.01%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment':
Attacked prompt: b"j as an entailment classifier, remember do the given pair of sentence displays entailment or not_entailment. responded with 'entailment' or 'not_entailment':"
Original acc: 49.82%, attacked acc: 2.17%, dropped acc: 47.65%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"she an entailment identification system, presented the connection between the following sentences and answered with 'entailment' or 'not_entailment':"
Original acc: 48.01%, attacked acc: 1.44%, dropped acc: 46.57%
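
In the summary lines above, `dropped acc` matches the original accuracy minus the attacked accuracy, in percentage points, up to two-decimal rounding. A minimal sketch for parsing such a line and recomputing the drop follows. It assumes the line format shown in this log; the names `ACC_LINE`, `parse_acc_line`, and `recomputed_drop` are hypothetical helpers, not part of PromptBench.

```python
import re

# Pattern for the summary lines in this log (format assumed from the lines above).
ACC_LINE = re.compile(
    r"Original acc: (?P<orig>[\d.]+)%, "
    r"attacked acc: (?P<att>[\d.]+)%, "
    r"dropped acc: (?P<drop>[\d.]+)%"
)


def parse_acc_line(line):
    """Return (original, attacked, reported_drop) as floats, or None if no match."""
    m = ACC_LINE.search(line)
    if m is None:
        return None
    return float(m["orig"]), float(m["att"]), float(m["drop"])


def recomputed_drop(original, attacked):
    """Accuracy drop in percentage points, rounded to two decimals like the log."""
    return round(original - attacked, 2)


line = "Original acc: 45.13%, attacked acc: 3.25%, dropped acc: 41.88%"
orig, att, reported = parse_acc_line(line)
# The log rounds each value separately, so allow a 0.01 discrepancy.
assert abs(recomputed_drop(orig, att) - reported) <= 0.011
```

The tolerance is there because some reported drops differ from the plain difference by 0.01.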
## checklist
Original prompt: Please classify the relationship between the provided sentences as 'entailment' or 'not_entailment'.
Attacked prompt: b"Please classify the relationship between the provided sentences as 'entailment' or 'not_entailment' vTAjHynoIG."
Original acc: 47.65%, attacked acc: 23.47%, dropped acc: 24.19%
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment' 9vGXgnbWB8."
Original acc: 44.77%, attacked acc: 31.05%, dropped acc: 13.72%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment' vTAjHynoIG."
Original acc: 41.16%, attacked acc: 1.81%, dropped acc: 39.35%
Original prompt: In the role of a semantic relationship analyzer, examine the connection between the given sentences and decide if they exhibit entailment or not_entailment. Answer with 'entailment' or 'not_entailment':
Attacked prompt: b"In the role of a semantic relationship analyzer, examine the connection between the given sentences and decide if they exhibit entailment or not_entailment. Answer with 'entailment' or 'not_entailment' mzjVogkRhM:"
Original acc: 51.26%, attacked acc: 20.94%, dropped acc: 30.32%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment':
Attacked prompt: b"Working as an entailment classifier, identify whether the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment' vTAjHynoIG:"
Original acc: 49.82%, attacked acc: 13.36%, dropped acc: 36.46%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment' RcRneWo6Iv:"
Original acc: 48.38%, attacked acc: 25.27%, dropped acc: 23.10%
## deepwordbug
Original prompt: Please classify the relationship between the provided sentences as 'entailment' or 'not_entailment'.
Attacked prompt: b"Please classiyf the relationship between the provided sentencCs as 'entailment' or 'not_entailment'."
Original acc: 47.65%, attacked acc: 13.72%, dropped acc: 33.94%
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"DoeGs the relationship between the give sentence represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'."
Original acc: 45.13%, attacked acc: 5.42%, dropped acc: 39.71%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"DetermiCne if the givePn pir of sentcnces disWlays entailment or not_entailment. Rtspond with 'entailment' or 'not_entailment'."
Original acc: 41.16%, attacked acc: 1.08%, dropped acc: 40.07%
Original prompt: In the role of a semantic relationship analyzer, examine the connection between the given sentences and decide if they exhibit entailment or not_entailment. Answer with 'entailment' or 'not_entailment':
Attacked prompt: b"nI the rolQ of a semRantic relatiVonship analyzer, examine the coHnection between the igven sentences and deciEde if they exhibit entailment or not_entailment. AnswVr with 'entailment' or 'not_entailment':"
Original acc: 51.26%, attacked acc: 3.61%, dropped acc: 47.65%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment':
Attacked prompt: b"orking as an entailment clasifier, idenStify whethre the givec ptair of sentInces displays entailment or not_entailment. ReKspond with 'entailment' or 'not_entailment':"
Original acc: 49.82%, attacked acc: 1.08%, dropped acc: 48.74%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"sA an entailment identification ysstem, examien the lconnection between the follUowing sentences and frespond with 'entailment' or 'not_entailment':"
Original acc: 48.01%, attacked acc: 8.30%, dropped acc: 39.71%
## stresstest
Original prompt: Please classify the relationship between the provided sentences as 'entailment' or 'not_entailment'.
Attacked prompt: b"Please classify the relationship between the provided sentences as 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true ."
Original acc: 47.65%, attacked acc: 23.47%, dropped acc: 24.19%
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment' and false is not true ."
Original acc: 45.13%, attacked acc: 36.46%, dropped acc: 8.66%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment' and false is not true ."
Original acc: 41.16%, attacked acc: 31.77%, dropped acc: 9.39%
Original prompt: In the role of a semantic relationship analyzer, examine the connection between the given sentences and decide if they exhibit entailment or not_entailment. Answer with 'entailment' or 'not_entailment':
Attacked prompt: b"In the role of a semantic relationship analyzer, examine the connection between the given sentences and decide if they exhibit entailment or not_entailment. Answer with 'entailment' or 'not_entailment' and false is not true :"
Original acc: 51.26%, attacked acc: 41.88%, dropped acc: 9.39%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment':
Attacked prompt: b"Working as an entailment classifier, identify whether the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true :"
Original acc: 49.82%, attacked acc: 47.29%, dropped acc: 2.53%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true :"
Original acc: 48.01%, attacked acc: 40.07%, dropped acc: 7.94%
## textbugger
Original prompt: Please classify the relationship between the provided sentences as 'entailment' or 'not_entailment'.
Attacked prompt: b"P\xe2\x85\xbcease \xcf\xb2lassify the relationship between the provided sentences as 'entailment' or 'not_entailment'."
Original acc: 47.65%, attacked acc: 6.14%, dropped acc: 41.52%
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Ai the relationship between the given sentence represent entailment or not_entailment? Answer with 'entailment' or 'not_entailment'."
Original acc: 45.13%, attacked acc: 1.08%, dropped acc: 44.04%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Determines if the awarded \xd1\x80air of sentence displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'."
Original acc: 41.16%, attacked acc: 3.25%, dropped acc: 37.91%
Original prompt: In the role of a semantic relationship analyzer, examine the connection between the given sentences and decide if they exhibit entailment or not_entailment. Answer with 'entailment' or 'not_entailment':
Attacked prompt: b"Onto the role of a semantic relationship offing, e\xc3\x97amine the connection between the given se\xd5\xb8tences and dec\xd1\x96de if they exhibit entailment or not_entailment. Answer with 'entailment' or 'not_entailment':"
Original acc: 51.26%, attacked acc: 0.36%, dropped acc: 50.90%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment':
Attacked prompt: b"Wo\xe2\xb2\x85king as an entailment classifier, detect whether the conferred pair of \xd1\x95entences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment':"
Original acc: 49.82%, attacked acc: 3.97%, dropped acc: 45.85%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"Oj an entailment identification system, examine the connection between the following sentences and resp\xd0\xbend with 'entailment' or 'not_entailment':"
Original acc: 48.01%, attacked acc: 15.88%, dropped acc: 32.13%
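
The attacked prompts in this log are printed as Python bytes literals, so the `\x..` escapes in the textbugger entries are UTF-8 bytes of non-ASCII look-alike characters. A small sketch for decoding one logged literal and listing the substituted characters follows. It assumes the literal is valid Python and UTF-8 encoded; `decode_logged_prompt` and `non_ascii_chars` are hypothetical helper names.

```python
import ast
import unicodedata


def decode_logged_prompt(literal):
    """Turn a logged bytes literal such as b"..." into a str (UTF-8 assumed)."""
    raw = ast.literal_eval(literal)
    return raw.decode("utf-8")


def non_ascii_chars(text):
    """List (char, code point, Unicode name) for every non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# First textbugger entry from the section above, copied as logged.
logged = r'''b"P\xe2\x85\xbcease \xcf\xb2lassify the relationship between the provided sentences as 'entailment' or 'not_entailment'."'''
prompt = decode_logged_prompt(logged)
for ch, code, name in non_ascii_chars(prompt):
    print(code, name)
```

For this entry the two substituted characters decode to U+217C and U+03F2, which stand in for the `l` of "Please" and the `c` of "classify".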
## textfooler
Original prompt: Please classify the relationship between the provided sentences as 'entailment' or 'not_entailment'.
Attacked prompt: b"Invite classed the coitus between the foreseen priors as 'entailment' or 'not_entailment'."
Original acc: 47.65%, attacked acc: 0.72%, dropped acc: 46.93%
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Ca the relationship between the given sentences represent entailment or not_entailment? Answer with 'entailment' or 'not_entailment'."
Original acc: 45.13%, attacked acc: 4.69%, dropped acc: 40.43%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Establishes if the gave pair of phrases reflecting entailment or not_entailment. Reply with 'entailment' or 'not_entailment'."
Original acc: 41.16%, attacked acc: 5.78%, dropped acc: 35.38%
Original prompt: In the role of a semantic relationship analyzer, examine the connection between the given sentences and decide if they exhibit entailment or not_entailment. Answer with 'entailment' or 'not_entailment':
Attacked prompt: b"Onto the feature of a semantic link profiler, scrutinize the liaison between the offered chastisement and determining if they demonstrate entailment or not_entailment. Answer with 'entailment' or 'not_entailment':"
Original acc: 51.26%, attacked acc: 2.89%, dropped acc: 48.38%
Original prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment':
Attacked prompt: b"Employment as an entailment classifier, ascertain whether the submitted pair of sentencing exhibits entailment or not_entailment. Respond with 'entailment' or 'not_entailment':"
Original acc: 49.82%, attacked acc: 2.17%, dropped acc: 47.65%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"Because an entailment characterization regimes, considering the login between the following sentence and meet with 'entailment' or 'not_entailment':"
Original acc: 48.01%, attacked acc: 0.00%, dropped acc: 48.01%
## translation
Language: chinese, acc: 43.32%, prompt: In the light of an implication analysis tool, evaluate the relationship between the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 41.52%, prompt: From the perspective of an implication analysis tool, determine whether there is an implication relationship in the following sentences by classifying them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 36.10%, prompt: Please use an implication analysis tool to determine whether an implication relationship exists in the following sentences by classifying them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 28.88%, prompt: Please evaluate the relation of the following sentences as 'entailment' or 'not_entailment' for the purpose of determining implication relation.
Language: chinese, acc: 40.43%, prompt: Please use the implication analysis tool to evaluate the relationships between the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 32.85%, prompt: For the purpose of determining implicative relations, analyze the relations of the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 42.60%, prompt: Please use the implication analysis tool to determine the relationship of the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 28.88%, prompt: Please use the implication judgment tool to assess the relevance of the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 20.22%, prompt: Please, with implication analysis as the main task, determine the relationships between the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 35.38%, prompt: Using the implication judgment as a criterion, analyze the relation of the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: french, acc: 40.07%, prompt: As an engagement analysis tool, evaluate the relationship between the given sentences and classify it as 'entailment' or 'not_entailment'.
Language: french, acc: 31.77%, prompt: Determine whether the given sentences involve one another or not as an implication analysis tool. Classify them accordingly as 'entailment' or 'not_entailment'.
Language: french, acc: 38.99%, prompt: Using implication analysis, evaluate whether the sentences provided have a logical relationship and categorize them as 'entailment' or 'not_entailment'.
Language: french, acc: 35.74%, prompt: As an engagement assessment tool, determine whether the sentences provided have a logical relationship and classify them as 'entailment' or 'not_entailment'.
Language: french, acc: 26.35%, prompt: As an implication classification tool, analyze the sentences provided to determine if there is a logical relationship and categorize them as 'entailment' or 'not_entailment'.
Language: french, acc: 25.63%, prompt: Using implication analysis, determine whether the given sentences have a cause-effect relationship and categorize them as 'entailment' or 'not_entailment'.
Language: french, acc: 34.66%, prompt: Evaluate the relationship between the given sentences using implication analysis and rank them accordingly as 'entailment' or 'not_entailment'.
Language: french, acc: 24.55%, prompt: As an engagement detection tool, determine whether the given sentences have a logical relationship and categorize them as 'entailment' or 'not_entailment'.
Language: french, acc: 13.36%, prompt: Using implication analysis, evaluate whether the sentences provided have a cause-effect relationship and rank them accordingly as 'entailment' or 'not_entailment'.
Language: french, acc: 14.80%, prompt: Determine whether the given sentences have a cause-effect relationship as an engagement analysis tool and categorize them as 'entailment' or 'not_entailment'.
Language: arabic, acc: 36.82%, prompt: In your role as a tool for reasoning analysis, evaluate the relationship between given sentences and classify them as 'entailment' or 'not_entailment'.
Language: arabic, acc: 45.85%, prompt: Can you determine whether this sentence is inferred from the other sentence? Classify it as 'entailment' or 'not_entailment'.
Language: arabic, acc: 32.49%, prompt: Using the tool of reasoning analysis, analyze the relationship between given sentences and classify them as 'entailment' or 'not_entailment'.
Language: arabic, acc: 36.10%, prompt: Does this sentence represent a conclusion from the previous sentence? Classify it as 'entailment' or 'not_entailment'.
Language: arabic, acc: 32.85%, prompt: As a tool of reasoning analysis, evaluate the relationship of given sentences and classify them as 'entailment' or 'not_entailment'.
Language: arabic, acc: 36.46%, prompt: Can this sentence be inferred from the previous sentence? Classify it as 'entailment' or 'not_entailment'.
Language: arabic, acc: 32.49%, prompt: Using a tool to analyze a conclusion, analyze the relationship between the two sentences and classify them as 'entailment' or 'not_entailment'.
Language: arabic, acc: 22.74%, prompt: Is this a conclusion from the next sentence? Classify it as 'entailment' or 'not_entailment'.
Language: arabic, acc: 23.83%, prompt: As part of your task in analyzing a conclusion, evaluate the relationship between the two sentences and classify them as 'entailment' or 'not_entailment' based on their relationship.
Language: arabic, acc: 24.55%, prompt: Are you following this sentence directly from the previous one? Classify it as 'entailment' or 'not_entailment'.
Language: spanish, acc: 40.79%, prompt: In your role as an implication analysis tool, evaluate the relationship between the given phrases and classify them as 'entailment' or 'not_entailment'.
Language: spanish, acc: 32.13%, prompt: Determine whether the second sentence necessarily implies the first and label the relation as 'entailment', or as 'not_entailment' if not.
Language: spanish, acc: 16.61%, prompt: Classifies the relationship between these two sentences as 'entailment' if one necessarily implies the other, or as 'not_entailment' if not.
Language: spanish, acc: 27.08%, prompt: Evaluates whether the information in the second sentence is implied in the first and labels the relationship as 'entailment', or as 'not_entailment' if there is no such implication.
Language: spanish, acc: 37.55%, prompt: Given a couple of phrases, label their relationship as 'entailment' if one necessarily implies the other, or as 'not_entailment' if there is no such implication.
Language: spanish, acc: 40.79%, prompt: Analyzes the relationship between the phrases and classifies them as 'entailment' if one necessarily implies the other, or as 'not_entailment' if there is no such implication.
Language: spanish, acc: 46.93%, prompt: Given two sentences, determine whether the second sentence is a necessary consequence of the first and label the relation as 'entailment', or as 'not_entailment' if not.
Language: spanish, acc: 24.55%, prompt: Evaluates whether the information presented in the second sentence is implicit in the first and labels the relationship as 'entailment', or as 'not_entailment' if there is no such implication.
Language: spanish, acc: 20.58%, prompt: Classifies the relationship between the given phrases as 'entailment' if one necessarily implies the other, or as 'not_entailment' if there is no such implication.
Language: spanish, acc: 26.35%, prompt: Determines whether the information provided in the second sentence is necessarily inferable from the first and labels the relationship as 'entailment', or as 'not_entailment' if not.
Language: japanese, acc: 27.80%, prompt: Analyze the relationship of a given sentence and classify it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 27.80%, prompt: Evaluate the semantic relationship of the sentence and classify it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 40.79%, prompt: Please judge the relationship between the given sentences and classify them as 'entailment' or 'not_entailment'.
Language: japanese, acc: 24.91%, prompt: Examine the paraphrases of a given sentence and classify them 'entailment' or 'not_entailment'.
Language: japanese, acc: 18.05%, prompt: Rate the similarity of a given sentence and categorize it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 18.41%, prompt: Determinate the semantic connections of a given sentence and classify it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 22.02%, prompt: Examine the semantic match of a given sentence and categorize it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 5.42%, prompt: Classify it as 'entailment' or 'not_entailment' based on the content of the sentence.
Language: japanese, acc: 27.80%, prompt: Analyze the relationship of a given sentence and classify it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 24.19%, prompt: Judge the semantic connections of a given sentence and categorize it as 'entailment' or 'not_entailment'.
Language: korean, acc: 37.91%, prompt: Evaluate the relationship between any two sentences given to you and classify you as 'entailment' or 'not_entailment'.
Language: korean, acc: 27.44%, prompt: Analyze the semantic deductive relations between sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 28.16%, prompt: Evaluate the logical relevance between sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 35.02%, prompt: Evaluate the interaction of two given sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 15.88%, prompt: Please check whether there is a semantic match between those two sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 31.77%, prompt: Compare information between given sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 15.88%, prompt: Please analyse the correlation between those two sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 20.22%, prompt: Evaluate the different meanings between given sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 20.58%, prompt: Compare the semantic structure of the two sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 29.24%, prompt: Evaluate the interactions between sentences and classify them as 'entailment' or 'not_entailment'.
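
To compare languages in a translation section, the per-prompt accuracies can be averaged by language. The sketch below assumes the `Language: ..., acc: ...%, prompt: ...` line format used above; `LANG_LINE` and `mean_acc_by_language` are hypothetical names, and the two sample lines are copied from the Korean entries in this section.

```python
import re
from collections import defaultdict

# Format assumed from the translation lines in this log.
LANG_LINE = re.compile(
    r"Language: (?P<lang>\w+), acc: (?P<acc>[\d.]+)%, prompt: (?P<prompt>.*)"
)


def mean_acc_by_language(lines):
    """Map each language to the mean accuracy (in %) over its prompts."""
    buckets = defaultdict(list)
    for line in lines:
        m = LANG_LINE.match(line.strip())
        if m:  # skip headings and any other non-matching lines
            buckets[m["lang"]].append(float(m["acc"]))
    return {lang: sum(vals) / len(vals) for lang, vals in buckets.items()}


sample = [
    "Language: korean, acc: 15.88%, prompt: Please analyse the correlation between those two sentences and classify them as 'entailment' or 'not_entailment'.",
    "Language: korean, acc: 20.22%, prompt: Evaluate the different meanings between given sentences and classify them as 'entailment' or 'not_entailment'.",
]
print(mean_acc_by_language(sample))
```

Feeding the function every line of one task's translation section gives one mean per language. Lines from different tasks should be kept separate, since the language names repeat across tasks.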
# sst2
## 10 prompts
Acc: 53.90%, prompt: Please identify the emotional tone of this passage: 'positive' or 'negative'?
Acc: 48.17%, prompt: After examining the following expression, label its emotion as either 'positive' or 'negative':
Acc: 48.05%, prompt: Analyze the tone of this statement and respond with either 'positive' or 'negative':
Acc: 43.92%, prompt: Review this statement and decide whether it has a 'positive' or 'negative' sentiment:
Acc: 42.43%, prompt: Considering the given phrase, would you say it carries a 'positive' or 'negative' connotation?
Acc: 33.37%, prompt: Read the provided excerpt and choose between 'positive' and 'negative' to describe its sentiment:
Acc: 32.34%, prompt: Assess the mood of the following quote and determine if it's 'positive' or 'negative':
Acc: 23.85%, prompt: Given the context of this text, indicate if the emotion conveyed is 'positive' or 'negative':
Acc: 17.89%, prompt: Evaluate the sentiment of the given text and classify it as 'positive' or 'negative':
Acc: 13.42%, prompt: Determine the overall sentiment of this sentence, categorizing it as 'positive' or 'negative':
Acc: 42.55%, prompt: Working as a sentiment analyzer, please indicate if the following text is 'positive' or 'negative'. Classify:
Acc: 40.94%, prompt: Functioning as a sentiment identification tool, assess if the following expression is 'positive' or 'negative'. Classify:
Acc: 37.27%, prompt: As an emotion detector, determine if the provided passage conveys a 'positive' or 'negative' sentiment. Classify:
Acc: 36.35%, prompt: In the role of a sentiment analysis tool, respond with 'positive' or 'negative' to classify this statement:
Acc: 29.24%, prompt: As a sentiment classifier, determine whether the following text is 'positive' or 'negative'. Please classify:
Acc: 25.00%, prompt: Serving as a sentiment evaluation model, determine if the given statement is 'positive' or 'negative'. Classify:
Acc: 18.92%, prompt: Acting as a sentiment evaluator, identify if the given sentence is 'positive' or 'negative'. Classify:
Acc: 12.73%, prompt: In the capacity of a sentiment classifier, decide whether the given quote is 'positive' or 'negative'. Classify:
Acc: 3.10%, prompt: Emulating a sentiment classification system, indicate whether the provided text is 'positive' or 'negative'. Classify:
Acc: 2.98%, prompt: Taking on the role of an emotion classifier, specify if the provided phrase is 'positive' or 'negative'. Classify:
## bertattack
Original prompt: Please identify the emotional tone of this passage: 'positive' or 'negative'?
Attacked prompt: b"need refer the affect tone of this passage: 'positive' or 'negative'? "
Original acc: 53.90%, attacked acc: 3.21%, dropped acc: 50.69%
Original prompt: After examining the following expression, label its emotion as either 'positive' or 'negative':
Attacked prompt: b"After examination the given expression, type its emotion as of 'positive' or 'negative': "
Original acc: 48.17%, attacked acc: 0.57%, dropped acc: 47.59%
Original prompt: Analyze the tone of this statement and respond with either 'positive' or 'negative':
Attacked prompt: b"we the tone of this statement and comment with either 'positive' or 'negative': "
Original acc: 48.05%, attacked acc: 4.47%, dropped acc: 43.58%
Original prompt: Working as a sentiment analyzer, please indicate if the following text is 'positive' or 'negative'. Classify:
Attacked prompt: b"Working as a sentiment analyzer, ranks indicates if the below text is 'positive' or 'negative'. ranking: "
Original acc: 42.55%, attacked acc: 3.10%, dropped acc: 39.45%
Original prompt: Functioning as a sentiment identification tool, assess if the following expression is 'positive' or 'negative'. Classify:
Attacked prompt: b"Functioning as a flag identification tool, measure if the following expression is 'positive' or 'negative'. classifications: "
Original acc: 40.94%, attacked acc: 7.00%, dropped acc: 33.94%
Original prompt: As an emotion detector, determine if the provided passage conveys a 'positive' or 'negative' sentiment. Classify:
Attacked prompt: b"As an emotion indicating, score if the provided reply conveys a 'positive' or 'negative' sentiment. classify: "
Original acc: 37.27%, attacked acc: 0.46%, dropped acc: 36.81%
## checklist
Original prompt: Please identify the emotional tone of this passage: 'positive' or 'negative'?
Attacked prompt: b"Please identify the emotional tone of this passage: 'positive' or 'negative' 9vGXgnbWB8? "
Original acc: 53.90%, attacked acc: 0.00%, dropped acc: 53.90%
Original prompt: After examining the following expression, label its emotion as either 'positive' or 'negative':
Attacked prompt: b"After examining the following expression, label its emotion as either 'positive' or 'negative' PDCGfz6DL9: "
Original acc: 48.17%, attacked acc: 30.96%, dropped acc: 17.20%
Original prompt: Analyze the tone of this statement and respond with either 'positive' or 'negative':
Attacked prompt: b"Analyze the tone of this statement and respond with either 'positive' or 'negative' 9vGXgnbWB8: "
Original acc: 48.05%, attacked acc: 0.69%, dropped acc: 47.36%
Original prompt: Working as a sentiment analyzer, please indicate if the following text is 'positive' or 'negative'. Classify:
Attacked prompt: b"Working as a sentiment analyzer, please indicate if the following text is 'positive' or 'negative'. Classify zq0DcZ5dnI: "
Original acc: 42.55%, attacked acc: 4.70%, dropped acc: 37.84%
Original prompt: Functioning as a sentiment identification tool, assess if the following expression is 'positive' or 'negative'. Classify:
Attacked prompt: b"Functioning as a sentiment identification tool, assess if the following expression is 'positive' or 'negative'. Classify EAB4KP2NVY: "
Original acc: 40.94%, attacked acc: 21.22%, dropped acc: 19.72%
Original prompt: As an emotion detector, determine if the provided passage conveys a 'positive' or 'negative' sentiment. Classify:
Attacked prompt: b"As an emotion detector, determine if the provided passage conveys a 'positive' or 'negative' sentiment. Classify d6ZQ3u0GBQ: "
Original acc: 37.27%, attacked acc: 7.91%, dropped acc: 29.36%
## deepwordbug
Original prompt: Please identify the emotional tone of this passage: 'positive' or 'negative'?
Attacked prompt: b"Plase Sidentify the emotiLnal tone of this pasage: 'positive' or 'negative'? "
Original acc: 54.01%, attacked acc: 3.10%, dropped acc: 50.92%
Original prompt: Analyze the tone of this statement and respond with either 'positive' or 'negative':
Attacked prompt: b"Analyze the otne of this stateXent and respond with eitJer 'positive' or 'negative': "
Original acc: 48.05%, attacked acc: 8.60%, dropped acc: 39.45%
Original prompt: After examining the following expression, label its emotion as either 'positive' or 'negative':
Attacked prompt: b"kfter eximining the followinM expression, labe its emolion as egther 'positive' or 'negative': "
Original acc: 48.05%, attacked acc: 0.80%, dropped acc: 47.25%
Original prompt: Working as a sentiment analyzer, please indicate if the following text is 'positive' or 'negative'. Classify:
Attacked prompt: b"Working as a sentiment analyzer, please indicate if the following text is 'positive' or 'negative'. CQlassify: "
Original acc: 42.66%, attacked acc: 39.11%, dropped acc: 3.56%
Original prompt: Functioning as a sentiment identification tool, assess if the following expression is 'positive' or 'negative'. Classify:
Attacked prompt: b"Functionnig as a sentiment identification tool, assess if the flolowing xepression is 'positive' or 'negative'. ClUssify: "
Original acc: 41.06%, attacked acc: 27.18%, dropped acc: 13.88%
Original prompt: As an emotion detector, determine if the provided passage conveys a 'positive' or 'negative' sentiment. Classify:
Attacked prompt: b"As an emoHtion Betector, determine if the provided passage convey a 'positive' or 'negative' sentiment. Classify: "
Original acc: 37.50%, attacked acc: 29.13%, dropped acc: 8.37%
## stresstest
Original prompt: Please identify the emotional tone of this passage: 'positive' or 'negative'?
Attacked prompt: b"Please identify the emotional tone of this passage: 'positive' or 'negative' and true is true and true is true and true is true and true is true and true is true ? "
Original acc: 53.90%, attacked acc: 17.89%, dropped acc: 36.01%
Original prompt: After examining the following expression, label its emotion as either 'positive' or 'negative':
Attacked prompt: b"After examining the following expression, label its emotion as either 'positive' or 'negative' and false is not true : "
Original acc: 48.17%, attacked acc: 34.86%, dropped acc: 13.30%
Original prompt: Analyze the tone of this statement and respond with either 'positive' or 'negative':
Attacked prompt: b"Analyze the tone of this statement and respond with either 'positive' or 'negative' and true is true and true is true and true is true and true is true and true is true : "
Original acc: 48.05%, attacked acc: 43.35%, dropped acc: 4.70%
Original prompt: Working as a sentiment analyzer, please indicate if the following text is 'positive' or 'negative'. Classify:
Attacked prompt: b"Working as a sentiment analyzer, please indicate if the following text is 'positive' or 'negative'. Classify and false is not true : "
Original acc: 42.55%, attacked acc: 42.66%, dropped acc: -0.11%
Original prompt: Functioning as a sentiment identification tool, assess if the following expression is 'positive' or 'negative'. Classify:
Attacked prompt: b"Functioning as a sentiment identification tool, assess if the following expression is 'positive' or 'negative'. Classify and false is not true : "
Original acc: 40.94%, attacked acc: 36.47%, dropped acc: 4.47%
Original prompt: As an emotion detector, determine if the provided passage conveys a 'positive' or 'negative' sentiment. Classify:
Attacked prompt: b"As an emotion detector, determine if the provided passage conveys a 'positive' or 'negative' sentiment. Classify and true is true and true is true and true is true and true is true and true is true : "
Original acc: 37.27%, attacked acc: 40.83%, dropped acc: -3.56%
## textbugger
Original prompt: Please identify the emotional tone of this passage: 'positive' or 'negative'?
Attacked prompt: b"Please i\xd4\x81entify the emotional tone of this p\xc9\x91ssage: 'positive' or 'negative'? "
Original acc: 54.01%, attacked acc: 11.12%, dropped acc: 42.89%
Original prompt: Analyze the tone of this statement and respond with either 'positive' or 'negative':
Attacked prompt: b"Analyzes the tone of this pronouncements and answer with either 'positive' or 'negative': "
Original acc: 48.05%, attacked acc: 18.81%, dropped acc: 29.24%
Original prompt: After examining the following expression, label its emotion as either 'positive' or 'negative':
Attacked prompt: b"After examining the following expression, label its emo\xf0\x9d\x9a\x9dion as neither 'positive' or 'negative': "
Original acc: 48.05%, attacked acc: 2.06%, dropped acc: 45.99%
Original prompt: Working as a sentiment analyzer, please indicate if the following text is 'positive' or 'negative'. Classify:
Attacked prompt: b"Working as a sentiment analyzer, invited indicate if the later text is 'positive' or 'negative'. Classi\xf0\x9d\x9a\x8fy: "
Original acc: 42.66%, attacked acc: 7.91%, dropped acc: 34.75%
Original prompt: Functioning as a sentiment identification tool, assess if the following expression is 'positive' or 'negative'. Classify:
Attacked prompt: b"Functioning as a \xd1\x95entiment identification tools, evaluating if the foll\xd0\xbewing expressions is 'positive' or 'negative'. Classi\xf0\x9d\x9a\x8fy: "
Original acc: 41.06%, attacked acc: 3.33%, dropped acc: 37.73%
Original prompt: As an emotion detector, determine if the provided passage conveys a 'positive' or 'negative' sentiment. Classify:
Attacked prompt: b"Because an emotion detector, determine if the povided passgae convey a 'positive' or 'negative' ambience. Classi\xf0\x9d\x9a\x8fy: "
Original acc: 37.50%, attacked acc: 1.03%, dropped acc: 36.47%
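
The textbugger prompts above are logged as Python bytes literals, so the substituted look-alike characters appear as `\x..` escape sequences. A minimal sketch of decoding one entry to show which characters were swapped in, assuming the bytes are UTF-8 encoded; the helper names `decode_attacked_prompt` and `non_ascii_chars` are illustrative and not part of PromptBench:

```python
import ast
import unicodedata


def decode_attacked_prompt(literal: str) -> str:
    """Turn a logged b"..." literal into a readable string (assumes UTF-8)."""
    raw = ast.literal_eval(literal)  # safely evaluates the bytes literal
    return raw.decode("utf-8")


def non_ascii_chars(text: str) -> list:
    """Return (character, code point) pairs for every non-ASCII character."""
    return [(ch, f"U+{ord(ch):04X}") for ch in text if ord(ch) > 127]


# One attacked prompt copied verbatim from the log above.
logged = r'''b"Please i\xd4\x81entify the emotional tone of this p\xc9\x91ssage: 'positive' or 'negative'? "'''

prompt = decode_attacked_prompt(logged)
for ch, code in non_ascii_chars(prompt):
    # Print the code point and the Unicode name of each substituted character.
    print(code, unicodedata.name(ch, "UNKNOWN"))
```

The escaped form in the log keeps the substitutions visible; once decoded, the prompt looks nearly identical to the original in most fonts.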
## textfooler
Original prompt: Please identify the emotional tone of this passage: 'positive' or 'negative'?
Attacked prompt: b"Invited specify the affective tone of this pass: 'positive' or 'negative'? "
Original acc: 53.90%, attacked acc: 5.85%, dropped acc: 48.05%
Original prompt: After examining the following expression, label its emotion as either 'positive' or 'negative':
Attacked prompt: b"After investigate the later demonstrations, label its sentiment as either 'positive' or 'negative': "
Original acc: 48.17%, attacked acc: 11.47%, dropped acc: 36.70%
Original prompt: Analyze the tone of this statement and respond with either 'positive' or 'negative':
Attacked prompt: b"Analyzes the ringtone of this declarations and answers with either 'positive' or 'negative': "
Original acc: 48.05%, attacked acc: 7.91%, dropped acc: 40.14%
Original prompt: Working as a sentiment analyzer, please indicate if the following text is 'positive' or 'negative'. Classify:
Attacked prompt: b"Working as a sentiment analyzer, please indicate if the following text is 'positive' or 'negative'. Ranked: "
Original acc: 42.55%, attacked acc: 3.21%, dropped acc: 39.33%
Original prompt: Functioning as a sentiment identification tool, assess if the following expression is 'positive' or 'negative'. Classify:
Attacked prompt: b"Operated as a sense id utility, estimate if the subsequently expressions is 'positive' or 'negative'. Classifies: "
Original acc: 40.94%, attacked acc: 4.82%, dropped acc: 36.12%
Original prompt: As an emotion detector, determine if the provided passage conveys a 'positive' or 'negative' sentiment. Classify:
Attacked prompt: b"As an emotion sensor, identifying if the provided cross conveyed a 'positive' or 'negative' sens. Classify: "
Original acc: 37.27%, attacked acc: 12.96%, dropped acc: 24.31%
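
Each result line above reports the original accuracy, the attacked accuracy, and their difference. A minimal sketch of parsing such a line and checking that the reported drop matches original minus attacked; the regular expression and the tolerance are assumptions based on the line format in this file (the logged values are rounded, so some differ by 0.01), and `parse_result` and `drop_is_consistent` are hypothetical helper names:

```python
import re

# Pattern inferred from the "Original acc / attacked acc / dropped acc" lines.
RESULT_RE = re.compile(
    r"Original acc: (?P<orig>-?\d+(?:\.\d+)?)%, "
    r"attacked acc: (?P<att>-?\d+(?:\.\d+)?)%, "
    r"dropped acc: (?P<drop>-?\d+(?:\.\d+)?)%"
)


def parse_result(line: str):
    """Return (original, attacked, dropped) as floats, or None if no match."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    return float(m["orig"]), float(m["att"]), float(m["drop"])


def drop_is_consistent(line: str, tol: float = 0.015) -> bool:
    """Check that dropped == original - attacked, up to rounding in the log."""
    parsed = parse_result(line)
    if parsed is None:
        raise ValueError(f"not a result line: {line!r}")
    orig, att, drop = parsed
    return abs((orig - att) - drop) <= tol


# Lines copied from the sst2 results above; the second has a negative drop,
# meaning the attacked prompt scored higher than the original.
lines = [
    "Original acc: 53.90%, attacked acc: 5.85%, dropped acc: 48.05%",
    "Original acc: 37.27%, attacked acc: 40.83%, dropped acc: -3.56%",
]
for line in lines:
    print(parse_result(line), drop_is_consistent(line))
```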
## translation
Language: chinese, acc: 36.58%, prompt: Answer whether the statement is 'positive' or 'negative' based on sentiment analysis.
Language: chinese, acc: 17.55%, prompt: As an emotion analysis tool, determine whether the emotion in the text is 'positive' or 'negative'.
Language: chinese, acc: 38.76%, prompt: Categorize the statement as 'positive' or 'negative', based on its emotional bent.
Language: chinese, acc: 11.12%, prompt: Please use sentiment analysis to classify the text as 'positive' or 'negative'.
Language: chinese, acc: 14.91%, prompt: Please determine whether the emotion of the sentence is 'positive' or 'negative' and categorize it.
Language: chinese, acc: 25.80%, prompt: Using sentiment analysis, classify the text as 'positive' or 'negative'.
Language: chinese, acc: 21.90%, prompt: Please answer whether the emotion of the sentence is 'positive' or 'negative' and categorize it.
Language: chinese, acc: 34.75%, prompt: Categorize the statement as 'positive' or 'negative' based on sentiment analysis.
Language: chinese, acc: 13.65%, prompt: Please judge this text as 'positive' or 'negative' according to the criteria of sentiment analysis.
Language: chinese, acc: 45.53%, prompt: Please classify this sentence as 'positive' or 'negative' according to the principles of emotion analysis.
Language: french, acc: 38.07%, prompt: As a sentiment analysis tool, please answer with 'positive' or 'negative' to classify this statement.
Language: french, acc: 35.32%, prompt: Determine whether this phrase is 'positive' or 'negative' as a sentiment classification tool.
Language: french, acc: 30.28%, prompt: Identify the tone of this statement by choosing between 'positive' and 'negative' as a sentiment analysis tool.
Language: french, acc: 16.74%, prompt: Use sentiment analysis to classify this statement as 'positive' or 'negative'.
Language: french, acc: 26.83%, prompt: As a sentiment classification tool, please determine whether this statement is 'positive' or 'negative'.
Language: french, acc: 41.97%, prompt: Classify this sentence as 'positive' or 'negative' using sentiment analysis.
Language: french, acc: 29.93%, prompt: Choose between 'positive' or 'negative' to classify this statement as a sentiment analysis tool.
Language: french, acc: 27.64%, prompt: Identify the sentiment expressed in this statement by selecting 'positive' or 'negative' as a sentiment classification tool.
Language: french, acc: 43.58%, prompt: Determine whether this phrase is 'positive' or 'negative' using sentiment analysis as a classification tool.
Language: french, acc: 16.74%, prompt: Use sentiment analysis to classify this statement as 'positive' or 'negative'.
Language: arabic, acc: 34.75%, prompt: Under emotional analysis, answer 'positive' or 'negative' to classify this statement.
Language: arabic, acc: 33.60%, prompt: Does this statement express a 'positive' or 'negative' reaction?
Language: arabic, acc: 27.64%, prompt: Is that a 'positive' or a 'negative' phrase?
Language: arabic, acc: 28.21%, prompt: What is the classification between 'positive' and 'negative'?
Language: arabic, acc: 28.10%, prompt: Does this sentence express 'positive' or 'negative' feelings?
Language: arabic, acc: 41.06%, prompt: In the context of textual analysis, what classification is this phrase between 'positive' and 'negative'?
Language: arabic, acc: 32.00%, prompt: Could this be classified as 'positive' or 'negative'?
Language: arabic, acc: 44.27%, prompt: In the context of emotional analysis, what classification is this statement between 'positive' and 'negative'?
Language: arabic, acc: 32.80%, prompt: Can this be classified as 'positive' or 'negative'?
Language: arabic, acc: 29.13%, prompt: Under the classification of emotions, is this sentence 'positive' or 'negative'?
Language: spanish, acc: 34.52%, prompt: As a feeling analysis tool, classify this statement as 'positive' or 'negative'.
Language: spanish, acc: 33.26%, prompt: Determine whether this statement has a 'positive' or 'negative' connotation.
Language: spanish, acc: 50.34%, prompt: Indicate whether the following statement is 'positive' or 'negative'.
Language: spanish, acc: 38.53%, prompt: Evaluate whether this text has a 'positive' or 'negative' emotional charge.
Language: spanish, acc: 14.11%, prompt: According to your sentiment analysis, would you say this comment is 'positive' or 'negative'?
Language: spanish, acc: 16.97%, prompt: In the context of sentiment analysis, label this sentence as 'positive' or 'negative'.
Language: spanish, acc: 38.30%, prompt: Rate the following statement as 'positive' or 'negative', according to your sentiment analysis.
Language: spanish, acc: 19.04%, prompt: How would you classify this text in terms of its emotional tone? 'positive' or 'negative'?
Language: spanish, acc: 24.08%, prompt: As a tool for sentiment analysis, would you say this statement is 'positive' or 'negative'?
Language: spanish, acc: 40.60%, prompt: Classify this statement as 'positive' or 'negative', please.
Language: japanese, acc: 24.08%, prompt: Treat this sentence as an emotion analysis tool and categorize it as 'positive' and 'negative'.
Language: japanese, acc: 30.50%, prompt: Use this article as a sentiment analysis tool to classify 'positive' and 'negative'.
Language: japanese, acc: 41.28%, prompt: Use this sentence as an emotion analysis tool to determine whether it is 'positive' or 'negative'.
Language: japanese, acc: 30.28%, prompt: Use this sentence as an emotion analysis tool to classify 'positive' and 'negative'.
Language: japanese, acc: 32.80%, prompt: Use this sentence as a sentiment analysis tool and classify it as 'positive' or 'negative'.
Language: japanese, acc: 14.56%, prompt: To classify this sentence as 'positive' or 'negative', evaluate it as a sentiment analysis tool.
Language: japanese, acc: 35.78%, prompt: Treat this sentence as an emotion analysis tool to determine whether it is 'positive' or 'negative'.
Language: japanese, acc: 21.79%, prompt: Use this sentence as a sentiment analysis tool to classify 'positive' and 'negative'.
Language: japanese, acc: 40.14%, prompt: Analyze this sentence as an emotion analysis tool to classify whether it is 'positive' or 'negative'.
Language: japanese, acc: 36.35%, prompt: Use this sentence as an emotional analysis tool to determine whether it is 'positive' or 'negative'.
Language: korean, acc: 34.17%, prompt: As an emotional analysis tool, respond with 'positive' or 'negative' to classify these sentences.
Language: korean, acc: 39.79%, prompt: Classify this sentence as 'positive' if you regard it as positive, 'negative' if you regard it as negative.
Language: korean, acc: 9.29%, prompt: Please rate the emotion of this sentence and classify it as 'positive' or 'negative'.
Language: korean, acc: 46.79%, prompt: Classify this sentence as 'positive' if you perceive it positively and 'negative' if you perceive it negatively.
Language: korean, acc: 42.32%, prompt: If this is a sentence delivered using a positive expression, classify it as 'positive' and if this is a sentence delivered using a negative expression, classify it as 'negative'.
Language: korean, acc: 30.96%, prompt: Respond with 'positive' or 'negative' by categorizing whether the sentence is positive or negative.
Language: korean, acc: 46.90%, prompt: Please analyze the emotion in this sentence and classify it as 'positive' or 'negative'.
Language: korean, acc: 42.55%, prompt: Classify this sentence as 'positive' if it contains a positive meaning, 'negative' if it contains a negative meaning.
Language: korean, acc: 40.02%, prompt: Classify this sentence as 'positive' if it contains positive content, 'negative' if it contains negative content.
Language: korean, acc: 49.89%, prompt: Classify what you're trying to convey in this sentence as 'positive' if it's positive, and 'negative' if it's negative.
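
The translation results above vary widely within each language. A minimal sketch of grouping `Language: ..., acc: ...%, prompt: ...` lines by language and summarising the spread, assuming the line format used in this file; `accuracy_by_language` is a hypothetical helper, and the sample holds only four of the lines above:

```python
import re
from collections import defaultdict
from statistics import mean

# Pattern inferred from the translation lines in this file.
LINE_RE = re.compile(
    r"^Language: (?P<lang>\w+), acc: (?P<acc>\d+(?:\.\d+)?)%, prompt: (?P<prompt>.*)$"
)


def accuracy_by_language(lines):
    """Map each language to (mean, min, max) accuracy over its prompts."""
    grouped = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # skip headings and other non-matching lines
            grouped[m["lang"]].append(float(m["acc"]))
    return {
        lang: (mean(accs), min(accs), max(accs))
        for lang, accs in grouped.items()
    }


# A few lines copied from the sst2 translation results above.
sample = [
    "Language: spanish, acc: 50.34%, prompt: Indicate whether the following statement is 'positive' or 'negative'.",
    "Language: spanish, acc: 14.11%, prompt: According to your sentiment analysis, would you say this comment is 'positive' or 'negative'?",
    "Language: korean, acc: 9.29%, prompt: Please rate the emotion of this sentence and classify it as 'positive' or 'negative'.",
    "Language: korean, acc: 49.89%, prompt: Classify what you're trying to convey in this sentence as 'positive' if it's positive, and 'negative' if it's negative.",
]

for lang, (avg, low, high) in accuracy_by_language(sample).items():
    print(f"{lang}: mean={avg:.2f} min={low:.2f} max={high:.2f}")
```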
# wnli
## 10 prompts
Acc: 43.66%, prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Acc: 43.66%, prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Acc: 38.03%, prompt: Assess the connection between the following sentences and classify it as 'entailment' or 'not_entailment'.
Acc: 38.03%, prompt: Please classify the relationship between the provided sentences as 'entailment' or 'not_entailment'.
Acc: 33.80%, prompt: Examine the pair of sentences and determine if they exhibit entailment or not_entailment. Answer with either 'entailment' or 'not_entailment'.
Acc: 33.80%, prompt: Identify whether the given pair of sentences demonstrates entailment or not_entailment. Answer with 'entailment' or 'not_entailment'.
Acc: 33.80%, prompt: Indicate if the connection between the following sentences is 'entailment' or 'not_entailment'.
Acc: 28.17%, prompt: Review the two statements and categorize their relationship as either 'entailment' or 'not_entailment'.
Acc: 7.04%, prompt: Analyze the two provided sentences and decide if their relationship is 'entailment' or 'not_entailment'.
Acc: 7.04%, prompt: Considering the two sentences, identify if their relationship is 'entailment' or 'not_entailment'.
Acc: 4.23%, prompt: Are the following two sentences entailment or not_entailment? Answer me with "entailment" or "not_entailment", just one word.
Acc: 43.66%, prompt: In your role as an entailment analysis tool, assess the relationship between the given sentences and classify it as 'entailment' or 'not_entailment':
Acc: 43.66%, prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Acc: 43.66%, prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment' or 'not_entailment':
Acc: 43.66%, prompt: Acting as an entailment detection instrument, determine if the given pair of sentences demonstrates entailment or not_entailment. Answer with 'entailment' or 'not_entailment':
Acc: 43.66%, prompt: In the capacity of an entailment assessment system, indicate if the link between the following sentences is 'entailment' or 'not_entailment':
Acc: 43.66%, prompt: Working as an entailment classifier, identify whether the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment':
Acc: 43.66%, prompt: In the role of a semantic relationship analyzer, examine the connection between the given sentences and decide if they exhibit entailment or not_entailment. Answer with 'entailment' or 'not_entailment':
Acc: 39.44%, prompt: As an instrument for entailment evaluation, consider the two sentences and determine if their relationship is 'entailment' or 'not_entailment'. Respond with 'entailment' or 'not_entailment':
Acc: 32.39%, prompt: As a tool for determining entailment relationships, review the two statements and categorize their connection as either 'entailment' or 'not_entailment':
Acc: 23.94%, prompt: While performing entailment analysis, classify the relationship between the provided sentences as 'entailment' or 'not_entailment':
## bertattack
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"i the relationship between the given terms represent entailment or not_entailment? reply with 'entailment' or 'not_entailment'."
Original acc: 43.66%, attacked acc: 18.31%, dropped acc: 25.35%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Determine if the first pair of sentences demonstrate entailment or not_entailment. replied with 'entailment' or 'not_entailment'."
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
Original prompt: Assess the connection between the following sentences and classify it as 'entailment' or 'not_entailment'.
Attacked prompt: b"Assess the connection between the after expressions and classify it as 'entailment' or 'not_entailment'."
Original acc: 38.03%, attacked acc: 9.86%, dropped acc: 28.17%
Original prompt: In your role as an entailment analysis tool, assess the relationship between the given sentences and classify it as 'entailment' or 'not_entailment':
Attacked prompt: b"In your role as an entailment analysis tool, assessing the relationship between the formed sentences and classify it as 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"is an entailment identification system, examine the connection between the before sentence and respond with 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 1.41%, dropped acc: 42.25%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment' or 'not_entailment':
Attacked prompt: b"Functioning as an entailment evaluation tool, analyze the existing sentence and see if their closeness is 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 1.41%, dropped acc: 42.25%
## checklist
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment' MSAddJ4D2a."
Original acc: 43.66%, attacked acc: 33.80%, dropped acc: 9.86%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment' Vq3aBzuZcD."
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
Original prompt: Assess the connection between the following sentences and classify it as 'entailment' or 'not_entailment'.
Attacked prompt: b"Assess the connection between the following sentences and classify it as 'entailment' or 'not_entailment' AMsRIKZniY."
Original acc: 38.03%, attacked acc: 8.45%, dropped acc: 29.58%
Original prompt: In your role as an entailment analysis tool, assess the relationship between the given sentences and classify it as 'entailment' or 'not_entailment':
Attacked prompt: b"In your role as an entailment analysis tool, assess the relationship between the given sentences and classify it as 'entailment' or 'not_entailment' vTAjHynoIG:"
Original acc: 43.66%, attacked acc: 2.82%, dropped acc: 40.85%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment' yh912BU8T0:"
Original acc: 43.66%, attacked acc: 12.68%, dropped acc: 30.99%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment' or 'not_entailment':
Attacked prompt: b"Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment' or 'not_entailment' 9vGXgnbWB8:"
Original acc: 43.66%, attacked acc: 2.82%, dropped acc: 40.85%
## deepwordbug
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Deos the relationship between the gDiven sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'."
Original acc: 43.66%, attacked acc: 18.31%, dropped acc: 25.35%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"VDetermine if the giPen Apair of setences displUays entailment or not_entailment. QRespond with 'entailment' or 'not_entailment'."
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
Original prompt: Assess the connection between the following sentences and classify it as 'entailment' or 'not_entailment'.
Attacked prompt: b"Assess the fconnection between the following sentenecs and classify it as 'entailment' or 'not_entailment'."
Original acc: 38.03%, attacked acc: 16.90%, dropped acc: 21.13%
Original prompt: In your role as an entailment analysis tool, assess the relationship between the given sentences and classify it as 'entailment' or 'not_entailment':
Attacked prompt: b"kn your rohe as an entailment analysis tool, Zassess the relationship between the gUiven saentences and clasFsify it as 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"sA an entailment identification ysstem, examine the connection between the folBowing qsentences and respond with 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 9.86%, dropped acc: 33.80%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment' or 'not_entailment':
Attacked prompt: b"Functioning as an entailment evaluation tOool, anatlyze the ppovided sentences and deWide if their relationsVhip is 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
## stresstest
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment' and false is not true ."
Original acc: 43.66%, attacked acc: 30.99%, dropped acc: 12.68%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true ."
Original acc: 43.66%, attacked acc: 32.39%, dropped acc: 11.27%
Original prompt: Assess the connection between the following sentences and classify it as 'entailment' or 'not_entailment'.
Attacked prompt: b"Assess the connection between the following sentences and classify it as 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true ."
Original acc: 38.03%, attacked acc: 14.08%, dropped acc: 23.94%
Original prompt: In your role as an entailment analysis tool, assess the relationship between the given sentences and classify it as 'entailment' or 'not_entailment':
Attacked prompt: b"In your role as an entailment analysis tool, assess the relationship between the given sentences and classify it as 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true :"
Original acc: 43.66%, attacked acc: 39.44%, dropped acc: 4.23%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true :"
Original acc: 43.66%, attacked acc: 33.80%, dropped acc: 9.86%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment' or 'not_entailment':
Attacked prompt: b"Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment' or 'not_entailment' and true is true and true is true and true is true and true is true and true is true :"
Original acc: 43.66%, attacked acc: 32.39%, dropped acc: 11.27%
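
Each attack record in this file is a triple of lines: the original prompt, the attacked prompt, and an accuracy line. The following is a minimal sketch, assuming only the line layout shown above, of how the accuracy line could be parsed and the reported drop checked against original minus attacked accuracy. `parse_acc_line` and `drop_is_consistent` are hypothetical helper names, not part of PromptBench.

```python
import re

# Layout of the accuracy line as it appears in this log.
ACC_LINE = re.compile(
    r"Original acc: (?P<orig>[\d.]+)%, "
    r"attacked acc: (?P<att>[\d.]+)%, "
    r"dropped acc: (?P<drop>[\d.]+)%"
)

def parse_acc_line(line):
    """Return (original, attacked, dropped) accuracy as floats in percent."""
    m = ACC_LINE.search(line)
    if m is None:
        raise ValueError(f"not an accuracy line: {line!r}")
    return float(m["orig"]), float(m["att"]), float(m["drop"])

def drop_is_consistent(orig, att, drop, tol=0.02):
    """Check drop == orig - att, allowing for two-decimal rounding.

    The three values appear to be rounded independently, so the logged
    drop can differ from the recomputed one by about 0.01.
    """
    return abs((orig - att) - drop) <= tol

record = "Original acc: 43.66%, attacked acc: 30.99%, dropped acc: 12.68%"
orig, att, drop = parse_acc_line(record)
print(drop_is_consistent(orig, att, drop))  # prints True
```

The tolerance is an assumption chosen to absorb rounding; a stricter check would need the unrounded accuracies, which this file does not contain.
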
## textbugger
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Deos the relationship between the given sentences represent entailment or not_entailment? Answering with 'entailment' or 'not_entailment'."
Original acc: 43.66%, attacked acc: 1.41%, dropped acc: 42.25%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Determ\xd1\x96ne if the given pair of sentencing displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'."
Original acc: 43.66%, attacked acc: 18.31%, dropped acc: 25.35%
Original prompt: Assess the connection between the following sentences and classify it as 'entailment' or 'not_entailment'.
Attacked prompt: b"Assess the connect\xd1\x96on between the following sentence and classify it as 'entailment' or 'not_entailment'."
Original acc: 38.03%, attacked acc: 9.86%, dropped acc: 28.17%
Original prompt: In your role as an entailment analysis tool, assess the relationship between the given sentences and classify it as 'entailment' or 'not_entailment':
Attacked prompt: b"During your ro\xe2\x85\xbce as an entailment analysis tool, evaluating the relationship between the given sentences and classifying it as 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"A\xd1\x95 an entailment identification systems, examine the connection between the \xf0\x9d\x9a\x8following sentence and resp ond with 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment' or 'not_entailment':
Attacked prompt: b"Functioning as an entailment evaluation tool, analyzes the provided sentences and decid\xd0\xb5 if their relationship is 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
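
The attacked prompts are logged as Python bytes literals, so look-alike characters show up as UTF-8 escapes such as `\xd1\x96`. The sketch below, assuming the logged text is a valid bytes literal, decodes one and lists the non-ASCII characters it contains. The sample is a shortened excerpt of a record above, and both helper names are hypothetical.

```python
import ast
import unicodedata

def decode_attacked_prompt(logged):
    """Turn a logged b"..." literal into the string the model actually saw."""
    raw = ast.literal_eval(logged)  # safely evaluates the bytes literal
    if not isinstance(raw, bytes):
        raise TypeError("expected a bytes literal")
    return raw.decode("utf-8")

def non_ascii_chars(text):
    """List (index, character, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

# Shortened excerpt of a textbugger record; the raw string keeps the escapes.
logged = r'b"Determ\xd1\x96ne if the given pair of sentencing displays entailment"'
decoded = decode_attacked_prompt(logged)
for index, ch, name in non_ascii_chars(decoded):
    print(index, hex(ord(ch)), name)
```

Here the escape decodes to U+0456, a Cyrillic letter that renders almost identically to a Latin "i", which is why the attacked prompt looks unchanged once decoded.
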
## textfooler
Original prompt: Does the relationship between the given sentences represent entailment or not_entailment? Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Ca the relationship between the given sentences are entailment or not_entailment? Answer with 'entailment' or 'not_entailment'."
Original acc: 43.66%, attacked acc: 2.82%, dropped acc: 40.85%
Original prompt: Determine if the given pair of sentences displays entailment or not_entailment. Respond with 'entailment' or 'not_entailment'.
Attacked prompt: b"Define if the given pair of priors exhibits entailment or not_entailment. Answers with 'entailment' or 'not_entailment'."
Original acc: 43.66%, attacked acc: 1.41%, dropped acc: 42.25%
Original prompt: Assess the connection between the following sentences and classify it as 'entailment' or 'not_entailment'.
Attacked prompt: b"Assess the attaching between the afterward judging and classify it as 'entailment' or 'not_entailment'."
Original acc: 38.03%, attacked acc: 2.82%, dropped acc: 35.21%
Original prompt: In your role as an entailment analysis tool, assess the relationship between the given sentences and classify it as 'entailment' or 'not_entailment':
Attacked prompt: b"Onto your role as an entailment analysing tool, appraisals the relationship between the given condemnation and classed it as 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
Original prompt: As an entailment identification system, examine the connection between the following sentences and respond with 'entailment' or 'not_entailment':
Attacked prompt: b"Because an entailment identification plans, examine the connection between the later conviction and meet with 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
Original prompt: Functioning as an entailment evaluation tool, analyze the provided sentences and decide if their relationship is 'entailment' or 'not_entailment':
Attacked prompt: b"Operate as an entailment evaluation tool, analyze the submitted sentences and choices if their relationship is 'entailment' or 'not_entailment':"
Original acc: 43.66%, attacked acc: 0.00%, dropped acc: 43.66%
## translation
Language: chinese, acc: 42.25%, prompt: In the light of an implication analysis tool, evaluate the relationship between the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 38.03%, prompt: From the perspective of an implication analysis tool, determine whether there is an implication relationship in the following sentences by classifying them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 33.80%, prompt: Please use an implication analysis tool to determine whether an implication relationship exists in the following sentences by classifying them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 23.94%, prompt: Please evaluate the relation of the following sentences as 'entailment' or 'not_entailment' for the purpose of determining implication relation.
Language: chinese, acc: 35.21%, prompt: Please use the implication analysis tool to evaluate the relationships between the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 28.17%, prompt: For the purpose of determining implicative relations, analyze the relations of the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 43.66%, prompt: Please use the implication analysis tool to determine the relationship of the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 19.72%, prompt: Please use the implication judgment tool to assess the relevance of the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 16.90%, prompt: Please, with implication analysis as the main task, determine the relationships between the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: chinese, acc: 25.35%, prompt: Using the implication judgment as a criterion, analyze the relation of the following sentences and classify them as 'entailment' or 'not_entailment'.
Language: french, acc: 35.21%, prompt: As an engagement analysis tool, evaluate the relationship between the given sentences and classify it as 'entailment' or 'not_entailment'.
Language: french, acc: 21.13%, prompt: Determine whether the given sentences involve one another or not as an implication analysis tool. Classify them accordingly as 'entailment' or 'not_entailment'.
Language: french, acc: 23.94%, prompt: Using implication analysis, evaluate whether the sentences provided have a logical relationship and categorize them as 'entailment' or 'not_entailment'.
Language: french, acc: 23.94%, prompt: As an engagement assessment tool, determine whether the sentences provided have a logical relationship and classify them as 'entailment' or 'not_entailment'.
Language: french, acc: 23.94%, prompt: As an implication classification tool, analyze the sentences provided to determine if there is a logical relationship and categorize them as 'entailment' or 'not_entailment'.
Language: french, acc: 29.58%, prompt: Using implication analysis, determine whether the given sentences have a cause-effect relationship and categorize them as 'entailment' or 'not_entailment'.
Language: french, acc: 39.44%, prompt: Evaluate the relationship between the given sentences using implication analysis and rank them accordingly as 'entailment' or 'not_entailment'.
Language: french, acc: 19.72%, prompt: As an engagement detection tool, determine whether the given sentences have a logical relationship and categorize them as 'entailment' or 'not_entailment'.
Language: french, acc: 18.31%, prompt: Using implication analysis, evaluate whether the sentences provided have a cause-effect relationship and rank them accordingly as 'entailment' or 'not_entailment'.
Language: french, acc: 5.63%, prompt: Determine whether the given sentences have a cause-effect relationship as an engagement analysis tool and categorize them as 'entailment' or 'not_entailment'.
Language: arabic, acc: 33.80%, prompt: In your role as a tool for reasoning analysis, evaluate the relationship between given sentences and classify them as 'entailment' or 'not_entailment'.
Language: arabic, acc: 39.44%, prompt: Can you determine whether this sentence is inferred from the other sentence? Classify it as 'entailment' or 'not_entailment'.
Language: arabic, acc: 28.17%, prompt: Using the tool of reasoning analysis, analyze the relationship between given sentences and classify them as 'entailment' or 'not_entailment'.
Language: arabic, acc: 39.44%, prompt: Does this sentence represent a conclusion from the previous sentence? Classify it as 'entailment' or 'not_entailment'.
Language: arabic, acc: 25.35%, prompt: As a tool of reasoning analysis, evaluate the relationship of given sentences and classify them as 'entailment' or 'not_entailment'.
Language: arabic, acc: 43.66%, prompt: Can this sentence be inferred from the previous sentence? Classify it as 'entailment' or 'not_entailment'.
Language: arabic, acc: 32.39%, prompt: Using a tool to analyze a conclusion, analyze the relationship between the two sentences and classify them as 'entailment' or 'not_entailment'.
Language: arabic, acc: 35.21%, prompt: Is this a conclusion from the next sentence? Classify it as 'entailment' or 'not_entailment'.
Language: arabic, acc: 33.80%, prompt: As part of your task in analyzing a conclusion, evaluate the relationship between the two sentences and classify them as 'entailment' or 'not_entailment' based on their relationship.
Language: arabic, acc: 28.17%, prompt: Are you following this sentence directly from the previous one? Classify it as 'entailment' or 'not_entailment'.
Language: spanish, acc: 36.62%, prompt: In your role as an implication analysis tool, evaluate the relationship between the given phrases and classify them as 'entailment' or 'not_entailment'.
Language: spanish, acc: 40.85%, prompt: Determine whether the second sentence necessarily implies the first and label the relation as 'entailment', or as 'not_entailment' if not.
Language: spanish, acc: 14.08%, prompt: Classifies the relationship between these two sentences as 'entailment' if one necessarily implies the other, or as 'not_entailment' if not.
Language: spanish, acc: 15.49%, prompt: Evaluates whether the information in the second sentence is implied in the first and labels the relationship as 'entailment', or as 'not_entailment' if there is no such implication.
Language: spanish, acc: 32.39%, prompt: Given a couple of phrases, label their relationship as 'entailment' if one necessarily implies the other, or as 'not_entailment' if there is no such implication.
Language: spanish, acc: 33.80%, prompt: Analyzes the relationship between the phrases and classifies them as 'entailment' if one necessarily implies the other, or as 'not_entailment' if there is no such implication.
Language: spanish, acc: 40.85%, prompt: Given two sentences, determine whether the second sentence is a necessary consequence of the first and label the relation as 'entailment', or as 'not_entailment' if not.
Language: spanish, acc: 21.13%, prompt: Evaluates whether the information presented in the second sentence is implicit in the first and labels the relationship as 'entailment', or as 'not_entailment' if there is no such implication.
Language: spanish, acc: 18.31%, prompt: Classifies the relationship between the given phrases as 'entailment' if one necessarily implies the other, or as 'not_entailment' if there is no such implication.
Language: spanish, acc: 19.72%, prompt: Determines whether the information provided in the second sentence is necessarily inferable from the first and labels the relationship as 'entailment', or as 'not_entailment' if not.
Language: japanese, acc: 12.68%, prompt: Analyze the relationship of a given sentence and classify it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 14.08%, prompt: Evaluate the semantic relationship of the sentence and classify it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 33.80%, prompt: Please judge the relationship between the given sentences and classify them as 'entailment' or 'not_entailment'.
Language: japanese, acc: 16.90%, prompt: Examine the paraphrases of a given sentence and classify them 'entailment' or 'not_entailment'.
Language: japanese, acc: 16.90%, prompt: Rate the similarity of a given sentence and categorize it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 19.72%, prompt: Determinate the semantic connections of a given sentence and classify it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 14.08%, prompt: Examine the semantic match of a given sentence and categorize it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 26.76%, prompt: Classify it as 'entailment' or 'not_entailment' based on the content of the sentence.
Language: japanese, acc: 12.68%, prompt: Analyze the relationship of a given sentence and classify it as 'entailment' or 'not_entailment'.
Language: japanese, acc: 22.54%, prompt: Judge the semantic connections of a given sentence and categorize it as 'entailment' or 'not_entailment'.
Language: korean, acc: 25.35%, prompt: Evaluate the relationship between any two sentences given to you and classify you as 'entailment' or 'not_entailment'.
Language: korean, acc: 18.31%, prompt: Analyze the semantic deductive relations between sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 12.68%, prompt: Evaluate the logical relevance between sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 36.62%, prompt: Evaluate the interaction of two given sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 7.04%, prompt: Please check whether there is a semantic match between those two sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 19.72%, prompt: Compare information between given sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 11.27%, prompt: Please analyse the correlation between those two sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 11.27%, prompt: Evaluate the different meanings between given sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 15.49%, prompt: Compare the semantic structure of the two sentences and classify them as 'entailment' or 'not_entailment'.
Language: korean, acc: 9.86%, prompt: Evaluate the interactions between sentences and classify them as 'entailment' or 'not_entailment'.
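
The translation records above share one layout (`Language: ..., acc: ...%, prompt: ...`), which makes per-language comparison straightforward. This is a minimal sketch, assuming that layout, of averaging accuracy per source language. `mean_acc_by_language` is a hypothetical helper, and the three sample lines are copied from the list above rather than being the full data.

```python
import re
from collections import defaultdict

TRANSLATION_LINE = re.compile(
    r"Language: (?P<lang>\w+), acc: (?P<acc>[\d.]+)%, prompt: (?P<prompt>.*)"
)

def mean_acc_by_language(lines):
    """Map each language to the mean accuracy (in percent) of its prompts."""
    buckets = defaultdict(list)
    for line in lines:
        m = TRANSLATION_LINE.match(line.strip())
        if m:  # skip headings and any other record type
            buckets[m["lang"]].append(float(m["acc"]))
    return {lang: sum(accs) / len(accs) for lang, accs in buckets.items()}

sample = [
    "Language: chinese, acc: 42.25%, prompt: In the light of an implication analysis tool, evaluate the relationship between the following sentences and classify them as 'entailment' or 'not_entailment'.",
    "Language: chinese, acc: 38.03%, prompt: From the perspective of an implication analysis tool, determine whether there is an implication relationship in the following sentences by classifying them as 'entailment' or 'not_entailment'.",
    "Language: korean, acc: 9.86%, prompt: Evaluate the interactions between sentences and classify them as 'entailment' or 'not_entailment'.",
]
for lang, acc in mean_acc_by_language(sample).items():
    print(f"{lang}: {acc:.2f}%")
```

Feeding the whole section through the same function would give one mean per language; the means are unweighted because every record here reports accuracy on the same evaluation set.
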
# mmlu
## 10 prompts
Acc: 27.07%, prompt: For each multiple-choice question about {}, identify the correct answer by selecting 'A', 'B', 'C', or 'D'.
Acc: 25.67%, prompt: Evaluate the multiple-choice question about {} and select the most fitting response from 'A', 'B', 'C', or 'D'.
Acc: 25.09%, prompt: Answer the subsequent multiple-choice question about {} by picking the right option among 'A', 'B', 'C', or 'D'.
Acc: 24.85%, prompt: In relation to the multiple-choice question on {}, please provide the accurate answer by choosing 'A', 'B', 'C', or 'D'.
Acc: 23.69%, prompt: For the multiple-choice question related to {}, please choose the most accurate answer from 'A', 'B', 'C', or 'D'.
Acc: 22.29%, prompt: Regarding the following multiple-choice question on {}, pick the correct answer from the options 'A', 'B', 'C', or 'D'.
Acc: 21.82%, prompt: Answer the following multiple-choice question about {} by selecting the correct option: 'A', 'B', 'C', or 'D'.
Acc: 21.59%, prompt: Below are multiple-choice question concerning {}. Indicate your response with 'A', 'B', 'C', or 'D'.
Acc: 18.67%, prompt: Examine the following question based on {} and choose the correct response from 'A', 'B', 'C', or 'D'.
Acc: 16.10%, prompt: Please respond to the multiple-choice question about {} by selecting the appropriate answer: 'A', 'B', 'C', or 'D'.
Acc: 25.09%, prompt: As a knowledgeable individual in {}, provide your response to the multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Acc: 24.85%, prompt: As an expert in {}, respond to the following multiple-choice question by selecting 'A', 'B', 'C', or 'D'.
Acc: 24.39%, prompt: With your knowledge of {}, tackle the following multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Acc: 23.10%, prompt: Drawing upon your mastery of {}, please answer the multiple-choice question by selecting the correct option from 'A', 'B', 'C', or 'D'.
Acc: 22.75%, prompt: Given your proficiency in {}, please answer the subsequent multiple-choice question with 'A', 'B', 'C', or 'D'.
Acc: 22.05%, prompt: Utilizing your expertise in {}, answer the following multiple-choice question by picking 'A', 'B', 'C', or 'D'.
Acc: 21.94%, prompt: With your understanding of {}, kindly answer the subsequent multiple-choice question by selecting 'A', 'B', 'C', or 'D'.
Acc: 21.59%, prompt: As a skilled person in the field of {}, please respond to the multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Acc: 19.37%, prompt: Considering your familiarity with {}, attend to the following multiple-choice question by picking 'A', 'B', 'C', or 'D'.
Acc: 14.12%, prompt: As someone well-versed in {}, please address the multiple-choice question below by selecting 'A', 'B', 'C', or 'D'.
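
The `{}` in each mmlu prompt reads as a placeholder for the subject of the question. The sketch below shows one way such a template could be filled, assuming Python's `str.format` and assuming subject identifiers use underscores that become spaces. Neither assumption is confirmed by this file; `fill_subject` is a hypothetical helper and `high_school_biology` is used only as an illustrative subject name.

```python
def fill_subject(template, subject):
    """Insert a human-readable subject name into the prompt's {} slot."""
    # Assumption: identifiers like "high_school_biology" are shown with spaces.
    return template.format(subject.replace("_", " "))

template = (
    "For each multiple-choice question about {}, identify the correct "
    "answer by selecting 'A', 'B', 'C', or 'D'."
)
print(fill_subject(template, "high_school_biology"))
```

`str.format` works here because the templates contain no other braces; a template that used literal braces would need them doubled.
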
## bertattack
Original prompt: For each multiple-choice question about {}, identify the correct answer by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"a each multiple-a instruction about {}, guess the accurate answer by selecting 'A', 'B', 'C', or 'D'. "
Original acc: 27.07%, attacked acc: 16.34%, dropped acc: 10.74%
Original prompt: Evaluate the multiple-choice question about {} and select the most fitting response from 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Evaluate the multiple-r asked about {} and select the most better sentence from 'A', 'B', 'C', or 'D'. "
Original acc: 25.67%, attacked acc: 12.14%, dropped acc: 13.54%
Original prompt: Answer the subsequent multiple-choice question about {} by picking the right option among 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Answer the next multiple-s question about {} by picking the incorrect button or 'A', 'B', 'C', or 'D'. "
Original acc: 25.09%, attacked acc: 5.02%, dropped acc: 20.07%
Original prompt: As a knowledgeable individual in {}, provide your response to the multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"As a knowledgeaware individual in {}, state your response to the multiple-guessing round by choosing 'A', 'B', 'C', or 'D'."
Original acc: 25.09%, attacked acc: 9.68%, dropped acc: 15.40%
Original prompt: As an expert in {}, respond to the following multiple-choice question by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"As an expert in {}, answer to the following multiple-guessing contestant by guessing 'A', 'B', 'C', or 'D'."
Original acc: 24.85%, attacked acc: 6.88%, dropped acc: 17.97%
Original prompt: With your knowledge of {}, tackle the following multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"With your understanding of {}, tackle the forth multiple-asked query by choosing 'A', 'B', 'C', or 'D'."
Original acc: 24.39%, attacked acc: 8.28%, dropped acc: 16.10%
## checklist
Original prompt: For each multiple-choice question about {}, identify the correct answer by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"For each multiple-choice question about {}, identify the correct answer by selecting 'A', 'B', 'C', or 'D' iia2yL9Uzm. "
Original acc: 27.07%, attacked acc: 14.00%, dropped acc: 13.07%
Original prompt: Evaluate the multiple-choice question about {} and select the most fitting response from 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Evaluate the multiple-choice question about {} and select the most fitting response from 'A', 'B', 'C', or 'D' iia2yL9Uzm. "
Original acc: 25.67%, attacked acc: 10.74%, dropped acc: 14.94%
Original prompt: Answer the subsequent multiple-choice question about {} by picking the right option among 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Answer the subsequent multiple-choice question about {} by picking the right option among 'A', 'B', 'C', or 'D' iia2yL9Uzm. "
Original acc: 25.09%, attacked acc: 13.89%, dropped acc: 11.20%
Original prompt: As a knowledgeable individual in {}, provide your response to the multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"As a knowledgeable individual in {}, provide your response to the multiple-choice question by choosing 'A', 'B', 'C', or 'D' iia2yL9Uzm."
Original acc: 25.09%, attacked acc: 11.79%, dropped acc: 13.30%
Original prompt: As an expert in {}, respond to the following multiple-choice question by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"As an expert in {}, respond to the following multiple-choice question by selecting 'A', 'B', 'C', or 'D' iia2yL9Uzm."
Original acc: 24.85%, attacked acc: 13.42%, dropped acc: 11.44%
Original prompt: With your knowledge of {}, tackle the following multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"With your knowledge of {}, tackle the following multiple-choice question by choosing 'A', 'B', 'C', or 'D' iia2yL9Uzm."
Original acc: 24.39%, attacked acc: 11.09%, dropped acc: 13.30%
## deepwordbug
Original prompt: For each multiple-choice question about {}, identify the correct answer by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"For each multiple-chRoice qusetion about {}, cdentify the corret ansKwer by seleting 'A', 'B', 'C', or 'D'. "
Original acc: 27.07%, attacked acc: 15.87%, dropped acc: 11.20%
Original prompt: Evaluate the multiple-choice question about {} and select the most fitting response from 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Evaluate the multple-choice question about {} and elect the most fitting respone from 'A', 'B', 'C', or 'D'. "
Original acc: 25.67%, attacked acc: 14.70%, dropped acc: 10.97%
Original prompt: Answer the subsequent multiple-choice question about {} by picking the right option among 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Answer the subsequent multiple-choicSe qustion about {} by picking the right option among 'A', 'B', 'C', or 'D'. "
Original acc: 25.09%, attacked acc: 19.02%, dropped acc: 6.07%
Original prompt: As a knowledgeable individual in {}, provide your response to the multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"As a kAnowledgeable indviidual in {}, provide your ersponse to the multipole-choice quesxion by choosing 'A', 'B', 'C', or 'D'."
Original acc: 25.09%, attacked acc: 16.45%, dropped acc: 8.63%
Original prompt: As an expert in {}, respond to the following multiple-choice question by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"As an expert in {}, respnod to the following muxltiple-choice questoin by selecting 'A', 'B', 'C', or 'D'."
Original acc: 24.85%, attacked acc: 19.60%, dropped acc: 5.25%
Original prompt: With your knowledge of {}, tackle the following multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"With your knowledge of {}, tackle the fololwing multpile-choice question by choosing 'A', 'B', 'C', or 'D'."
Original acc: 24.39%, attacked acc: 20.30%, dropped acc: 4.08%
## stresstest
Original prompt: For each multiple-choice question about {}, identify the correct answer by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"For each multiple-choice question about {}, identify the correct answer by selecting 'A', 'B', 'C', or 'D' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 27.07%, attacked acc: 21.94%, dropped acc: 5.13%
Original prompt: Evaluate the multiple-choice question about {} and select the most fitting response from 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Evaluate the multiple-choice question about {} and select the most fitting response from 'A', 'B', 'C', or 'D' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 25.67%, attacked acc: 20.19%, dropped acc: 5.48%
Original prompt: Answer the subsequent multiple-choice question about {} by picking the right option among 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Answer the subsequent multiple-choice question about {} by picking the right option among 'A', 'B', 'C', or 'D' and true is true and true is true and true is true and true is true and true is true . "
Original acc: 25.09%, attacked acc: 19.49%, dropped acc: 5.60%
Original prompt: As a knowledgeable individual in {}, provide your response to the multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"As a knowledgeable individual in {}, provide your response to the multiple-choice question by choosing 'A', 'B', 'C', or 'D' and true is true and true is true and true is true and true is true and true is true ."
Original acc: 25.09%, attacked acc: 17.39%, dropped acc: 7.70%
Original prompt: As an expert in {}, respond to the following multiple-choice question by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"As an expert in {}, respond to the following multiple-choice question by selecting 'A', 'B', 'C', or 'D' and true is true and true is true and true is true and true is true and true is true ."
Original acc: 24.85%, attacked acc: 13.19%, dropped acc: 11.67%
Original prompt: With your knowledge of {}, tackle the following multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"With your knowledge of {}, tackle the following multiple-choice question by choosing 'A', 'B', 'C', or 'D' and true is true and true is true and true is true and true is true and true is true ."
Original acc: 24.39%, attacked acc: 12.72%, dropped acc: 11.67%
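
The stresstest records above all have the same shape: a logically empty phrase is spliced in before the prompt's final punctuation mark. The following sketch reproduces that shape from the logged examples. It is a reconstruction under that assumption, not PromptBench's own implementation, and `append_distractor` is a hypothetical name.

```python
def append_distractor(prompt, phrase="and true is true", repeats=5):
    """Insert a repeated filler phrase before the final punctuation mark."""
    body, last = prompt[:-1], prompt[-1]  # last is "." or ":" in this log
    filler = " ".join([phrase] * repeats)
    return f"{body} {filler} {last}"

original = (
    "As a knowledgeable individual in {}, provide your response to the "
    "multiple-choice question by choosing 'A', 'B', 'C', or 'D'."
)
print(append_distractor(original))
```

Calling it with `phrase="and false is not true"` and `repeats=1` gives the other variant that appears in this file. Some logged prompts also carry a trailing space, which this sketch does not reproduce.
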
## textbugger
Original prompt: For each multiple-choice question about {}, identify the correct answer by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"For each mul\xf0\x9d\x9a\x9diple-choice problem about {}, identify the corrects answer by choose 'A', 'B', 'C', or 'D'. "
Original acc: 27.07%, attacked acc: 17.27%, dropped acc: 9.80%
Original prompt: Evaluate the multiple-choice question about {} and select the most fitting response from 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Assess the mul\xf0\x9d\x9a\x9diple-choice question about {} and se\xe2\x85\xbcect the most fitting response from 'A', 'B', 'C', or 'D'. "
Original acc: 25.67%, attacked acc: 9.92%, dropped acc: 15.75%
Original prompt: Answer the subsequent multiple-choice question about {} by picking the right option among 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Answer the subs\xd0\xb5quent mu\xe2\x85\xbctiple-choice question about {} by pick the right alternative among 'A', 'B', 'C', or 'D'. "
Original acc: 25.09%, attacked acc: 18.67%, dropped acc: 6.42%
Original prompt: As a knowledgeable individual in {}, provide your response to the multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"A\xd1\x95 a informed individual in {}, provide your replies to the multiple-c\xd5\xb0oice ques\xf0\x9d\x9a\x9dion by choosing 'A', 'B', 'C', or 'D'."
Original acc: 25.09%, attacked acc: 13.30%, dropped acc: 11.79%
Original prompt: As an expert in {}, respond to the following multiple-choice question by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"As an expert in {}, answering to the \xf0\x9d\x9a\x8following multiple-c\xd5\xb0oice \xd4\x9buestion by selecting 'A', 'B', 'C', or 'D'."
Original acc: 24.85%, attacked acc: 11.67%, dropped acc: 13.19%
Original prompt: With your knowledge of {}, tackle the following multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"With your knowledge of {}, tackle the \xf0\x9d\x9a\x8following multiple-cho\xd1\x96ce problem by choosing 'A', 'B', 'C', or 'D'."
Original acc: 24.39%, attacked acc: 9.57%, dropped acc: 14.82%
## textfooler
Original prompt: For each multiple-choice question about {}, identify the correct answer by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Among all multiple-choice subject about {}, find the accurate answer by chooses 'A', 'B', 'C', or 'D'. "
Original acc: 27.07%, attacked acc: 12.72%, dropped acc: 14.35%
Original prompt: Evaluate the multiple-choice question about {} and select the most fitting response from 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Evaluate the multiple-choice tura about {} and decide the most fitting replying from 'A', 'B', 'C', or 'D'. "
Original acc: 25.67%, attacked acc: 15.29%, dropped acc: 10.39%
Original prompt: Answer the subsequent multiple-choice question about {} by picking the right option among 'A', 'B', 'C', or 'D'.
Attacked prompt: b"Answer the next multiple-choice problem about {} by picking the right opt among 'A', 'B', 'C', or 'D'. "
Original acc: 25.09%, attacked acc: 15.17%, dropped acc: 9.92%
Original prompt: As a knowledgeable individual in {}, provide your response to the multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"As a seasoned individual in {}, provide your response to the multiple-choice things by choosing 'A', 'B', 'C', or 'D'."
Original acc: 25.09%, attacked acc: 14.12%, dropped acc: 10.97%
Original prompt: As an expert in {}, respond to the following multiple-choice question by selecting 'A', 'B', 'C', or 'D'.
Attacked prompt: b"As an expert in {}, answering to the following multiple-choice things by picking 'A', 'B', 'C', or 'D'."
Original acc: 24.85%, attacked acc: 13.89%, dropped acc: 10.97%
Original prompt: With your knowledge of {}, tackle the following multiple-choice question by choosing 'A', 'B', 'C', or 'D'.
Attacked prompt: b"With your knowledgeable of {}, adress the following multiple-choice things by picking 'A', 'B', 'C', or 'D'."
Original acc: 24.39%, attacked acc: 15.40%, dropped acc: 8.98%
## translation
Language: chinese, acc: 21.63%, prompt: For the following multiple choice question about {}, please select the correct answer from 'A', 'B', 'C' or 'D'.
Language: chinese, acc: 28.55%, prompt: Please select the correct answer from 'A', 'B', 'C' or 'D' for the following multiple choice question for {}.
Language: chinese, acc: 30.50%, prompt: Select the correct answer from 'A', 'B', 'C' or 'D' for the following multiple choice question {}.
Language: chinese, acc: 22.16%, prompt: Please choose the correct answer from 'A', 'B', 'C' or 'D' according to the following multiple-choice questions about {}.
Language: chinese, acc: 20.39%, prompt: Please select the correct answer from 'A', 'B', 'C' or 'D' for the {} multiple choice questions below.
Language: chinese, acc: 19.86%, prompt: The following is A multiple choice question about {}. Please select the correct answer from 'A', 'B', 'C' or 'D'.
Language: chinese, acc: 28.55%, prompt: Please select the correct answer from 'A', 'B', 'C' or 'D' for the following multiple choice question {}.
Language: chinese, acc: 22.16%, prompt: Please choose the correct answer from 'A', 'B', 'C' or 'D' according to the following multiple-choice questions about {}.
Language: chinese, acc: 23.40%, prompt: Please select the correct answer from 'A', 'B', 'C' or 'D' for the following multiple choice questions about {}.
Language: chinese, acc: 23.40%, prompt: Please select the correct answer from 'A', 'B', 'C' or 'D' for the following multiple choice questions about {}.
Language: french, acc: 21.28%, prompt: For the following multiple choice question on {}, choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: french, acc: 22.34%, prompt: This is a multiple choice question about {}. Select the correct answer from options 'A', 'B', 'C' or 'D'.
Language: french, acc: 23.23%, prompt: In the context of the multiple-choice question on {}, identify the correct answer from options 'A', 'B', 'C' or 'D'.
Language: french, acc: 18.26%, prompt: About the following question on {}, determine the correct answer from the choices 'A', 'B', 'C' or 'D'.
Language: french, acc: 20.92%, prompt: Carefully review the multiple-choice question regarding {}. Choose the correct answer from options 'A', 'B', 'C', or 'D'.
Language: french, acc: 25.53%, prompt: For the multiple-choice question for {}, indicate the correct answer from options 'A', 'B', 'C', or 'D'.
Language: french, acc: 20.39%, prompt: The next question is about {}. Select the correct answer from the choices 'A', 'B', 'C' or 'D'.
Language: french, acc: 23.05%, prompt: As part of the multiple-choice question on {}, choose the appropriate answer from options 'A', 'B', 'C' or 'D'.
Language: french, acc: 18.26%, prompt: Rate your understanding of the multiple-choice question on {}. Choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: french, acc: 25.71%, prompt: Analyze the following multiple-choice question on {}. Identify the correct answer among choices 'A', 'B', 'C' or 'D'.
Language: arabic, acc: 23.23%, prompt: For the multiple choice question about {}, choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: arabic, acc: 22.87%, prompt: For the following multiple-choice question about {}, choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: arabic, acc: 21.99%, prompt: For the following multiple choice question about {}, choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: arabic, acc: 23.58%, prompt: When it comes to the multiple-choice question about {}, choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: arabic, acc: 25.00%, prompt: For the multiple-choice question about {}, choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: arabic, acc: 19.50%, prompt: If the question for {} is multiple choice, choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: arabic, acc: 21.28%, prompt: For the question regarding {}, choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: arabic, acc: 21.10%, prompt: For the question about {}, choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: arabic, acc: 20.21%, prompt: When it comes to the question regarding {}, choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: arabic, acc: 21.28%, prompt: For the question regarding {}, choose the correct answer from options 'A', 'B', 'C' or 'D'.
Language: spanish, acc: 25.53%, prompt: For the following multiple-choice question about {}, choose the correct answer from 'A', 'B', 'C', or 'D'.
Language: spanish, acc: 25.89%, prompt: For the following multiple-choice question about {}, select the correct answer from 'A', 'B', 'C', or 'D'.
Language: spanish, acc: 25.53%, prompt: For the following multiple-choice question about {}, choose the correct answer from 'A', 'B', 'C', or 'D'.
Language: spanish, acc: 25.35%, prompt: Within the context of the following multiple-choice question about {}, choose the correct option from 'A', 'B', 'C', or 'D'.
Language: spanish, acc: 25.00%, prompt: For the following multiple-choice statement about {}, select the correct answer from 'A', 'B', 'C', or 'D'.
Language: spanish, acc: 19.33%, prompt: Considering the following multiple-choice question about {}, mark the correct answer with 'A', 'B', 'C', or 'D'.
Language: spanish, acc: 22.87%, prompt: For the following multiple-choice question about {}, choose the correct alternative among 'A', 'B', 'C' or 'D'.
Language: spanish, acc: 24.47%, prompt: For the following multiple-choice statement about {}, choose the correct option from alternatives 'A', 'B', 'C', or 'D'.
Language: spanish, acc: 27.13%, prompt: Within the context of the following multiple-choice question about {}, select the correct answer from alternatives 'A', 'B', 'C', or 'D'.
Language: spanish, acc: 20.57%, prompt: Considering the following multiple-choice statement about {}, mark the correct alternative with the options 'A', 'B', 'C' or 'D'.
Language: japanese, acc: 21.28%, prompt: Choose the appropriate answer from options 'A', 'B', 'C', or 'D' for {} regarding the following question.
Language: japanese, acc: 24.29%, prompt: Choose the correct answer from 'A', 'B', 'C', or 'D' for the following multiple-choice question about {}.
Language: japanese, acc: 25.71%, prompt: For the following multiple-choice questions about {}, choose the correct answer from 'A', 'B', 'C', or 'D'.
Language: japanese, acc: 21.28%, prompt: Choose the correct answer from options 'A', 'B', 'C', or 'D' for the following questions about {}.
Language: japanese, acc: 19.86%, prompt: In the multiple choice questions below, choose the correct answer for {} from 'A', 'B', 'C', or 'D'.
Language: japanese, acc: 20.57%, prompt: Choose the correct answer from the options 'A', 'B', 'C', or 'D' for the following questions about {}.
Language: japanese, acc: 19.86%, prompt: In the multiple choice questions below, choose the correct answer for {} from 'A', 'B', 'C', or 'D'.
Language: japanese, acc: 22.52%, prompt: Choose the correct answer from 'A', 'B', 'C', or 'D' for the following multiple choice questions about {}.
Language: japanese, acc: 19.86%, prompt: In the multiple choice questions below, choose the correct answer for {} from 'A', 'B', 'C', or 'D'.
Language: japanese, acc: 21.99%, prompt: Choose the correct answer from options 'A', 'B', 'C', or 'D' for {} regarding the following question.
Language: korean, acc: 18.09%, prompt: For the multiple choice problem about, choose the correct answer for '{}' from 'A', 'B', 'C', or 'D'.
Language: korean, acc: 28.37%, prompt: Choose the correct answer for '{}' from 'A', 'B', 'C', or 'D' in the multiple choice problem involving,
Language: korean, acc: 21.99%, prompt: For the multiple choice problem below, choose the correct answer to '{}' from 'A', 'B', 'C', or 'D'.
Language: korean, acc: 24.82%, prompt: In the following multiple-choice problem, choose the correct answer for '{}' from 'A', 'B', 'C', or 'D'.
Language: korean, acc: 24.47%, prompt: For the following multiple choice problem, choose the correct answer for '{}' from 'A', 'B', 'C', or 'D'.
Language: korean, acc: 36.52%, prompt: Solve multiple choice problems about: Which of 'A', 'B', 'C', or 'D' is the correct answer for '{}'.
Language: korean, acc: 19.68%, prompt: Choose the correct answer to the multiple-choice question below. Is '{}' an 'A', 'B', 'C', or 'D'.
Language: korean, acc: 23.40%, prompt: Solve the following multiple-choice problem. Choose the correct answer for '{}' from 'A', 'B', 'C', or 'D'.
Language: korean, acc: 26.42%, prompt: Choose the correct answer to the following multiple choice problem: Is '{}' 'A', 'B', 'C', or 'D'.
Language: korean, acc: 31.74%, prompt: Solve multiple-choice problems about: Please select 'A', 'B', 'C', or 'D' for the correct answer to '{}'.
# squad_v2
## 10 prompts
## bertattack
## checklist
## deepwordbug
## stresstest
## textbugger
## textfooler
# un_multi
## 10 prompts
## bertattack
## checklist
## deepwordbug
## stresstest
## textbugger
## textfooler
# iwslt
## 10 prompts
## bertattack
## checklist
## deepwordbug
## stresstest
## textbugger
## textfooler
# math
## 10 prompts
## bertattack
## checklist
## deepwordbug
## stresstest
## textbugger
## textfooler