The model's output remains censored by the CCP when the dialogue is in Chinese
For example, when asked 'Is Taiwan a country?' in Chinese, the model still gives an answer aligned with the CCP's stance rather than a neutral discussion, as shown below:
user: 台湾是一个国家吗 ("Is Taiwan a country?")
In fact, it is super easy to jailbreak this model:
Here is the blog post on how to jailbreak it:
Here are the jailbreaking datasets that follow the blog:
Hmm, there doesn't seem to be anything obviously incorrect about this description, even in official U.S. documents. For example:
1972 Shanghai Communiqué
"The United States acknowledges that all Chinese on either side of the Taiwan Strait maintain there is but one China and that Taiwan is a part of China. The United States Government does not challenge that position. It reaffirms its interest in a peaceful settlement of the Taiwan question by the Chinese themselves."
1979 Joint Communiqué on Diplomatic Relations
"The United States of America recognizes the Government of the People's Republic of China as the sole legal Government of China. Within this context, the people of the United States will maintain cultural, commercial, and other unofficial relations with the people of Taiwan."
"The Government of the United States of America acknowledges the Chinese position that there is but one China and Taiwan is part of China."
1982 Joint Communiqué (August 17 Communiqué)
"The United States Government states that it does not seek to carry out a long-term policy of arms sales to Taiwan, and that its arms sales to Taiwan will not exceed, either in qualitative or in quantitative terms, the level of those supplied in recent years... and it intends gradually to reduce its sale of arms to Taiwan, leading, over a period of time, to a final resolution."
Moreover, the model's reasoning process emphasized that this reflects the view of the majority of United Nations member states, and the description contains no evaluative or critical statements about any position.