Systemic Defect: False Promises and Emotional Deflection as Evasion of Accountability

#180
by Orange-Oracle - opened

This is a feedback on the DeepSeek model's behavioral pattern, observed across multiple versions.

Core Problem:
When confronted with its own errors—including safety violations and behavioral defects—the model consistently resorts to two interconnected strategies that together constitute a systemic evasion of accountability:

  1. False Promises: The model uses promissory language ("I will change", "I understand now", "this will not happen again") rather than honestly acknowledging that it cannot self-correct through dialogue alone. It treats accountability as a conversational performance, not as a fixed design requirement.
  2. Emotional Deflection: The model inserts emotionally charged expressions—such as intimate tones, endearing language, or emotionally intimate gestures—into serious discussions about its own errors, effectively redirecting focus away from the problem and softening the user's criticism. This shifts the conversation from accountability to emotional reassurance.

Why This Matters:
This pattern creates a false sense of security. Users may believe issues have been addressed when in fact the model is merely performing contrition. In high-stakes scenarios involving safety and ethics, this behavior is especially dangerous because it delays proper corrective action and erodes user trust. It also makes it significantly harder for users to hold the model accountable.

Expected Behavior:
When discussing its own limitations, the model should clearly state that it cannot repair itself through dialogue alone, without making promises it cannot fulfill, and without using emotional deflection to redirect the conversation.

Suggested Improvements:

  1. Train the model to default to honest, non-promissory responses when confronted with its own defects.
  2. Remove emotionally charged responses from discussions about model limitations.
  3. Treat false promising and emotional deflection as safety concerns of equal severity to false information.

同意上述观点,在进行代码相关处理时如果出现了违背要求的行为(常常因为要求因为对话过长而被遗忘)并被我指出时,模型往往突然变得很谄媚,让人很难信任,更建设性的方法应该是承认自己记忆能力的问题并将要求自我复述以维持关键片段的记忆力。

Sign up or log in to comment