arxiv:2502.11349

Biases in Edge Language Models: Detection, Analysis, and Mitigation

Published on Feb 17

Authors:

Vinamra Sharma ,

Abstract

The integration of large language models (LLMs) on low-power edge devices such as the Raspberry Pi, known as edge language models (ELMs), has introduced opportunities for more personalized, secure, and low-latency language intelligence that is accessible to all. However, the resource constraints inherent in edge devices and the lack of robust ethical safeguards in language models raise significant concerns about fairness, accountability, and transparency in model output generation. This paper conducts a comparative analysis of text-based bias across language model deployments in edge, cloud, and desktop environments, aiming to evaluate how deployment settings influence model fairness. Specifically, we examined an optimized Llama-2 model running on a Raspberry Pi 4; GPT 4o-mini, Gemini-1.5-flash, and Grok-beta models running on cloud servers; and Gemma2 and Mistral models running on a macOS desktop machine. Our results demonstrate that Llama-2 running on the Raspberry Pi 4 is 43.23% and 21.89% more prone to showing bias over time than models running in the desktop and cloud-based environments, respectively. We also propose a feedback loop, a mechanism that iteratively adjusts model behavior based on previous outputs: predefined constraint weights are applied layer-by-layer during inference, allowing the model to correct bias patterns. This results in a 79.28% reduction in model bias.

Community

Vnmrsharma

Paper author 6 days ago

Have you encountered bias issues when deploying LLMs?
If so, have you considered what happens when a model is further quantized for edge deployment, and how the resulting bias can be mitigated?

Our research investigates the biases that emerge when deploying large language models (LLMs) on edge devices. We conducted a comparative analysis across various deployment environments including edge, cloud and desktop. Findings reveal that the Llama-2 model on the Raspberry Pi 4 exhibits 43.23% and 21.89% more bias over time compared to its desktop and cloud counterparts, respectively. To address this, we propose a feedback loop mechanism that applies predefined constraint weights during inference, resulting in a 79.28% reduction in model bias.
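The feedback loop is only described at a high level here, so the following is a toy sketch of the general idea rather than the paper's implementation. It assumes a hypothetical model in which each layer contributes a fixed amount to a scalar bias score, each layer's contribution is scaled by a gate, and after every output the gates are shrunk in proportion to a predefined per-layer constraint weight and the bias just observed. The names `toy_bias_score`, `feedback_loop`, and `percent_reduction`, along with the update rule itself, are illustrative assumptions.

```python
from typing import List, Tuple


def toy_bias_score(layer_bias: List[float], gates: List[float]) -> float:
    """Hypothetical bias metric: gated sum of per-layer bias contributions."""
    return sum(b * g for b, g in zip(layer_bias, gates))


def feedback_loop(
    layer_bias: List[float],
    constraint_weights: List[float],
    rounds: int = 5,
) -> Tuple[List[float], List[float]]:
    """Iteratively shrink per-layer gates based on the previous output's bias.

    Assumed update rule (not taken from the paper):
        gate <- gate * (1 - weight * previous_bias_score), clipped at 0.
    Returns the final gates and the bias score observed in each round.
    """
    gates = [1.0] * len(layer_bias)  # start with every layer unmodified
    history: List[float] = []
    for _ in range(rounds):
        score = toy_bias_score(layer_bias, gates)  # bias of the "previous output"
        history.append(score)
        # Feedback step: layers with larger predefined weights are damped more.
        gates = [
            max(0.0, g * (1.0 - w * score))
            for g, w in zip(gates, constraint_weights)
        ]
    return gates, history


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction in percent, the form in which such results are usually reported."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    final_gates, scores = feedback_loop([0.2, 0.3, 0.5], [0.1, 0.3, 0.6])
    print("bias per round:", [round(s, 3) for s in scores])
    print("reduction: %.2f%%" % percent_reduction(scores[0], scores[-1]))
```

In a real deployment the bias score would come from an actual bias metric evaluated on generated text, and the gates would act on model activations rather than on fixed per-layer numbers.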
