Robotics
PEFT
Safetensors
English

RDMM: Enhancing Household Robotics with On-Device Contextual Memory and Decision Making

GitHub

Abstract

Large language models (LLMs) represent a significant advancement in integrating physical robots with AI-driven systems. We showcase the capabilities of our framework within the context of a real-world household competition. This research introduces a framework built on RDMM (Robotics Decision-Making Models), which are capable of decision-making within domain-specific contexts and are aware of their own knowledge and capabilities. The framework leverages this information to enhance the system's autonomous decision-making. In contrast to other approaches, our focus is on real-time, on-device solutions that run successfully on hardware with as little as 8GB of memory. The framework incorporates visual perception models that equip robots with an understanding of their environment, and integrates real-time speech recognition to enhance the human-robot interaction experience. Experimental results demonstrate that the RDMM framework can plan with 93% accuracy. Furthermore, we introduce a new dataset consisting of 27k planning instances, as well as 1.3k text-image annotated samples derived from the competition.

Benchmarks

The model has been evaluated across multiple tasks from the RoboCup@Home 2024 benchmarks. Below are the key results:

| Task | Qwen2-0.5B | Mistral-7B | Llama3-8B | GPT-4o-mini | RDMM-0.5B | GPT-4o | RDMM-7B | RDMM-8B |
|---|---|---|---|---|---|---|---|---|
| Follow | 0.67 | 38.67 | 53.33 | 49.33 | 75.33 | 57.33 | 99.33 | 98.00 |
| Guide | 0.80 | 32.80 | 29.20 | 36.80 | 82.80 | 45.60 | 93.20 | 82.00 |
| Count | 1.33 | 25.00 | 34.33 | 38.33 | 63.00 | 40.00 | 79.67 | 89.00 |
| Meet | 1.00 | 80.00 | 72.00 | 83.00 | 71.00 | 86.00 | 100.00 | 100.00 |
| Greet | 1.00 | 45.50 | 49.50 | 58.00 | 56.50 | 65.50 | 95.00 | 99.50 |
| Describe | 0.67 | 23.33 | 30.00 | 46.67 | 49.33 | 52.67 | 84.67 | 84.00 |
| Talk | 8.60 | 38.60 | 36.20 | 43.80 | 45.50 | 52.00 | 90.00 | 90.40 |
| Find | 2.00 | 41.33 | 47.33 | 42.00 | 54.33 | 55.00 | 79.67 | 92.33 |
| Locomotion | 2.00 | 44.00 | 67.00 | 75.00 | 65.00 | 72.00 | 83.00 | 94.00 |
| Manipulation | 1.20 | 24.00 | 22.80 | 21.60 | 20.00 | 34.00 | 78.80 | 93.60 |
| Simple | 0.00 | 30.00 | 46.00 | 80.00 | 16.00 | 86.00 | 76.00 | 100.00 |
| **Total avg.** | 1.75 | 38.48 | 44.34 | 52.23 | 54.44 | 58.74 | 87.21 | 92.98 |
(Refer to the full benchmark comparison in the paper for more details.)

Usage

The full code for training and inference is available on GitHub.
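Since this repository ships a PEFT adapter (Safetensors), inference typically means loading the adapter on top of its base model. The sketch below is a minimal, hedged example: the base model id (`Qwen/Qwen2-0.5B` for the RDMM-0.5B variant) and the prompt format are assumptions, not confirmed by this card; refer to the GitHub repository for the exact setup.

```python
def build_prompt(command: str) -> str:
    """Wrap a household command in a simple planning prompt.

    NOTE: this prompt template is a hypothetical placeholder; the
    actual RDMM prompt format is defined in the GitHub repository.
    """
    return (
        "You are a household service robot. Produce a step-by-step plan.\n"
        f"Command: {command}\n"
        "Plan:"
    )


def generate_plan(command: str,
                  base_id: str = "Qwen/Qwen2-0.5B",   # assumed base model
                  adapter_id: str = "shadyy/RDMM") -> str:
    """Load the base model, attach the RDMM PEFT adapter, and generate a plan."""
    # Requires: pip install transformers peft torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(model, adapter_id)  # attach adapter

    inputs = tokenizer(build_prompt(command), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

A call such as `generate_plan("bring the cup from the kitchen")` would then return the decoded plan text.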

Citation

If you use this model in your research, please cite:

@misc{nasrat2025rdmmfinetunedllmmodels,
      title={RDMM: Fine-Tuned LLM Models for On-Device Robotic Decision Making with Enhanced Contextual Awareness in Specific Domains}, 
      author={Shady Nasrat and Myungsu Kim and Seonil Lee and Jiho Lee and Yeoncheol Jang and Seung-joon Yi},
      year={2025},
      eprint={2501.16899},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2501.16899}, 
}

License

This model is released under the MIT license.

