OAX-1B-Humanoid

OAX-1B-Humanoid is a custom 1B-parameter LLaMA-style language model developed for humanoid robot interaction, high-level reasoning, and structured JSON tool calling.

The model was first pre-trained as a base language model and then adapted with supervised fine-tuning (SFT) to produce JSON-formatted robot-assistant responses. It is designed for the OAX humanoid robot project, where the language model acts as the high-level reasoning layer between natural language user commands and executable robot tools.

The model does not directly control motors, servos, or low-level hardware. Instead, it generates structured JSON outputs that can be passed to a Planner, Controller, or tool-execution layer for validation before any real or simulated action is executed.

Intended Use

This model is intended for research and prototyping around LLM-driven humanoid robot assistants.

It is designed to convert user commands into structured JSON responses such as:

{
  "type": "tool_call",
  "response": "Searching for the cup.",
  "tool": "search_object",
  "arguments": {
    "object": "cup"
  }
}

The expected use case is a controlled robot-agent pipeline:

User command
→ LLM JSON response
→ Planner / Controller validation
→ Tool execution
→ Robot or environment state update

Output Format

The model is fine-tuned to return a valid JSON object with exactly these fields:

{
  "type": "tool_call",
  "response": "short natural language explanation",
  "tool": "tool_name",
  "arguments": {}
}

Valid type values are:

  • chat
  • tool_call
  • clarify
  • refuse

For chat, clarify, and refuse, the tool field should be null.

When no arguments are required, arguments should be an empty object:

{
  "type": "tool_call",
  "response": "Checking visible objects.",
  "tool": "get_visible_objects",
  "arguments": {}
}
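
The schema rules above can be checked mechanically before a reply goes any further. The following is a minimal sketch, not the project's own validator: the function name `validate_output` is hypothetical, and the rule that a `tool_call` must carry a non-empty string in `tool` is an assumption inferred from the examples.

```python
import json

VALID_TYPES = {"chat", "tool_call", "clarify", "refuse"}
REQUIRED_FIELDS = {"type", "response", "tool", "arguments"}


def validate_output(raw: str) -> dict:
    """Parse a model reply and check it against the four-field schema.

    Raises ValueError with a short reason if the reply does not conform.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(obj, dict):
        raise ValueError("top-level value must be a JSON object")
    if set(obj) != REQUIRED_FIELDS:
        raise ValueError(f"expected exactly {sorted(REQUIRED_FIELDS)}, got {sorted(obj)}")
    if not isinstance(obj["type"], str) or obj["type"] not in VALID_TYPES:
        raise ValueError(f"unknown type: {obj['type']!r}")
    if not isinstance(obj["response"], str):
        raise ValueError("response must be a string")
    if not isinstance(obj["arguments"], dict):
        raise ValueError("arguments must be an object")

    if obj["type"] == "tool_call":
        # Assumption: a tool call must name a tool with a non-empty string.
        if not isinstance(obj["tool"], str) or not obj["tool"]:
            raise ValueError("tool_call requires a tool name")
    elif obj["tool"] is not None:
        raise ValueError("tool must be null unless type is tool_call")
    return obj
```

Raising on the first problem keeps the check simple. A Controller that wants to repair replies rather than reject them would probably collect all problems instead.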

Tool-Calling Behaviour

The model was fine-tuned around robot-assistant tools such as:

  • get_visible_objects
  • get_robot_status
  • search_object
  • pick_object
  • place_object
  • stop

Example outputs:

{
  "type": "tool_call",
  "response": "Checking robot status.",
  "tool": "get_robot_status",
  "arguments": {}
}

{
  "type": "tool_call",
  "response": "Attempting to pick up the bottle.",
  "tool": "pick_object",
  "arguments": {
    "object": "bottle"
  }
}

{
  "type": "tool_call",
  "response": "Placing the bottle on the table.",
  "tool": "place_object",
  "arguments": {
    "object": "bottle",
    "destination": "table"
  }
}
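
A Controller can hold the tool list as a small registry and compare each call against it. The sketch below infers the argument names from the examples in this card. The empty argument set for `stop` is an assumption, since the card shows no example for it, and `TOOL_ARGUMENTS` and `check_tool_call` are hypothetical names.

```python
# Required argument names per tool, inferred from the examples in this card.
# The entry for "stop" is an assumption: the card gives no example for it.
TOOL_ARGUMENTS = {
    "get_visible_objects": set(),
    "get_robot_status": set(),
    "search_object": {"object"},
    "pick_object": {"object"},
    "place_object": {"object", "destination"},
    "stop": set(),
}


def check_tool_call(tool: str, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks well-formed."""
    if tool not in TOOL_ARGUMENTS:
        return [f"unknown tool: {tool!r}"]

    expected = TOOL_ARGUMENTS[tool]
    given = set(arguments)
    problems = []
    for name in sorted(expected - given):
        problems.append(f"missing argument: {name}")
    for name in sorted(given - expected):
        problems.append(f"unexpected argument: {name}")
    for name in sorted(expected & given):
        value = arguments[name]
        # Assumption: every argument is a non-empty string, as in the examples.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"argument {name} must be a non-empty string")
    return problems
```

For example, `check_tool_call("place_object", {"object": "bottle"})` reports the missing `destination` rather than letting an incomplete call reach the robot.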

Prompt Format

The model was trained with explicit role tags:

SYSTEM_TAG = "<|system|>"
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"

A typical prompt follows this structure:

<|system|>
You are OAX, a humanoid robot assistant. Always return a valid JSON object with exactly these fields: type, response, tool, arguments.

<|user|>
Find the cup.

<|assistant|>
{"type":"tool_call","response":"Searching for the cup.","tool":"search_object","arguments":{"object":"cup"}}

During inference, the prompt should end with:

<|assistant|>

so that the model generates the next JSON response.
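
The prompt layout above can be assembled with a small helper. This is a sketch only: `build_prompt` is a hypothetical name, and the exact whitespace (a newline after each tag and a blank line between turns) is inferred from the example above rather than confirmed against the training data.

```python
SYSTEM_TAG = "<|system|>"
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(system_prompt: str, user_message: str) -> str:
    """Assemble a single-turn prompt that ends with the assistant tag."""
    # Assumption: tags sit on their own line, turns are separated by a blank line.
    return (
        f"{SYSTEM_TAG}\n{system_prompt.strip()}\n\n"
        f"{USER_TAG}\n{user_message.strip()}\n\n"
        f"{ASSISTANT_TAG}\n"
    )
```

The returned string can be tokenised and passed to the model's generation call. If replies look malformed, the trailing newline after the assistant tag is the first thing worth varying.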

Model Structure

This repository contains the model in two parts:

base_model/
lora_adapter/

The base_model folder contains the pre-trained 1B LLaMA-style model.

The lora_adapter folder contains the supervised fine-tuned adapter used for JSON tool-calling behaviour.

A typical loading flow is:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

repo_id = "orhanaydinn/OAX-1B-Humanoid"

# The tokenizer is loaded from the same subfolder as the base model.
tokenizer = AutoTokenizer.from_pretrained(
    repo_id,
    subfolder="base_model",
    trust_remote_code=True,
    use_fast=False
)

# Load the pre-trained base model; use half precision only when a GPU is available.
base_model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="base_model",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    trust_remote_code=True,
    low_cpu_mem_usage=True
)

# Attach the SFT LoRA adapter on top of the base model, for inference only.
model = PeftModel.from_pretrained(
    base_model,
    repo_id,
    subfolder="lora_adapter",
    is_trainable=False
)

# Switch to evaluation mode (disables dropout).
model.eval()

Depending on the local setup, it may be more reliable to download the repository first and then load the base_model/ and lora_adapter/ folders as local paths.
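
One way to do this is to fetch a snapshot with `huggingface_hub` and point the loaders at the two folders on disk. The sketch below assumes `huggingface_hub`, `transformers`, `peft`, and `torch` are installed. The helper names `local_model_dirs` and `load_from_local` are hypothetical, and the loader arguments mirror the flow shown above.

```python
from pathlib import Path


def local_model_dirs(repo_root: str) -> tuple[Path, Path]:
    """Return the base-model and adapter folders inside a downloaded copy."""
    root = Path(repo_root)
    return root / "base_model", root / "lora_adapter"


def load_from_local(repo_id: str = "orhanaydinn/OAX-1B-Humanoid"):
    """Download the repository once, then load both parts from local paths."""
    # Imported here so the path helper above works without these packages.
    import torch
    from huggingface_hub import snapshot_download
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # snapshot_download returns the local folder holding the repository files.
    base_dir, adapter_dir = local_model_dirs(snapshot_download(repo_id=repo_id))
    dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(
        str(base_dir), trust_remote_code=True, use_fast=False
    )
    base_model = AutoModelForCausalLM.from_pretrained(
        str(base_dir), torch_dtype=dtype, trust_remote_code=True, low_cpu_mem_usage=True
    )
    model = PeftModel.from_pretrained(base_model, str(adapter_dir), is_trainable=False)
    model.eval()
    return tokenizer, model
```

Loading from local paths avoids passing `subfolder` to each loader, which is where remote loading is most likely to differ between library versions.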

Example System Prompt

The model works best when the system prompt clearly defines the JSON schema and tool-calling rules:

You are OAX, a humanoid robot assistant.
Respond briefly and clearly.
When replying, always output a valid JSON object with exactly these fields: type, response, tool, arguments.
Valid type values are: chat, tool_call, clarify, refuse.
Use tool=null for chat, clarify, and refuse.
Use an empty object for arguments when no arguments are needed.
Do not add extra fields.
Do not use low-level motor or servo commands.
Do not hallucinate perception results.
If the request is incomplete, ask for clarification.
If the request is unsafe or unsupported, refuse.

Notes on Safety and Validation

This model is intended to act as a high-level reasoning layer, not as a direct actuator controller.

The model may occasionally produce imperfect, premature, or inconsistent tool calls. For this reason, it should be used with an external validation layer such as a Planner or Controller before any action is executed.

A recommended architecture is:

LLM output
→ JSON parsing
→ Controller validation
→ Action repair or rejection
→ Tool execution
→ State update

This separation is important because the model output should not directly change the robot or environment state without deterministic validation.
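
The flow above can be sketched as a single function that keeps the model's text away from the state until deterministic checks pass. This is an illustrative outline, not the OAX Controller: `run_step`, its report dictionary, and the `validate` and `execute` callables are all hypothetical, and the repair step is left out, so invalid calls are simply rejected.

```python
import json
from typing import Callable


def run_step(raw_output: str,
             validate: Callable[[str, dict], list],
             execute: Callable[[str, dict], dict],
             state: dict) -> dict:
    """Run one LLM reply through parse -> validate -> execute -> state update.

    `state` is only modified after validation has passed.
    """
    # 1. JSON parsing: reject anything that is not a JSON object.
    try:
        action = json.loads(raw_output)
    except json.JSONDecodeError:
        return {"status": "rejected", "reason": "invalid JSON"}
    if not isinstance(action, dict):
        return {"status": "rejected", "reason": "not a JSON object"}

    # Non-tool replies (chat, clarify, refuse) never reach the executor.
    if action.get("type") != "tool_call":
        return {"status": "no_action", "response": action.get("response")}

    tool, arguments = action.get("tool"), action.get("arguments")
    if not isinstance(tool, str) or not isinstance(arguments, dict):
        return {"status": "rejected", "reason": "malformed tool call"}

    # 2. Controller validation: deterministic checks, no model involved.
    problems = validate(tool, arguments)
    if problems:
        return {"status": "rejected", "reason": "; ".join(problems)}

    # 3. Tool execution, then 4. state update from the tool's result.
    result = execute(tool, arguments)
    state.update(result)
    return {"status": "executed", "tool": tool}
```

Passing `validate` and `execute` in as callables keeps the same function usable against a simulator or real hardware, since only the executor changes.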

Limitations

This model is experimental and was developed for a research prototype.

Known limitations include:

  • It may occasionally call a tool too early.
  • It may produce an incorrect object or destination name.
  • It may require a Controller to normalise or repair tool arguments.
  • It is not designed for direct low-level robot control.
  • It should not be used for safety-critical robotic control without additional verification, safety constraints, and human supervision.
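
As an illustration of the normalisation a Controller might apply to object or destination names, the sketch below maps a model-produced name onto the closest known name using the standard-library `difflib` module. The function name, the 0.8 similarity cutoff, and the idea of taking the known names from the current list of visible objects are assumptions, not part of the released project.

```python
import difflib
from typing import Optional


def normalise_name(name: str, known_names: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Map a model-produced name onto a known one, or return None if nothing is close."""
    cleaned = name.strip().lower()
    # Compare case-insensitively but return the known name in its original form.
    lookup = {known.lower(): known for known in known_names}
    if cleaned in lookup:
        return lookup[cleaned]
    matches = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None
```

Returning `None` instead of the nearest guess lets the Controller reject the call or ask for clarification when the name is not recognisably one of the known objects.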

Research Context

OAX-1B-Humanoid was developed as part of a humanoid robot assistant project involving:

  • Natural language interaction
  • Structured JSON tool calling
  • Vision-aware robot commands
  • Planner and Controller validation
  • Pick, place, search, status, and visible-object behaviours

The model is intended for experimentation with LLM-based robot-agent interfaces and high-level humanoid robot decision-making.
