Simulate documentation

Reward Functions

🤗 Simulate provides a system for defining both simple and complex reward functions. It does so by combining “leaf” reward functions, such as sparse and dense reward functions, with predicate reward functions.

Reward functions can be parameterized with a distance metric. Currently “euclidean”, “cosine” and “best_euclidean” are supported. Combining predicates with leaf rewards makes it possible to build complex reward functions; a good example of this is the Move Boxes example.
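To make the metric names concrete, here is a minimal sketch of two of them in plain Python. It does not call Simulate: the function names are hypothetical, the formulas are the standard textbook definitions, and Simulate's own implementation (including how “best_euclidean” is computed) may differ.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Euclidean distance depends on how far apart the points are,
# cosine distance only on the angle between the two vectors.
print(euclidean_distance([0.0, 0.0, 0.0], [3.0, 4.0, 0.0]))
print(cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```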

The following “leaf” rewards are available in Simulate:

• “dense”: A reward that is non-zero at every time-step.
• “sparse”: A reward that is triggered by the proximity of another object.
• “timeout”: A timeout reward that is triggered after a certain number of time-steps.
• “see”: Triggered when an object is in the field of view of an Actor.
• “angle_to”: Triggered when the angle between two objects and a certain direction is less than a threshold.
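As an illustration of how two of these leaf rewards behave, the following is a toy model in plain Python, not Simulate's implementation. The function names, the strict `<` comparison and the `max_steps` argument are assumptions made for the sketch; `threshold` and `scalar` mirror the parameters of the same name documented for `RewardFunction`. The dense reward is left out because its formula is not given here.

```python
import math
from typing import Sequence


def sparse_reward(
    pos_a: Sequence[float],
    pos_b: Sequence[float],
    threshold: float = 1.0,
    scalar: float = 1.0,
) -> float:
    """Return `scalar` when the two positions are closer than `threshold`, else 0."""
    return scalar if math.dist(pos_a, pos_b) < threshold else 0.0


def timeout_reward(step: int, max_steps: int, scalar: float = 1.0) -> float:
    """Return `scalar` once `step` has reached `max_steps`, else 0."""
    return scalar if step >= max_steps else 0.0


# The actor is 0.5 units from the target, inside the default threshold of 1.0.
print(sparse_reward([0.0, 0.0, 0.0], [0.5, 0.0, 0.0]))
# A negative scalar turns the timeout into a penalty.
print(timeout_reward(step=10, max_steps=10, scalar=-1.0))
```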

The “leaf” reward functions can be combined in a tree structure with the following predicate functions:

• “not”: Triggers when a reward is not triggered.
• “and”: Triggers when both children of this node are returning a positive reward.
• “or”: Triggers when one or both of the children of this node are returning a positive reward.
• “xor”: Triggers when exactly one of the children of this node is returning a positive reward.
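The tree structure described above can be sketched in a few lines of plain Python. This is a hypothetical model of the predicate semantics only: the `Node` class and `triggered` function are invented for illustration and are not part of Simulate, and a leaf is assumed to count as triggered when its reward for the current step is positive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str                    # "leaf", "not", "and", "or" or "xor"
    value: float = 0.0           # reward a leaf returned this time-step
    a: Optional["Node"] = None   # first (or only) child of a predicate
    b: Optional["Node"] = None   # second child of "and", "or", "xor"


def triggered(node: Node) -> bool:
    """Evaluate the tree bottom-up for a single time-step."""
    if node.kind == "leaf":
        return node.value > 0
    if node.kind == "not":
        return not triggered(node.a)
    left, right = triggered(node.a), triggered(node.b)
    if node.kind == "and":
        return left and right
    if node.kind == "or":
        return left or right
    if node.kind == "xor":
        return left != right
    raise ValueError(f"unknown node kind: {node.kind}")


# "Near the goal AND NOT timed out"
near_goal = Node("leaf", value=1.0)
timed_out = Node("leaf", value=0.0)
tree = Node("and", a=near_goal, b=Node("not", a=timed_out))
print(triggered(tree))  # True
```

Keeping the predicates as nodes with at most two children matches the `reward_function_a` and `reward_function_b` parameters of `RewardFunction`, which is why the sketch uses the fields `a` and `b`.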

class simulate.RewardFunction

(
    type: typing.Optional[str] = None
    entity_a: typing.Optional[typing.Any] = None
    entity_b: typing.Optional[typing.Any] = None
    distance_metric: typing.Optional[str] = None
    direction: typing.Optional[typing.List[float]] = None
    scalar: float = 1.0
    threshold: float = 1.0
    is_terminal: bool = False
    is_collectable: bool = False
    trigger_once: bool = True
    reward_function_a: dataclasses.InitVar[typing.Optional[ForwardRef('RewardFunction')]] = None
    reward_function_b: dataclasses.InitVar[typing.Optional[ForwardRef('RewardFunction')]] = None
    name: dataclasses.InitVar[typing.Optional[str]] = None
    position: dataclasses.InitVar[typing.Optional[typing.List[float]]] = <property object>
    rotation: dataclasses.InitVar[typing.Optional[typing.List[float]]] = <property object>
    scaling: dataclasses.InitVar[typing.Union[float, typing.List[float], NoneType]] = <property object>
    transformation_matrix: dataclasses.InitVar[typing.Optional[typing.List[float]]] = <property object>
    parent: dataclasses.InitVar[typing.Optional[typing.Any]] = None
    children: dataclasses.InitVar[typing.Optional[typing.List[typing.Any]]] = None
    created_from_file: dataclasses.InitVar[typing.Optional[str]] = None
)

Parameters

• type (str, optional, defaults to "dense") — The type of reward function. Must be one of the following: [ “dense”, “sparse”, “or”, “and”, “not”, “see”, “timeout” ]
• entity_a (Asset, optional, defaults to None) — The first entity in the reward function
• entity_b (Asset, optional, defaults to None) — The second entity in the reward function
• distance_metric (str, optional, defaults to "euclidean") — The distance metric to use. Must be one of the following: [ “euclidean” ]
• direction (List[float], optional, defaults to [1.0, 0.0, 0.0]) — The direction to use for the reward function.
• scalar (float, optional, defaults to 1.0) — A constant factor applied to the reward. Setting it to -1 makes the reward behave as a cost.
• threshold (float, optional, defaults to 1.0) — The distance threshold to give the reward
• is_terminal (bool, optional, defaults to False) — Whether the reward is terminal
• is_collectable (bool, optional, defaults to False) — Whether the reward is collectable
• trigger_once (bool, optional, defaults to True) — Whether the reward is triggered once
• reward_function_a (RewardFunction, optional, defaults to None) — When combining rewards (and, or), the first reward function to combine
• reward_function_b (RewardFunction, optional, defaults to None) — When combining rewards (and, or), the second reward function to combine
• name (str, optional, defaults to None) — The name of the reward function
• position (List[float], optional, defaults to [0.0, 0.0, 0.0]) — The position of the reward function.
• rotation (List[float], optional, defaults to [0.0, 0.0, 0.0]) — The rotation of the reward function.
• scaling (List[float], optional, defaults to [1.0, 1.0, 1.0]) — The scaling of the reward function.
• transformation_matrix (List[float], optional, defaults to None) — The transformation matrix of the reward function.
• parent (Asset, optional, defaults to None) — The parent of the reward function.
• children (List[Asset], optional, defaults to None) — The children of the reward function.
• created_from_file (str, optional, defaults to None) — The file path of the file from which the reward function was created.

A reinforcement learning reward function.
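A short usage sketch may help tie the parameters together. The keyword arguments below are taken from the parameter list above, and the values are only illustrative. The asset classes `sm.Box` and `sm.EgocentricCameraActor` and the `add_reward_function` method are not described on this page; they are assumed here and should be checked against the installed version of Simulate. The import is guarded so that the snippet still runs where Simulate is not installed.

```python
# Keyword arguments named exactly as in the parameter list above.
sparse_kwargs = dict(
    type="sparse",                 # triggered by proximity of the two entities
    distance_metric="euclidean",
    threshold=0.2,                 # distance threshold for giving the reward
    scalar=1.0,                    # use -1.0 to turn the reward into a cost
    is_terminal=True,
    trigger_once=True,
)

try:
    import simulate as sm
except ImportError:  # Simulate not installed: only the kwargs are built
    sm = None

if sm is not None:
    # Assumed asset classes and method, see the note above.
    actor = sm.EgocentricCameraActor(name="actor", position=[0.0, 0.5, 0.0])
    target = sm.Box(name="target", position=[2.0, 0.5, 0.0])
    reward = sm.RewardFunction(entity_a=target, entity_b=actor, **sparse_kwargs)
    actor.add_reward_function(reward)
```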